Computer Science

New submissions

New submissions for Tue, 11 May 21

[1]  arXiv:2105.03432 [pdf, other]
Title: Generalising Multilingual Concept-to-Text NLG with Language Agnostic Delexicalisation
Comments: To be published in the proceedings of ACL-IJCNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Concept-to-text Natural Language Generation is the task of expressing an input meaning representation in natural language. Previous approaches in this task have been able to generalise to rare or unseen instances by relying on a delexicalisation of the input. However, this often requires that the input appears verbatim in the output text. This poses challenges in multilingual settings, where the task expands to generate the output text in multiple languages given the same input. In this paper, we explore the application of multilingual models in concept-to-text and propose Language Agnostic Delexicalisation, a novel delexicalisation method that uses multilingual pretrained embeddings, and employs a character-level post-editing model to inflect words in their correct form during relexicalisation. Our experiments across five datasets and five languages show that multilingual models outperform monolingual models in concept-to-text and that our framework outperforms previous approaches, especially for low resource languages.

[2]  arXiv:2105.03452 [pdf, other]
Title: New Numerical Interface Scheme for the Kurganov-Tadmor second-order Method
Subjects: Numerical Analysis (math.NA)

In this paper, we develop a numerical scheme to handle interfaces across computational domains in multi-block schemes for the approximation of systems of conservation laws. We are interested in transmitting shock discontinuities without lowering the overall precision of the method. We want to accomplish this without using information from interior points of adjacent grids, that is, sharing only information from boundary points of those grids. To achieve this, we choose to work with the second-order Kurganov-Tadmor (KT) method at interior points, relaxing it to first order at interfaces. This allows us to keep second-order overall accuracy (in the relevant norm) and at the same time preserve the TVD property of the original scheme. After developing the method, we performed several standard one- and two-dimensional tests. Among them, we used the one-dimensional advection and Burgers equations to verify the second-order convergence of the method. We also tested the two-dimensional Euler equations with an implosion and a Gresho vortex \cite{liska2003}. In particular, in the two-dimensional implosion test we can see that regardless of the orientation of shocks with respect to the interface, they travel across it without appreciable deformation in either amplitude or front direction.
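
As a rough illustration of the first-order building block mentioned above: with piecewise-constant reconstruction, the Kurganov-Tadmor numerical flux reduces to the Rusanov (local Lax-Friedrichs) flux. The sketch below applies it to the one-dimensional Burgers equation on a periodic grid. It is not the authors' interface scheme; the function names, the periodic boundary and the test data are assumptions made for illustration only.

```python
def rusanov_flux(u_left, u_right):
    """First-order KT (Rusanov) flux for Burgers, f(u) = u^2 / 2."""
    f_left = 0.5 * u_left * u_left
    f_right = 0.5 * u_right * u_right
    # Local maximal wave speed; for Burgers f'(u) = u.
    a = max(abs(u_left), abs(u_right))
    return 0.5 * (f_left + f_right) - 0.5 * a * (u_right - u_left)


def burgers_step(u, dx, dt):
    """One forward-Euler step of the first-order scheme (periodic grid)."""
    n = len(u)
    # flux[i] is the flux through the interface between cell i and cell i+1.
    flux = [rusanov_flux(u[i], u[(i + 1) % n]) for i in range(n)]
    # flux[i - 1] wraps around to the last interface when i == 0.
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


def total_variation(u):
    """Discrete total variation on a periodic grid."""
    n = len(u)
    return sum(abs(u[(i + 1) % n] - u[i]) for i in range(n))
```

Under a CFL restriction (here dt * max|u| / dx <= 1), this first-order update conserves the cell sum and does not increase the total variation, which is the TVD property the abstract refers to.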

[3]  arXiv:2105.03456 [pdf, other]
Title: CASTing a Net: Supporting Teachers with Search Technology
Comments: KidRec '21: 5th International and Interdisciplinary Perspectives on Children & Recommender and Information Retrieval Systems (KidRec): Search and Recommendation Technology through the Lens of a Teacher. Co-located with ACM IDC 2021
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Past and current research has typically focused on ensuring that search technology for the classroom serves children. In this paper, we argue for the need to broaden the research focus to include teachers and how search technology can aid them. In particular, we share how furnishing a behind-the-scenes portal for teachers can empower them by providing a window into the spelling, writing, and concept connection skills of their students.

[4]  arXiv:2105.03458 [pdf, other]
Title: Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
Comments: Under review, 10 pages
Subjects: Computation and Language (cs.CL)

Sequence-to-sequence (seq2seq) problems such as machine translation are bidirectional, which naturally yields a pair of directional tasks and two directional learning signals. However, typical seq2seq neural networks are simplex: they model only one unidirectional task and therefore cannot fully exploit the bidirectional learning signals in parallel data. To address this issue, we propose a duplex seq2seq neural network, REDER (Reversible Duplex Transformer), and apply it to machine translation. The architecture of REDER has two ends, each of which specializes in a language so as to read and yield sequences in that language. As a result, REDER can simultaneously learn from the bidirectional signals and enables reversible machine translation by simply flipping the input and output ends. Experiments on widely used machine translation benchmarks verify that REDER achieves the first success of reversible machine translation, which helps obtain considerable gains over several strong baselines.

[5]  arXiv:2105.03461 [pdf]
Title: Impact of DER Communication Delay in AGC: Cyber-Physical Dynamic Simulation
Subjects: Systems and Control (eess.SY)

Frequency regulation by distributed energy resources (DERs) is a promising technology for future grid operation. Unlike conventional generators, DERs might require open communication networks to exchange signals with control centers, possibly through DER aggregators; therefore, the impact of communication variations on system stability needs to be investigated. This paper develops a cyber-physical dynamic simulation model based on the Hierarchical Engine for Large-Scale Co-Simulation (HELICS) to evaluate the impact of communication variations, such as delays, on DER frequency regulation. The feasible delay range can be obtained under different parameter settings. The results show that the risk of instability generally increases with the communication delay.

[6]  arXiv:2105.03462 [pdf, ps, other]
Title: Necessary and Sufficient Girth Conditions for Tanner Graphs of Quasi-Cyclic LDPC Codes
Comments: Submitted to the 2021 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT)

This paper revisits the connection between the girth of a protograph-based LDPC code given by a parity-check matrix and the properties of powers of the product between the matrix and its transpose in order to obtain the necessary and sufficient conditions for a code to have given girth between 6 and 12, and to show how these conditions can be incorporated into simple algorithms to construct codes of that girth. To this end, we highlight the role that certain submatrices that appear in these products have in the construction of codes of desired girth. In particular, we show that imposing girth conditions on a parity-check matrix is equivalent to imposing conditions on a square submatrix obtained from it and we show how this equivalence is particularly strong for a protograph-based parity-check matrix of variable node degree 2, where the cycles in its Tanner graph correspond one-to-one to the cycles in the Tanner graph of a square submatrix obtained by adding the permutation matrices (or products of these) in the composition of the parity-check matrix. We end the paper with exemplary constructions of codes with various girths and computer simulations. Although we mostly assume the case of fully connected protographs of variable node degree 2 and 3, the results can be used for any parity-check matrix/protograph-based Tanner graph.
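
For readers unfamiliar with the quantity being constrained, the girth of a Tanner graph is the length of its shortest cycle. The sketch below computes it by brute force with breadth-first search from every node of a small binary parity-check matrix. It is a generic check, not the matrix-power conditions derived in the paper; the function name `tanner_girth` and the toy matrices are hypothetical.

```python
from collections import deque


def tanner_girth(H):
    """Return the girth of the Tanner graph of binary matrix H, or None if acyclic."""
    m, n = len(H), len(H[0])
    # Nodes 0..n-1 are variable nodes, nodes n..n+m-1 are check nodes.
    adj = [[] for _ in range(n + m)]
    for r in range(m):
        for c in range(n):
            if H[r][c]:
                adj[c].append(n + r)
                adj[n + r].append(c)

    best = None
    for start in range(n + m):
        dist = {start: 0}
        parent = {start: -1}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    parent[w] = v
                    queue.append(w)
                elif parent[v] != w:
                    # Non-tree edge closes a cycle through the BFS tree.
                    cycle = dist[v] + dist[w] + 1
                    if best is None or cycle < best:
                        best = cycle
    return best
```

A 2x2 all-ones matrix gives the shortest possible Tanner-graph cycle (length 4), which is what girth conditions of 6 and above are designed to exclude.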

[7]  arXiv:2105.03463 [pdf, other]
Title: Conditional a posteriori error bounds for high order DG time stepping approximations of semilinear heat models with blow-up
Subjects: Numerical Analysis (math.NA)

This work is concerned with the development of an adaptive numerical method for semilinear heat flow models featuring a general (possibly) nonlinear reaction term that may cause the solution to blow up in finite time. The fully discrete scheme consists of a high order discontinuous Galerkin (dG) time stepping method and a conforming finite element discretisation (cG) in space. The proposed adaptive procedure is based on rigorously devised conditional a posteriori error bounds in the $L^{\infty}(L^{\infty})$ norm. Numerical experiments complement the theoretical results.

[8]  arXiv:2105.03464 [pdf]
Title: Estimating Parkinsonism Severity in Natural Gait Videos of Older Adults with Dementia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Drug-induced parkinsonism affects many older adults with dementia, often causing gait disturbances. New advances in vision-based human pose-estimation have opened possibilities for frequent and unobtrusive analysis of gait in residential settings. This work proposes novel spatial-temporal graph convolutional network (ST-GCN) architectures and training procedures to predict clinical scores of parkinsonism in gait from video of individuals with dementia. We propose a two-stage training approach consisting of a self-supervised pretraining stage that encourages the ST-GCN model to learn about gait patterns before predicting clinical scores in the finetuning stage. The proposed ST-GCN models are evaluated on joint trajectories extracted from video and are compared against traditional (ordinal, linear, random forest) regression models and temporal convolutional network baselines. Three 2D human pose-estimation libraries (OpenPose, Detectron, AlphaPose) and the Microsoft Kinect (2D and 3D) are used to extract joint trajectories of 4787 natural walking bouts from 53 older adults with dementia. A subset of 399 walks from 14 participants is annotated with scores of parkinsonism severity on the gait criteria of the Unified Parkinson's Disease Rating Scale (UPDRS) and the Simpson-Angus Scale (SAS). Our results demonstrate that ST-GCN models operating on 3D joint trajectories extracted from the Kinect consistently outperform all other models and feature sets. Prediction of parkinsonism scores in natural walking bouts of unseen participants remains a challenging task, with the best models achieving macro-averaged F1-scores of 0.53 +/- 0.03 and 0.40 +/- 0.02 for UPDRS-gait and SAS-gait, respectively. Pre-trained models and demo code for this work are available at https://github.com/TaatiTeam/stgcn_parkinsonism_prediction.
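
The macro-averaged F1-score reported above weights every severity class equally, regardless of how many walks fall into it. A minimal sketch of the metric follows; the function name and the toy labels are assumptions for illustration and are unrelated to the study's data or evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because each class contributes equally, a model that ignores a rare severity level is penalised much more heavily than it would be under plain accuracy.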

[9]  arXiv:2105.03480 [pdf, other]
Title: A semigroup method for high dimensional elliptic PDEs and eigenvalue problems based on neural networks
Authors: Haoya Li, Lexing Ying
Comments: 13 pages, 15 figures
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

In this paper, we propose a semigroup method for solving high-dimensional elliptic partial differential equations (PDEs) and the associated eigenvalue problems based on neural networks. For the PDE problems, we reformulate the original equations as variational problems with the help of semigroup operators and then solve the variational problems with neural network (NN) parameterization. The main advantages are that no mixed second-order derivative computation is needed during the stochastic gradient descent training and that the boundary conditions are taken into account automatically by the semigroup operator. For eigenvalue problems, a primal-dual method is proposed, resolving the constraint with a scalar dual variable. Numerical results are provided to demonstrate the performance of the proposed methods.

[10]  arXiv:2105.03482 [pdf, other]
Title: Measuring and Increasing Context Usage in Context-Aware Machine Translation
Comments: ACL 2021
Subjects: Computation and Language (cs.CL)

Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context -- context from sentences other than those currently being translated. However, while many current methods present model architectures that theoretically can use this extra context, it is often not clear how much they do actually utilize it at translation time. In this paper, we introduce a new metric, conditional cross-mutual information, to quantify the usage of context by these models. Using this metric, we measure how much document-level machine translation systems use particular varieties of context. We find that target context is referenced more than source context, and that conditioning on a longer context has a diminishing effect on results. We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models. Experiments show that our method increases context usage and that this reflects on the translation quality according to metrics such as BLEU and COMET, as well as performance on anaphoric pronoun resolution and lexical cohesion contrastive datasets.
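
The abstract does not spell out the metric. One plausible reading of "conditional cross-mutual information" is the drop in cross-entropy on the reference translation when the model is given context, estimated over held-out examples. The sketch below assumes that reading and assumes the per-example log-probabilities have already been obtained from a context-agnostic and a context-aware model; all names are hypothetical.

```python
def cxmi_estimate(logp_without_context, logp_with_context):
    """Mean log-probability gain on the reference when context is provided.

    Each list holds log q(y_i | ...) for the same reference translations,
    scored without and with the extra context respectively.
    """
    if len(logp_without_context) != len(logp_with_context):
        raise ValueError("both lists must score the same examples")
    if not logp_without_context:
        raise ValueError("need at least one example")
    n = len(logp_without_context)
    gains = (c - a for a, c in zip(logp_without_context, logp_with_context))
    return sum(gains) / n
```

Under this reading, a value near zero would suggest the model gains little from the context, while a larger positive value would suggest it relies on it.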

[11]  arXiv:2105.03484 [pdf, other]
Title: Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality
Comments: 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)
Subjects: Computation and Language (cs.CL)

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just $\frac{1}{12}$ of the embedding dimensions.

[12]  arXiv:2105.03489 [pdf, other]
Title: Reinforcement Learning and Control of a Lower Extremity Exoskeleton for Squat Assistance
Subjects: Robotics (cs.RO)

A significant challenge for the control of a robotic lower extremity rehabilitation exoskeleton is to ensure stability and robustness during programmed tasks or motions, which is crucial for the safety of the mobility-impaired user. Due to various levels of the user's disability, the human-exoskeleton interaction forces and external perturbations are unpredictable and can vary substantially, causing conventional motion controllers to behave unreliably or the robot to fall down. In this work, we propose a new reinforcement learning-based motion controller for a lower extremity rehabilitation exoskeleton, aiming to perform collaborative squatting exercises with efficiency, stability, and strong robustness. Unlike most existing rehabilitation exoskeletons, our exoskeleton has ankle actuation on both sagittal and front planes and is equipped with multiple foot force sensors to estimate center of pressure (CoP), an important indicator of system balance. The proposed motion controller takes advantage of the CoP information by incorporating it in the state input of the control policy network and adding it to the reward during learning to maintain a well-balanced system state during motions. In addition, we use dynamics randomization and adversary force perturbations, including large human interaction forces, during training to further improve control robustness. To evaluate the effectiveness of the learning controller, we conduct numerical experiments with different settings to demonstrate its remarkable ability in controlling the exoskeleton to repetitively perform well-balanced and robust squatting motions under strong perturbations and realistic human interaction forces.

[13]  arXiv:2105.03491 [pdf, other]
Title: Uniform Convergence, Adversarial Spheres and a Simple Remedy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks. By considering a specific dataset, it was observed that a neural network completely misclassifies a projection of the training data (adversarial set), rendering any existing generalization bound based on uniform convergence vacuous. We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models. We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin. We highlight the important role of the output bias and show theoretically as well as empirically how a sensible choice completely mitigates the problem. We identify sharp phase transitions in the accuracy on the adversarial set and study its dependency on the training sample size. As a result, we are able to characterize critical sample sizes beyond which the effect disappears. Moreover, we study decompositions of a neural network into a clean and noisy part by considering its canonical decomposition into its different eigenfunctions and show empirically that for too small bias the adversarial phenomenon still persists.

[14]  arXiv:2105.03492 [pdf, other]
Title: Human-Aided Saliency Maps Improve Generalization of Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning has driven remarkable accuracy increases in many computer vision problems. One ongoing challenge is how to achieve the greatest accuracy in cases where training data is limited. A second ongoing challenge is that trained models are sometimes fragile in the sense that the accuracy achieved does not generalize well, even to new data that is subjectively similar to the training set. We address these challenges in a novel way, with the first-ever (to our knowledge) exploration of encoding human judgement about salient regions of images into the training data. We compare the accuracy and generalization of a state-of-the-art deep learning algorithm for a difficult problem in biometric presentation attack detection when trained on (a) original images with typical data augmentations, and (b) the same original images transformed to encode human judgement about salient image regions. The latter approach results in models that achieve higher accuracy and better generalization, decreasing the error of the LivDet-Iris 2020 winner from 29.78% to 16.37%, and achieving impressive generalization in a leave-one-attack-type-out evaluation scenario. This work opens a new area of study for how to embed human intelligence into training strategies for deep learning to achieve high accuracy and generalization in cases of limited training data.

[15]  arXiv:2105.03494 [pdf, other]
Title: The iWildCam 2021 Competition Dataset
Comments: FGVC8 Workshop at CVPR 2021. arXiv admin note: substantial text overlap with arXiv:2004.10340
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Camera traps enable the automatic collection of large quantities of image data. Ecologists use camera traps to monitor animal populations all over the world. In order to estimate the abundance of a species from camera trap data, ecologists need to know not just which species were seen, but also how many individuals of each species were seen. Object detection techniques can be used to find the number of individuals in each image. However, since camera traps collect images in motion-triggered bursts, simply adding up the number of detections over all frames is likely to lead to an incorrect estimate. Overcoming these obstacles may require incorporating spatio-temporal reasoning or individual re-identification in addition to traditional species detection and classification.
We have prepared a challenge where the training data and test data are from different cameras spread across the globe. The sets of species seen in each camera overlap but are not identical. The challenge is to classify species and count individual animals across sequences in the test cameras.

[16]  arXiv:2105.03495 [pdf, other]
Title: Is Incoherence Surprising? Targeted Evaluation of Coherence Prediction from Language Models
Comments: Accepted as long paper at NAACL 2021
Subjects: Computation and Language (cs.CL)

Coherent discourse is distinguished from a mere collection of utterances by the satisfaction of a diverse set of constraints, for example choice of expression, logical relation between denoted events, and implicit compatibility with world-knowledge. Do neural language models encode such constraints? We design an extendable set of test suites addressing different aspects of discourse and dialogue coherence. Unlike most previous coherence evaluation studies, we address specific linguistic devices beyond sentence order perturbations, allowing for a more fine-grained analysis of what constitutes coherence and what neural models trained on a language modelling objective do encode. Extending the targeted evaluation paradigm for neural language models (Marvin and Linzen, 2018) to phenomena beyond syntax, we show that this paradigm is equally suited to evaluate linguistic qualities that contribute to the notion of coherence.

[17]  arXiv:2105.03500 [pdf, other]
Title: A Convergent Finite Difference Method for Optimal Transport on the Sphere
Comments: 34 pages, 21 figures
Subjects: Numerical Analysis (math.NA)

We introduce a convergent finite difference method for solving the optimal transportation problem on the sphere. The method applies to both the traditional squared geodesic cost (arising in mesh generation) and a logarithmic cost (arising in the reflector antenna design problem). At each point on the sphere, we replace the surface PDE with a Generated Jacobian equation posed on the local tangent plane using geodesic normal coordinates. The discretization is inspired by recent monotone methods for the Monge-Ampère equation, but requires significant adaptations in order to correctly handle the mix of gradient and Hessian terms appearing inside the nonlinear determinant operator, as well as the singular logarithmic cost function. Numerical results demonstrate the success of this method on a wide range of challenging problems involving both the squared geodesic and the logarithmic cost functions.

[18]  arXiv:2105.03502 [pdf, other]
Title: Conversational Code Analysis: The Future of Secure Coding
Journal-ref: Pending Review at IntechOpen 2021
Subjects: Cryptography and Security (cs.CR)

The area of software development and secure coding can benefit significantly from advancements in virtual assistants. Research has shown that many coders neglect security in favor of meeting deadlines. This shortcoming leaves systems vulnerable to attackers. While a plethora of tools are available for programmers to scan their code for vulnerabilities, finding the right tool can be challenging. It is therefore imperative to adopt measures to get programmers to utilize code analysis tools that will help them produce more secure code. This chapter looks at the limitations of existing approaches to secure coding and proposes a methodology that allows programmers to scan and fix vulnerabilities in program code by communicating with virtual assistants on their smart devices. With the ubiquitous move towards virtual assistants, it is important to design systems that are more reliant on voice than on standard point-and-click and keyboard-driven approaches. Consequently, we propose MyCodeAnalyzer, a Google Assistant app and code analysis framework, which was designed to interactively scan program code for vulnerabilities and flaws using voice commands during development. We describe the proposed methodology, implement a prototype, test it on a vulnerable project and present our results.

[19]  arXiv:2105.03503 [pdf, ps, other]
Title: The Consolation of Network Coding and Partial Protection Techniques to Optical Transport Networks in Data, Data, Data Era
Authors: Dao Thanh Hai
Comments: 5 pages, 5 figures, 2 tables, submitted to a conference
Subjects: Networking and Internet Architecture (cs.NI)

The age of acceleration is under way, driven by a revolutionary digital transformation that is essentially creating a digital version of our physical world, and the currency in that digital space is data. Massive amounts of data are being generated, ranging from wearable devices monitoring our physical health every single millisecond, to autonomous vehicles generating roughly 5Tb hourly, to astronomical activities producing on the order of exabytes daily, and ultra-broadband Internet then comes into play, moving such data to the cloud. Internet traffic has therefore been experiencing explosive growth and, in this context, optical transport networks forming the backbone of the Internet are being pushed to transform their system capacity. While the intuitive solution of deploying multiple fibers can address the pressing demand for increased capacity, doing so does not improve economies of scale in terms of cost, power consumption and spectral efficiency. This necessitates a different approach so that the fiber capacity can be utilized more efficiently. In this paper, we focus on two innovative techniques, network coding and partial protection, to reduce the effective traffic load in order to achieve greater capacity efficiency for optical transport networks. Specifically, the application of network coding is examined by upgrading the functionalities of intermediate nodes with processing (i.e., encoding and decoding) capabilities. In addition, partial protection, which relies on the premise of providing just enough bandwidth in case of failure events, is investigated for saving redundant protection capacity. More interesting results arise when network coding and partial protection are combined, and we present insights on how to derive compounding gains from this combination.

[20]  arXiv:2105.03505 [pdf, other]
Title: Unsupervised Cross-Domain Prerequisite Chain Learning using Variational Graph Autoencoders
Comments: Short paper Accepted by ACL 2021
Subjects: Computation and Language (cs.CL)

Learning prerequisite chains is an essential task for efficiently acquiring knowledge in both known and unknown domains. For example, one may be an expert in the natural language processing (NLP) domain but want to determine the best order to learn new concepts in an unfamiliar Computer Vision domain (CV). Both domains share some common concepts, such as machine learning basics and deep learning models. In this paper, we propose unsupervised cross-domain concept prerequisite chain learning using an optimized variational graph autoencoder. Our model learns to transfer concept prerequisite relations from an information-rich domain (source domain) to an information-poor domain (target domain), substantially surpassing other baseline models. Also, we expand an existing dataset by introducing two new domains: CV and Bioinformatics (BIO). The annotated data and resources, as well as the code, will be made publicly available.

[21]  arXiv:2105.03509 [pdf]
Title: Wyner wiretap-like encoding scheme for cyber-physical systems
Journal-ref: IET Cyber-Physical Systems: Theory & Applications, Vol. 5, No. 4, pp. 359-365, 2020
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR)

In this study, the authors consider the problem of exchanging secret messages in cyber-physical systems (CPSs) without resorting to cryptographic solutions. In particular, they consider a CPS where the networked controller wants to send a secret message to the plant. They show that such a problem can be solved by exploiting a Wyner wiretap-like encoding scheme taking advantage of the closed-loop operations typical of feedback control systems. Specifically, by resorting to the control concept of one-step reachable sets, they first show that a wiretap-like encoding scheme exists whenever there is an asymmetry in the plant model knowledge available to the control system (the defender) and to the eavesdropper. The effectiveness of the proposed scheme is confirmed by means of a numerical example. Finally, they conclude the study by presenting open design challenges that can be addressed by the research community to improve, in different directions, the secret message exchange problem in CPSs.

[22]  arXiv:2105.03517 [pdf]
Title: Applicability of overlay non-delay tolerant position-based protocols in highways and urban environments for vanet
Comments: 16 pages
Subjects: Networking and Internet Architecture (cs.NI)

Vehicular Ad hoc Network (VANET) is a new sort of wireless ad-hoc network. Vehicle-to-Vehicle (V2V) communication is one of the main communication paradigms that provide a level of safety and convenience to drivers and passengers on the road. In such an environment, routing data packets is challenging due to frequent changes of network topology because of the highly dynamic nature of vehicles. Thus, routing in VANETs requires efficient protocols that guarantee message transmission among vehicles. Numerous routing protocols and algorithms have been proposed or enhanced to solve the aforementioned problems. Many position-based routing protocols have been developed for routing messages that have been identified to be appropriate for VANETs. This work explores the performances of selected unicast non-delay tolerant overlay position-based routing protocols. The evaluation has been conducted in highway and urban environments in two different scenarios. The evaluation metrics that are used are Packet Delivery Ratio (PDR), Void Problem Occurrence (VPO), and Average Hop Count (AHC).

[23]  arXiv:2105.03519 [pdf, other]
Title: Understanding by Understanding Not: Modeling Negation in Language Models
Subjects: Computation and Language (cs.CL)

Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top-1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.
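
To make the idea of an unlikelihood term concrete: where a likelihood loss rewards probability placed on a desired token, an unlikelihood loss penalises probability placed on a token that should not be predicted. The sketch below uses the commonly used form -log(1 - p). How the paper weights and combines the two terms is not stated in the abstract, so the `alpha` weight and all function names are assumptions.

```python
import math


def likelihood_loss(p_desired):
    """Standard negative log-likelihood for a token that should be predicted."""
    return -math.log(p_desired)


def unlikelihood_loss(p_undesired, eps=1e-12):
    """Penalty that grows as probability mass is put on an undesired token."""
    return -math.log(max(1.0 - p_undesired, eps))


def combined_loss(p_desired, p_undesired, alpha=1.0):
    """Hypothetical weighted sum of the two terms; alpha is an assumed knob."""
    return likelihood_loss(p_desired) + alpha * unlikelihood_loss(p_undesired)
```

For a negated sentence, the undesired token would be the completion that is only correct for the non-negated version, so lowering its probability lowers the penalty.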

[24]  arXiv:2105.03521 [pdf, other]
Title: Stochastic Properties of EIP-1559 Basefees
Subjects: Computer Science and Game Theory (cs.GT); Cryptography and Security (cs.CR)

EIP-1559 is a newly proposed pricing mechanism for the Ethereum protocol developed to bring stability to fluctuating gas prices. To properly understand this as a stochastic process, it is necessary to develop the mathematical foundations to understand under what conditions the base fee gas price outcomes behave as a stationary process, and when they do not. Understanding these mathematical fundamentals is critical to properly engineering a stable system.
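
For context, the base fee process studied here is driven by a simple per-block update rule. The sketch below follows the rule as given in the EIP-1559 specification, to the best of my understanding (integer arithmetic, with each step bounded to one eighth of the parent base fee). The abstract itself does not state the rule, and the function and argument names are chosen for illustration.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # each block moves the fee by at most 1/8


def next_base_fee(parent_base_fee, parent_gas_used, parent_gas_target):
    """Base fee of the next block given the parent block's gas usage."""
    if parent_gas_used == parent_gas_target:
        return parent_base_fee
    if parent_gas_used > parent_gas_target:
        gas_delta = parent_gas_used - parent_gas_target
        fee_delta = max(
            parent_base_fee * gas_delta
            // parent_gas_target
            // BASE_FEE_MAX_CHANGE_DENOMINATOR,
            1,  # the fee always rises by at least 1 when blocks are over target
        )
        return parent_base_fee + fee_delta
    gas_delta = parent_gas_target - parent_gas_used
    fee_delta = (
        parent_base_fee * gas_delta
        // parent_gas_target
        // BASE_FEE_MAX_CHANGE_DENOMINATOR
    )
    return parent_base_fee - fee_delta
```

Iterating this map with a random sequence of gas-used values gives a concrete instance of the stochastic process whose stationarity the paper investigates.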

[25]  arXiv:2105.03522 [pdf, other]
Title: On Abstract Machine Semantics for Proto-Quipper-M
Authors: Andrea Colledan
Comments: 72 pages (34 without appendix), 5 figures
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO); Quantum Physics (quant-ph)

Quipper is a domain-specific programming language for the description of quantum circuits. Because it is implemented as an embedded language in Haskell, Quipper is a very practical functional language. However, for the same reason, it lacks a formal semantics and it is limited by Haskell's type system. In particular, because Haskell lacks linear types, it is easy to write Quipper programs that violate the non-cloning property of quantum states. In order to formalize relevant fragments of Quipper in a type-safe way, the Proto-Quipper family of research languages has been introduced in recent years. In this paper we first review Proto-Quipper-M, an instance of the Proto-Quipper family based on a categorical model for quantum circuits, which features a linear type system that guarantees at compile time that the non-cloning property holds. We then derive a tentative small-step operational semantics from the big-step semantics of Proto-Quipper-M and we prove that the two are equivalent. After proving subject reduction and progress results for the tentative semantics, we build upon it to obtain a truly small-step semantics in the style of an abstract machine, which we eventually prove to be equivalent to the original semantics.

[26]  arXiv:2105.03523 [pdf, other]
Title: Test Suites as a Source of Training Data for Static Analysis Alert Classifiers
Comments: 9 pages, 3 figures, 6 tables, to be published in proceedings of Conference on Automation of Software Test (AST 2021)
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)

Flaw-finding static analysis tools typically generate large volumes of code flaw alerts including many false positives. To save on human effort to triage these alerts, a significant body of work attempts to use machine learning to classify and prioritize alerts. Identifying a useful set of training data, however, remains a fundamental challenge in developing such classifiers in many contexts. We propose using static analysis test suites (i.e., repositories of "benchmark" programs that are purpose-built to test coverage and precision of static analysis tools) as a novel source of training data. In a case study, we generated a large quantity of alerts by executing various static analyzers on the Juliet C/C++ test suite, and we automatically derived ground truth labels for these alerts by referencing the Juliet test suite metadata. Finally, we used this data to train classifiers to predict whether an alert is a false positive. Our classifiers obtained high precision (90.2%) and recall (88.2%) for a large number of code flaw types on a hold-out test set. This preliminary result suggests that pre-training classifiers on test suite data could help to jumpstart static analysis alert classification in data-limited contexts.

[27]  arXiv:2105.03531 [pdf, other]
Title: On the Complexity of Verification of Time-Sensitive Distributed Systems: Technical Report
Comments: arXiv admin note: text overlap with arXiv:1606.07886
Subjects: Computational Complexity (cs.CC)

Time-Sensitive Distributed Systems (TSDS), such as applications using autonomous drones, achieve goals under possible environment interference (e.g., winds). Goals are often specified using explicit time constraints, and, moreover, goals must be satisfied by the system perpetually. For example, drones carrying out the surveillance of some area must always have recent pictures, i.e., at most M time units old, of some strategic locations. This paper proposes a Multiset Rewriting language with explicit time for specifying and analyzing TSDSes. We introduce new properties, such as realizability (there exists a good trace), survivability (where, in addition, all admissible traces are good), recoverability (all compliant traces do not reach points-of-no-return), and reliability (system can always continue functioning using a good trace). A good trace is an infinite trace in which goals are perpetually satisfied. We propose a class of systems called Progressing Timed Systems (PTS), where intuitively only a finite number of actions can be carried out in a bounded time period. We prove that for this class of systems the problems of realizability, recoverability, reliability, and survivability are PSPACE-complete. Furthermore, if we impose a bound on time (as in bounded model-checking), we show that for PTS, realizability becomes NP-complete, while survivability and reliability problems are in the $\Delta_2^p$ class of the polynomial hierarchy.

[28]  arXiv:2105.03533 [pdf, other]
Title: Video Class Agnostic Segmentation with Contrastive Learning for Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semantic segmentation in autonomous driving predominantly focuses on learning from large-scale data with a closed set of known classes without considering unknown objects. Motivated by safety reasons, we address the video class agnostic segmentation task, which considers unknown objects outside the closed set of known classes in our training data. We propose a novel auxiliary contrastive loss to learn the segmentation of known classes and unknown objects. Unlike previous work in contrastive learning that samples the anchor, positive and negative examples on an image level, our contrastive learning method leverages pixel-wise semantic and temporal guidance. We conduct experiments on Cityscapes-VPS by withholding four classes from training and show an improvement for both known and unknown object segmentation with the auxiliary contrastive loss. We further release a large-scale synthetic dataset for different autonomous driving scenarios that includes distinct and rare unknown objects. We conduct experiments on the full synthetic dataset and a reduced small-scale version, and show how contrastive learning is more effective in small-scale datasets. Our proposed models, dataset, and code will be released at https://github.com/MSiam/video_class_agnostic_segmentation.

[29]  arXiv:2105.03534 [pdf, other]
Title: SimJEB: Simulated Jet Engine Bracket Dataset
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

Recent advancements in geometric deep learning have enabled a new class of engineering surrogate models; however, few existing shape datasets are well-suited to evaluate them. This paper introduces the Simulated Jet Engine Bracket Dataset (SimJEB): a new, public collection of crowdsourced mechanical brackets and high-fidelity structural simulations designed specifically for surrogate modeling. SimJEB models are more complex, diverse, and realistic than the synthetically generated datasets commonly used in parametric surrogate model evaluation. In contrast to existing engineering shape collections, SimJEB's models are all designed for the same engineering function and thus have consistent structural loads and support conditions. The models in SimJEB were collected from the original submissions to the GrabCAD Jet Engine Bracket Challenge: an open engineering design competition with over 700 hand-designed CAD entries from 320 designers representing 56 countries. Each model has been cleaned, categorized, meshed, and simulated with finite element analysis according to the original competition specifications. The result is a collection of diverse, high-quality and application-focused designs for advancing geometric deep learning and engineering surrogate models.

[30]  arXiv:2105.03536 [pdf, other]
Title: Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Comments: 8 pages. Accepted at the Efficient Deep Learning for Computer Vision Workshop at CVPR 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off with model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves. Our results suggest that for each bfloat16 ResNet model, there are quantized models with lower cost and higher accuracy; in other words, the bfloat16 compute cost-quality tradeoff curve is Pareto-dominated by the 4-bit and 8-bit curves, with models primarily quantized to 4-bit yielding the best Pareto curve. Furthermore, we achieve state-of-the-art results on ImageNet for 4-bit ResNet-50 with quantization-aware training, obtaining a top-1 eval accuracy of 77.09%. We demonstrate the regularizing effect of quantization by measuring the generalization gap. The quantization method we used is optimized for practicality: It requires little tuning and is designed with hardware capabilities in mind. Our work motivates further research into optimal numeric formats for quantization, as well as the development of machine learning accelerators supporting these formats. As part of this work, we contribute a quantization library written in JAX, which is open-sourced at https://github.com/google-research/google-research/tree/master/aqt.
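
To make the notion of b-bit quantization concrete, the following is a minimal sketch of symmetric uniform "fake" quantization in plain Python. It is an assumption-laden illustration only: it is not the method of the paper nor the API of the AQT library mentioned above, and the function name `fake_quantize` and the per-list scaling choice are hypothetical.

```python
def fake_quantize(values, bits=4):
    """Symmetric uniform quantization: snap values to a signed integer grid, then rescale.

    Hypothetical helper for illustration; one scale is shared by the whole list.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)  # nothing to scale
    scale = max_abs / qmax  # real-valued width of one integer step
    quantized = []
    for v in values:
        q = round(v / scale)          # nearest integer level
        q = max(-qmax, min(qmax, q))  # clip to the representable range
        quantized.append(q * scale)   # map back to the real-valued domain
    return quantized


def max_error(values, bits):
    """Largest absolute rounding error introduced by fake_quantize."""
    return max(abs(a - b) for a, b in zip(values, fake_quantize(values, bits)))
```

With fewer bits the grid is coarser, so the rounding error grows; the tradeoff studied in the abstract is whether the compute saved by the coarser grid can be reinvested in a larger model.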

[31]  arXiv:2105.03540 [pdf, other]
Title: An Intelligent Model for Solving Manpower Scheduling Problems
Subjects: Artificial Intelligence (cs.AI)

The manpower scheduling problem is a critical research field in the resource management area. Based on the existing studies on scheduling problem solutions, this paper transforms the manpower scheduling problem into a combinatorial optimization problem under multi-constraint conditions from a new perspective. It also uses logical paradigms to build a mathematical model for problem solution and an improved multi-dimensional evolution algorithm for solving the model. Moreover, the constraints discussed in this paper basically cover all the requirements of human resource coordination in modern society and are supported by our experiment results. In the discussion part, we compare our model with other heuristic algorithms or linear programming methods and show that the model proposed in this paper achieves up to a 25.7% increase in efficiency and a 17% increase in accuracy. In addition to the numerical solution of the manpower scheduling problem, this paper also studies the algorithm for scheduling task list generation and the method of displaying scheduling results. As a result, we not only provide various modifications for the basic algorithm to solve different condition problems but also propose a new algorithm that improves time efficiency by at least 28.91% compared with different baseline models.

[32]  arXiv:2105.03541 [pdf, other]
Title: Apply Artificial Neural Network to Solving Manpower Scheduling Problem
Journal-ref: BDAI 2021
Subjects: Machine Learning (cs.LG)

The manpower scheduling problem is a critical combinatorial optimization problem. Researching solutions to scheduling problems can improve the efficiency of companies, hospitals, and other work units. This paper proposes a new model combined with deep learning to solve the multi-shift manpower scheduling problem based on the existing research. This model first solves the objective function's optimized value according to the current constraints to find the plan of employee arrangement initially. It will then use the scheduling table generation algorithm to obtain the scheduling result in a short time. Moreover, the most prominent feature we propose is that we will use the neural network training method based on the time series to solve long-term and long-period scheduling tasks and obtain manpower arrangement. The selection criteria of the neural network and the training process are also described in this paper. We demonstrate that our model can make a precise forecast based on the improvement of neural networks. This paper also discusses the challenges in the neural network training process and obtains enlightening results after getting the arrangement plan. Our research shows that neural networks and deep learning strategies have the potential to solve similar problems effectively.

[33]  arXiv:2105.03545 [pdf, other]
Title: The Pony Express Communication Problem
Comments: 14 pages, 3 figures to be published in IWOCA 2021
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM)

We introduce a new problem which we call the Pony Express problem. n robots with differing speeds are situated over some domain. A message is placed at some commonly known point. Robots can acquire the message either by visiting its initial position, or by encountering another robot that has already acquired it. The robots must collaborate to deliver the message to a given destination. The objective is to deliver the message in minimum time. In this paper we study the Pony Express problem on the line where n robots are arbitrarily deployed along a finite segment. The robots have different speeds and can move in both directions. We are interested in both offline centralized and online distributed algorithms. In the online case, we assume the robots have limited knowledge of the initial configuration. In particular, the robots do not know the initial positions and speeds of the other robots nor even their own position and speed. They do, however, know the direction on the line in which to find the message and have the ability to compare speeds when they meet.
First, we study the Pony Express problem where the message is initially placed at one endpoint of a segment and must be delivered to the other endpoint. We provide an O(n log n) running time offline algorithm as well as an optimal online algorithm. Then we study the Half-Broadcast problem where the message is at the center and must be delivered to either one of the endpoints of the segment [-1,1]. We provide an offline algorithm running in O(n^2 log n) time and we provide an online algorithm that attains a competitive ratio of 3/2 which we show is the best possible. Finally, we study the Broadcast problem where the message is at the center and must be delivered to both endpoints of the segment [-1,1]. Here we give an FPTAS in the offline case and an online algorithm that attains a competitive ratio of 9/5, which we show is tight.

[34]  arXiv:2105.03546 [pdf, other]
Title: Scalable, Decentralized Multi-Agent Reinforcement Learning Methods Inspired by Stigmergy and Ant Colonies
Comments: 50 pages, 40 figures
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Bolstering multi-agent learning algorithms to tackle complex coordination and control tasks has been a long-standing challenge of on-going research. Numerous methods have been proposed to help reduce the effects of non-stationarity and unscalability. In this work, we investigate a novel approach to decentralized multi-agent learning and planning that attempts to address these two challenges. In particular, this method is inspired by the cohesion, coordination, and behavior of ant colonies. As a result, these algorithms are designed to be naturally scalable to systems with numerous agents. While no optimality is guaranteed, the method is intended to work well in practice and scale better in efficacy with the number of agents present than others. The approach combines single-agent RL and an ant-colony-inspired decentralized, stigmergic algorithm for multi-agent path planning and environment modification. Specifically, we apply this algorithm in a setting where agents must navigate to a goal location, learning to push rectangular boxes into holes to yield new traversable pathways. It is shown that while the approach yields promising success in this particular environment, it may not be as easily generalized to others. The algorithm designed is notably scalable to numerous agents but is limited in its performance due to its relatively simplistic, rule-based approach. Furthermore, the composability of RL-trained policies is called into question, where, while policies are successful in their training environments, applying trained policies to a larger-scale, multi-agent framework results in unpredictable behavior.

[35]  arXiv:2105.03552 [pdf, other]
Title: Solving social dilemmas by reasoning about expectations
Subjects: Multiagent Systems (cs.MA)

It has been argued that one role of social constructs, such as institutions, trust and norms, is to coordinate the expectations of autonomous entities in order to resolve collective action situations (such as collective risk dilemmas) through the coordination of behaviour. While much work has addressed the formal representation of these social constructs, in this paper we focus specifically on the formal representation of, and associated reasoning with, the expectations themselves. In particular, we investigate how explicit reasoning about expectations can be used to encode both traditional game theory solution concepts and social mechanisms for the social dilemma situation. We use the Collective Action Simulation Platform (CASP) to model a collective risk dilemma based on a flood plain scenario and show how using expectations in the reasoning mechanisms of the agents making decisions supports the choice of cooperative behaviour.

[36]  arXiv:2105.03559 [pdf, ps, other]
Title: Applications of Auction and Mechanism Design in Edge Computing: A Survey
Subjects: Computer Science and Game Theory (cs.GT); Distributed, Parallel, and Cluster Computing (cs.DC)

Edge computing is a promising technology that provides lower latency, more efficient transmission, and faster data processing, since the edge servers are closer to the user devices. Each edge server with limited resources can offload latency-sensitive and computation-intensive tasks from nearby user devices. However, edge computing faces challenges such as resource allocation, energy consumption, security and privacy issues, etc. Auction mechanisms can well characterize bidirectional interactions between edge servers and user devices under the above constraints in edge computing. As demonstrated by existing works, auction and mechanism design approaches excel at achieving optimal allocation strategies while guaranteeing mutual satisfaction among edge servers and user devices, especially for scenarios with scarce resources. In this paper, we present a comprehensive survey of recent research that applies auction approaches in edge computing. First, a brief overview of edge computing, including three common edge computing paradigms, i.e., cloudlet, fog computing and mobile edge computing, is presented. Then, we introduce fundamentals and background of auction schemes commonly used in edge computing systems. After that, a comprehensive survey of applications of auction-based approaches in edge computing is provided, categorized by auction approach. Finally, several open challenges and promising research directions are discussed.

[37]  arXiv:2105.03560 [pdf, other]
Title: Error analysis of an unfitted HDG method for a class of non-linear elliptic problems
Subjects: Numerical Analysis (math.NA)

We study Hybridizable Discontinuous Galerkin (HDG) discretizations for a class of non-linear interior elliptic boundary value problems posed in curved domains where both the source term and the diffusion coefficient are non-linear. We consider the cases where the non-linear diffusion coefficient depends on the solution and on the gradient of the solution. To sidestep the need for curved elements, the discrete solution is computed on a polygonal subdomain that is not assumed to interpolate the true boundary, giving rise to an unfitted computational mesh. We show that, under mild assumptions on the source term and the computational domain, the discrete systems are well posed. Furthermore, we provide a priori error estimates showing that the discrete solution will have optimal order of convergence as long as the distance between the curved boundary and the computational boundary remains of the same order of magnitude as the mesh parameter.

[38]  arXiv:2105.03564 [pdf, other]
Title: $E^2Coop$: Energy Efficient and Cooperative Obstacle Detection and Avoidance for UAV Swarms
Comments: The 31st International Conference on Automated Planning and Scheduling 2021
Subjects: Robotics (cs.RO)

Energy efficiency is of critical importance to trajectory planning for UAV swarms in obstacle avoidance. In this paper, we present $E^2Coop$, a new scheme designed to avoid collisions for UAV swarms by tightly coupling Artificial Potential Field (APF) with Particle Swarm Optimization (PSO) based trajectory planning. In $E^2Coop$, swarm members perform trajectory planning cooperatively to avoid collisions in an energy-efficient manner. $E^2Coop$ exploits the advantages of the active contour model in image processing for trajectory planning. Each swarm member plans its trajectories on the contours of the environment field to save energy and avoid collisions with obstacles. Swarm members that fall within the safeguard distance of each other plan their trajectories on different contours to avoid collisions with each other. Simulation results demonstrate that $E^2Coop$ can save up to 51% of energy compared with two state-of-the-art schemes.

[39]  arXiv:2105.03567 [pdf, other]
Title: Multimodal and Contrastive Learning for Click Fraud Detection
Comments: Accepted to DeMal@WWW 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Advertising click fraud detection plays a vital role in current E-commerce websites, as advertising is an essential component of their business model. Given a set of corresponding features, e.g., demographic information of users and statistical features of clicks, it aims to predict whether a click in the community is fraudulent. Recent efforts attempted to incorporate attributed behavior sequences and heterogeneous networks for extracting complex features of users and achieved significant effects on click fraud detection. In this paper, we propose a Multimodal and Contrastive learning network for Click Fraud detection (MCCF). Specifically, motivated by the observed differences in demographic information, behavior sequences and media relationships between fraudsters and genuine users on E-commerce platforms, MCCF jointly utilizes wide and deep features, behavior sequences and heterogeneous networks to distill click representations. Moreover, these three modules are integrated by contrastive learning and collaboratively contribute to the final predictions. Using a real-world dataset containing 2.54 million clicks on the Alibaba platform, we investigate the effectiveness of MCCF. The experimental results show that the proposed approach is able to improve AUC by 7.2% and F1-score by 15.6%, compared with the state-of-the-art methods.

[40]  arXiv:2105.03569 [pdf, other]
Title: Improving Robustness for Pose Estimation via Stable Heatmap Regression
Comments: 10 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning methods have achieved excellent performance in pose estimation, but the lack of robustness causes the keypoints to change drastically between similar images. In view of this problem, a stable heatmap regression method is proposed to alleviate network vulnerability to small perturbations. We utilize the correlation between different rows and columns in a heatmap to alleviate the multi-peaks problem, and design a highly differentiated heatmap regression to make a keypoint discriminative from surrounding points. A maximum stability training loss is used to simplify the optimization difficulty when minimizing the prediction gap of two similar images. The proposed method achieves a significant advance in robustness over state-of-the-art approaches on two benchmark datasets and maintains high performance.

[41]  arXiv:2105.03570 [pdf, other]
Title: Domain-Specific Suppression for Adaptive Object Detection
Comments: Accepted in CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Domain adaptation methods face performance degradation in object detection, as the complexity of the task demands more transferability from the model. We propose a new perspective on how CNN models gain transferability, viewing the weights of a model as a series of motion patterns. The directions of the weights and the gradients can be divided into domain-specific and domain-invariant parts, and the goal of domain adaptation is to concentrate on the domain-invariant direction while eliminating the disturbance from the domain-specific one. Current UDA object detection methods view the two directions as a whole while optimizing, which will cause domain-invariant direction mismatch even if the output features are perfectly aligned. In this paper, we propose domain-specific suppression, an exemplary and generalizable constraint on the original convolution gradients in backpropagation that detaches the two parts of the directions and suppresses the domain-specific one. We further validate our theoretical analysis and methods on several domain adaptive object detection tasks, including weather, camera configuration, and synthetic to real-world adaptation. Our experiment results show a significant advance over the state-of-the-art methods in the UDA object detection field, improving mAP by $10.2\sim12.2\%$ on all these domain adaptation scenarios.

[42]  arXiv:2105.03571 [pdf, other]
Title: Comprehensive Study: How the Context Information of Different Granularity Affects Dialogue State Tracking?
Comments: Accepted as long paper at main conference of ACL 2021
Subjects: Computation and Language (cs.CL)

Dialogue state tracking (DST) plays a key role in task-oriented dialogue systems to monitor the user's goal. In general, there are two strategies to track a dialogue state: predicting it from scratch and updating it from the previous state. The scratch-based strategy obtains each slot value by inquiring all the dialogue history, and the previous-based strategy relies on the current turn dialogue to update the previous dialogue state. However, it is hard for the scratch-based strategy to correctly track short-dependency dialogue state because of noise; meanwhile, the previous-based strategy is not very useful for long-dependency dialogue state tracking. Evidently, context information of different granularity plays different roles in tracking different kinds of dialogue states. Thus, in this paper, we study and discuss how the context information of different granularity affects dialogue state tracking. First, we explore how greatly different granularities affect dialogue state tracking. Then, we further discuss how to combine multiple granularities for dialogue state tracking. Finally, we apply the findings about context granularity to the few-shot learning scenario. Besides, we have publicly released all code (https://anonymous).

[43]  arXiv:2105.03572 [pdf, other]
Title: Blockchain Systems, Technologies and Applications: A Methodology Perspective
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

In the past decade, blockchain has shown great promise for building trust without any powerful third party in a secure, decentralized and scalable manner. However, due to its wide application and future development from cryptocurrency to the Internet of Things, blockchain is an extremely complex system that integrates mathematics, finance, computer science, communication and network engineering, etc. As a result, it is a challenge for engineers, experts and researchers to fully understand the blockchain process in a systematic, top-down view. First, this article introduces how blockchain works, the research activities and challenges, and illustrates the roadmap involving the classic methodologies with typical blockchain use cases and topics. Second, it discusses in detail how to adopt stochastic processes, game theory, optimization, machine learning and cryptography to study the blockchain running process and design blockchain protocols and algorithms. Moreover, the advantages and limitations of these methods are summarized as a guide for future work. Finally, some remaining problems from technical, commercial and political views are discussed as open issues. The main findings of this article provide an overview, from a methodology perspective, for studying theoretical models to understand blockchain fundamentals, designing network services for blockchain-based mechanisms and algorithms, and applying blockchain to the Internet of Things, etc.

[44]  arXiv:2105.03573 [pdf, ps, other]
Title: Survey of Parallel A* in Rust
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

A* is one of the most popular Best First Search (BFS) techniques for graphs. It combines the cost-based search of Breadth First Search with a computed heuristic for each node to attempt to locate the goal path faster than traditional Breadth First Search or Depth First Search techniques. However, A* is a sequential algorithm. The standard implementation only runs in one thread. There are a few attempts to get A* to leverage multiple threads. Centralized (SPA*) and Decentralized (DPA*, HDA*) methods are the most standard attempts, with the most unique and modern method being massively-parallel A* (MPA* or GA*). We will attempt an implementation of each in Rust to determine if there is a performance boost, and which one has the best performance.
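
As a reference point for the sequential baseline described above, here is a minimal single-threaded A* sketch on a 4-connected grid. It is written in Python for brevity rather than the Rust used in the paper, and the grid encoding, the Manhattan heuristic, and the function name `astar` are illustrative assumptions, not details from the paper.

```python
import heapq


def astar(grid, start, goal):
    """Sequential A* on a 4-connected grid (0 = free, 1 = wall).

    Returns the length of a shortest path in steps, or None if the goal is unreachable.
    Hypothetical helper for illustration only.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}                         # cheapest known cost to each cell
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None
```

The single shared priority queue is what makes this version inherently sequential; the centralized and decentralized parallel variants named in the abstract differ mainly in how that queue is shared or partitioned across threads.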

[45]  arXiv:2105.03577 [pdf, ps, other]
Title: Joint Beamforming and Reconfigurable Intelligent Surface Design for Two-Way Relay Networks
Subjects: Information Theory (cs.IT)

In this paper, we consider a reconfigurable intelligent surface (RIS)-assisted two-way relay network, in which two users exchange information through the base station (BS) with the help of an RIS. By jointly designing the phase shifts at the RIS and beamforming matrix at the BS, our objective is to maximize the minimum signal-to-noise ratio (SNR) of the two users, under the transmit power constraint at the BS. We first consider the single-antenna BS case, and propose two algorithms to design the RIS phase shifts and the BS power amplification parameter, namely the SNR-upper-bound-maximization (SUM) method, and genetic-SNR-maximization (GSM) method. When there are multiple antennas at the BS, the optimization problem can be approximately addressed by successively solving two decoupled subproblems, one to optimize the RIS phase shifts, the other to optimize the BS beamforming matrix. The first subproblem can be solved by using SUM or GSM method, while the second subproblem can be solved by using optimized beamforming or maximum-ratio-beamforming method. The proposed algorithms have been verified through numerical results with computational complexity analysis.

[46]  arXiv:2105.03578 [pdf, other]
Title: Learning to Predict Repeatability of Interest Points
Comments: Accepted at IEEE International Conference on Robotics and Automation (ICRA) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Many robotics applications require interest points that are highly repeatable under varying viewpoints and lighting conditions. However, this requirement is very challenging as the environment changes continuously and indefinitely, leading to appearance changes of interest points with respect to time. This paper proposes to predict the repeatability of an interest point as a function of time, which can tell us the lifespan of the interest point considering daily or seasonal variation. The repeatability predictor (RP) is formulated as a regressor trained on repeated interest points from multiple viewpoints over a long period of time. Through comprehensive experiments, we demonstrate that our RP can estimate when a new interest point is repeated, and also highlight an insightful analysis about this problem. For further comparison, we apply our RP to the map summarization under visual localization framework, which builds a compact representation of the full context map given the query time. The experimental result shows a careful selection of potentially repeatable interest points predicted by our RP can significantly mitigate the degeneration of localization accuracy from map summarization.

[47]  arXiv:2105.03579 [pdf, other]
Title: Unsupervised Remote Sensing Super-Resolution via Migration Image Prior
Comments: 6 pages, 4 figures. IEEE International Conference on Multimedia and Expo (ICME) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recently, satellites with high temporal resolution have attracted wide attention in various practical applications. Due to limitations of bandwidth and hardware cost, however, the spatial resolution of such satellites is considerably low, largely limiting their potential in scenarios that require spatially explicit information. To improve image resolution, numerous approaches based on training low-high resolution pairs have been proposed to address the super-resolution (SR) task. Despite their success, however, low/high spatial resolution pairs are usually difficult to obtain in satellites with a high temporal resolution, making such approaches in SR impractical to use. In this paper, we propose a new unsupervised learning framework, called "MIP", which achieves SR tasks without low/high resolution image pairs. First, random noise maps are fed into a designed generative adversarial network (GAN) for reconstruction. Then, the proposed method converts the reference image to latent space as the migration image prior. Finally, we update the input noise via an implicit method, and further transfer the texture and structured information from the reference image. Extensive experimental results on the Draper dataset show that MIP achieves significant improvements over state-of-the-art methods both quantitatively and qualitatively. The proposed MIP is open-sourced at this http URL

[48]  arXiv:2105.03581 [pdf, other]
Title: Distortion-Based Outer-Bounds for Channels with Rate-Limited Feedback
Authors: Alireza Vahid
Comments: To be presented at IEEE International Symposium on Information Theory (ISIT) 2021
Subjects: Information Theory (cs.IT)

We present a new technique to obtain outer-bounds on the capacity region of networks with ultra low-rate feedback. We establish a connection between the achievable rates in the forward channel and the minimum distortion that can be attained over the feedback channel.

[49]  arXiv:2105.03582 [pdf, other]
Title: Sign-Agnostic CONet: Learning Implicit Surface Reconstructions by Sign-Agnostic Optimization of Convolutional Occupancy Networks
Comments: 18 pages; 14 figures; 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Surface reconstruction from point clouds is a fundamental problem in the computer vision and graphics community. Recent state-of-the-art methods solve this problem by individually optimizing each local implicit field during inference. Without considering the geometric relationships between local fields, they typically require accurate normals to avoid the sign conflict problem in overlapping regions of local fields, which severely limits their applicability to raw scans where surface normals could be unavailable. Although SAL breaks this limitation via sign-agnostic learning, how to extend this pipeline to local shape modeling remains unexplored. To this end, we propose to learn implicit surface reconstruction by sign-agnostic optimization of convolutional occupancy networks, to simultaneously achieve advanced scalability, generality, and applicability in a unified framework. In the paper, we also show that this goal can be achieved by a simple yet effective design, which optimizes the occupancy fields that are conditioned on convolutional features from an hourglass network architecture with an unsigned binary cross-entropy loss. Extensive experimental comparisons with previous state-of-the-art methods on both object-level and scene-level datasets demonstrate the superior accuracy of our approach for surface reconstruction from unoriented point clouds.

[50]  arXiv:2105.03588 [pdf]
Title: Facial Emotion Recognition: State of the Art Performance on FER2013
Comments: 9 pages, 5 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Facial emotion recognition (FER) is significant for human-computer interaction in areas such as clinical practice and behavioral description. Accurate and robust FER by computer models remains challenging due to the heterogeneity of human faces and variations in images such as different facial poses and lighting. Among all techniques for FER, deep learning models, especially Convolutional Neural Networks (CNNs), have shown great potential due to their powerful automatic feature extraction and computational efficiency. In this work, we achieve the highest single-network classification accuracy on the FER2013 dataset. We adopt the VGGNet architecture, rigorously fine-tune its hyperparameters, and experiment with various optimization methods. To the best of our knowledge, our model achieves state-of-the-art single-network accuracy of 73.28% on FER2013 without using extra training data.

[51]  arXiv:2105.03589 [pdf, ps, other]
Title: Relay Assisted Underlay Cognitive Radio Networks with Multiple Users
Comments: 7 pages, 4 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this letter, we consider an underlay cognitive radio network assisted by dual-hop decode-and-forward (DF) relaying. For a general multi-user network, we adopt a max-min fairness relay selection scheme and analyse the outage probability when the channels are subject to independent and non-identical Nakagami-m fading. The relay network operates within the constraint imposed on the peak interference power tolerable by the primary receiver. We then analyse the asymptotic outage probability performance and illustrate the existence of i) the full-diversity order when the interference level at the primary user increases proportionally with the relay transmit power; and ii) an outage floor when the transmit powers of the relays are restricted by the primary receiver. We also analyse the outage probability with imperfect channel state information (CSI) and the average throughput over Rayleigh fading channels. Illustrative analytical results are accurately validated by numerical simulations.

[52]  arXiv:2105.03591 [pdf, other]
Title: Loss Tolerant Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

Federated learning has attracted attention in recent years for collaboratively training on data held by distributed devices while preserving privacy. The limited network capacity of mobile and IoT devices has been seen as one of the major challenges for cross-device federated learning. Recent solutions have focused on threshold-based client selection schemes to guarantee communication efficiency. However, we find that this approach can cause biased client selection and result in deteriorated performance. Moreover, we find that the challenge of limited network capacity may be overstated in some cases and that packet loss is not always harmful. In this paper, we explore loss tolerant federated learning (LT-FL) in terms of aggregation, fairness, and personalization. We use ThrowRightAway (TRA) to accelerate data uploading for low-bandwidth devices by intentionally ignoring some packet losses. The results suggest that, with proper integration, TRA and other algorithms can together guarantee the personalization and fairness performance in the face of packet loss below a certain fraction (10%-30%).

[53]  arXiv:2105.03592 [pdf, other]
Title: De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks
Comments: To be published in IEEE Transactions on Information Forensics and Security
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Performance (cs.PF)

Machine learning techniques have been widely applied to various applications. However, they are potentially vulnerable to data poisoning attacks, where sophisticated attackers can disrupt the learning procedure by injecting a fraction of malicious samples into the training dataset. Existing defense techniques against poisoning attacks are largely attack-specific: they are designed for one specific type of attack but do not work for other types, mainly due to the distinct principles they follow. Yet few general defense strategies have been developed. In this paper, we propose De-Pois, an attack-agnostic defense against poisoning attacks. The key idea of De-Pois is to train a mimic model whose purpose is to imitate the behavior of the target model trained on clean samples. We take advantage of Generative Adversarial Networks (GANs) to facilitate informative training data augmentation as well as the mimic model construction. By comparing the prediction differences between the mimic model and the target model, De-Pois is thus able to distinguish the poisoned samples from clean ones, without explicit knowledge of any ML algorithms or types of poisoning attacks. We implement four types of poisoning attacks and evaluate De-Pois with five typical defense methods on different realistic datasets. The results demonstrate that De-Pois is effective and efficient at detecting poisoned data against all four types of poisoning attacks, with both the accuracy and F1-score over 0.9 on average.

[54]  arXiv:2105.03594 [pdf, ps, other]
Title: Learning stochastic decision trees
Comments: To appear in ICALP 2021
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $\eta$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm runs in time $n^{O(\log(s/\varepsilon)/\varepsilon^2)}$ and returns a hypothesis with error within an additive $2\eta + \varepsilon$ of the Bayes optimal. An additive $2\eta$ is the information-theoretic minimum.
Previously no non-trivial algorithm with a guarantee of $O(\eta) + \varepsilon$ was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.

[55]  arXiv:2105.03595 [pdf, other]
Title: HiTyper: A Hybrid Static Type Inference Framework with Neural Prediction
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)

Type inference for dynamic programming languages is an important yet challenging task. By leveraging the natural language information of existing human annotations, deep neural networks outperform other traditional techniques and become the state-of-the-art (SOTA) in this task. However, they are facing some new challenges, such as fixed type set, type drift, type correctness, and composite type prediction. To mitigate the challenges, in this paper, we propose a hybrid type inference framework named HiTyper, which integrates static inference into deep learning (DL) models for more accurate type prediction. Specifically, HiTyper creates a new syntax graph for each program, called type graph, illustrating the type flow among all variables in the program. Based on the type graph, HiTyper statically infers the types of the variables with appropriate static constraints. HiTyper then adopts a SOTA DL model to predict the types of other variables that cannot be inferred statically, during which process a type correction algorithm is employed to validate and correct the types recommended by the DL model. Extensive experiments show that HiTyper outperforms the SOTA DL approach by 12.7% in terms of top-1 F1-score. Moreover, HiTyper filters out 50.6% of incorrect candidate types recommended by the SOTA DL model, indicating that HiTyper could improve the correctness of predicted types. Case studies also demonstrate the capability of HiTyper in alleviating the fixed type set issue, and in handling type drift and complicated types such as composite data types.

[56]  arXiv:2105.03596 [pdf, other]
Title: Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms
Comments: Accepted at CVPR ECV Workshop 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Mobile and embedded platforms are increasingly required to efficiently execute computationally demanding DNNs across heterogeneous processing elements. At runtime, the available hardware resources to DNNs can vary considerably due to other concurrently running applications. The performance requirements of the applications could also change under different scenarios. To achieve the desired performance, dynamic DNNs have been proposed in which the number of channels/layers can be scaled in real time to meet different requirements under varying resource constraints. However, the training process of such dynamic DNNs can be costly, since platform-aware models of different deployment scenarios must be retrained to become dynamic. This paper proposes Dynamic-OFA, a novel dynamic DNN approach for state-of-the-art platform-aware NAS models (i.e. Once-for-all network (OFA)). Dynamic-OFA pre-samples a family of sub-networks from a static OFA backbone model, and contains a runtime manager to choose different sub-networks under different runtime environments. As such, Dynamic-OFA does not need the traditional dynamic DNN training pipeline. Compared to the state-of-the-art, our experimental results using ImageNet on a Jetson Xavier NX show that the approach is up to 3.5x (CPU), 2.4x (GPU) faster for similar ImageNet Top-1 accuracy, or 3.8% (CPU), 5.1% (GPU) higher accuracy at similar latency.

[57]  arXiv:2105.03598 [pdf, ps, other]
Title: Pure Exploration Bandit Problem with General Reward Functions Depending on Full Distributions
Authors: Siwei Wang, Wei Chen
Subjects: Machine Learning (cs.LG)

In this paper, we study the pure exploration bandit model on general distribution functions, which means that the reward function of each arm depends on the whole distribution, not only its mean. We adapt the racing framework and LUCB framework to solve this problem, and design algorithms for estimating the value of the reward functions with different types of distributions. Then we show that our estimation methods have correctness guarantees with proper parameters, and obtain sample complexity upper bounds for them. Finally, we discuss some important applications and their corresponding solutions under our learning framework.

[58]  arXiv:2105.03599 [pdf, other]
Title: Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval
Comments: 11 pages, 2 figures, Accepted by ACL 2021
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Recently, retrieval models based on dense representations have been gradually applied in the first stage of document retrieval tasks, showing better performance than traditional sparse vector space models. To obtain high efficiency, the basic structure of these models is a Bi-encoder in most cases. However, this simple structure may cause serious information loss during the encoding of documents, since the documents are encoded without knowledge of the queries. To address this problem, we design a method to mimic the queries on each of the documents by an iterative clustering process and represent the documents by multiple pseudo queries (i.e., the cluster centroids). To boost the retrieval process using an approximate nearest neighbor search library, we also optimize the matching function with a two-step score calculation procedure. Experimental results on several popular ranking and QA datasets show that our model can achieve state-of-the-art results.
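
The clustering idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy 2-D "token embeddings", the plain Lloyd's k-means with a deterministic start, and the max-inner-product `score` function. It is not the authors' implementation, and the two-step score calculation mentioned in the abstract is not reproduced.

```python
def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            idx = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[idx].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def score(query, centroids):
    """Hypothetical matching function: best inner product over all pseudo queries."""
    return max(sum(q * c for q, c in zip(query, cen)) for cen in centroids)


# Toy document: four token embeddings forming two obvious groups.
token_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pseudo_queries = kmeans(token_embeddings, k=2)  # the cluster centroids
best = score([1.0, 0.0], pseudo_queries)
```

Representing the document by several centroids rather than one vector lets a query match whichever aspect of the document it is closest to, which is the motivation the abstract gives for pseudo queries.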

[59]  arXiv:2105.03600 [pdf, other]
Title: Incremental Training and Group Convolution Pruning for Runtime DNN Performance Scaling on Heterogeneous Embedded Platforms
Comments: Accepted at ACM/IEEE Workshop on Machine Learning for CAD (MLCAD) 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Inference for Deep Neural Networks is increasingly being executed locally on mobile and embedded platforms due to its advantages in latency, privacy and connectivity. Since modern Systems on Chip typically execute a combination of different and dynamic workloads concurrently, it is challenging to consistently meet the inference time/energy budget at runtime because the local computing resources available to the DNNs vary considerably. To address this challenge, a variety of dynamic DNNs have been proposed. However, these works have significant memory overhead, limited runtime recoverable compression rate and narrow dynamic ranges of performance scaling. In this paper, we present a dynamic DNN using incremental training and group convolution pruning. The channels of the DNN convolution layer are divided into groups, which are then trained incrementally. At runtime, following groups can be pruned for inference time/energy reduction or added back for accuracy recovery without model retraining. In addition, we combine task mapping and Dynamic Voltage Frequency Scaling (DVFS) with our dynamic DNN to deliver a finer trade-off between accuracy and time/power/energy over a wider dynamic range. We illustrate the approach by modifying AlexNet for the CIFAR10 image dataset and evaluate our work on two heterogeneous hardware platforms: Odroid XU3 (ARM big.LITTLE CPUs) and Nvidia Jetson Nano (CPU and GPU). Compared to existing works, our approach can provide up to 2.36x (energy) and 2.73x (time) wider dynamic range with a 2.4x smaller memory footprint at the same compression rate. It achieves 10.6x (energy) and 41.6x (time) wider dynamic range when combined with task mapping and DVFS.

[60]  arXiv:2105.03603 [pdf, ps, other]
Title: Learning to Detect an Odd Restless Markov Arm with a Trembling Hand
Comments: A shorter version of this manuscript has been accepted for presentation at the 2021 IEEE International Symposium on Information Theory. This manuscript contains the proofs of all the main results
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper studies the problem of finding an anomalous arm in a multi-armed bandit when (a) each arm is a finite-state Markov process, and (b) the arms are restless. Here, anomaly means that the transition probability matrix (TPM) of one of the arms (the odd arm) is different from the common TPM of each of the non-odd arms. The TPMs are unknown to a decision entity that wishes to find the index of the odd arm as quickly as possible, subject to an upper bound on the error probability. We derive a problem instance specific asymptotic lower bound on the expected time required to find the odd arm index, where the asymptotics is as the error probability vanishes. Further, we devise a policy based on the principle of certainty equivalence, and demonstrate that under a continuous selection assumption and a certain regularity assumption on the TPMs, the policy achieves the lower bound arbitrarily closely. Thus, while the lower bound is shown for all problem instances, the upper bound is shown only for those problem instances satisfying the regularity assumption. Our achievability analysis is based on resolving the identifiability problem in the context of a certain countable-state controlled Markov process.

[61]  arXiv:2105.03606 [pdf, ps, other]
Title: On Multi-Channel Huffman Codes for Asymmetric-Alphabet Channels
Comments: full version of the ISIT 2021 paper
Subjects: Information Theory (cs.IT)

Zero-error single-channel source coding has been studied extensively over the past decades. Its natural multi-channel generalization is, however, not well investigated. While the special case with multiple symmetric-alphabet channels was studied a decade ago, codes in such a setting have no advantage over single-channel codes in data compression, making them worthless in most applications. With essentially no development over the last decade, we break the stalemate in this paper by showing that it is possible to beat single-channel source codes in terms of compression assuming asymmetric-alphabet channels. We present the multi-channel analog of several classical results in single-channel source coding, such as that a multi-channel Huffman code is an optimal tree-decodable code. We also show some evidence that finding an efficient construction of multi-channel Huffman codes may be hard. Nevertheless, we propose a suboptimal code construction whose redundancy is guaranteed to be no larger than that of an optimal single-channel source code.
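
For orientation, the classical single-channel binary Huffman construction that this abstract takes as its baseline can be sketched as follows. This is the textbook algorithm only; it is not the multi-channel code proposed in the paper, and the symbol frequencies are made up for the example.

```python
import heapq


def huffman_code(freqs):
    """Return {symbol: bitstring} for a classical binary Huffman code."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The unique tie_breaker ensures the dicts are never compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codewords with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]


freqs = {"a": 5, "b": 2, "c": 1, "d": 1}  # illustrative frequencies
code = huffman_code(freqs)
# Average codeword length in bits per symbol, weighted by frequency.
avg_len = sum(freqs[s] * len(code[s]) for s in freqs) / sum(freqs.values())
```

The resulting code is prefix-free, so it is decodable by walking a binary tree; the abstract's "tree-decodable" multi-channel codes generalize this property.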

[62]  arXiv:2105.03608 [pdf, other]
Title: Optimising Resource Management for Embedded Machine Learning
Comments: Accepted at DATE 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Machine learning inference is increasingly being executed locally on mobile and embedded platforms, due to the clear advantages in latency, privacy and connectivity. In this paper, we present approaches for online resource management in heterogeneous multi-core systems and show how they can be applied to optimise the performance of machine learning workloads. Performance can be defined using platform-dependent (e.g. speed, energy) and platform-independent (accuracy, confidence) metrics. In particular, we show how a Deep Neural Network (DNN) can be dynamically scalable to trade-off these various performance metrics. Achieving consistent performance when executing on different platforms is necessary yet challenging, due to the different resources provided and their capability, and their time-varying availability when executing alongside other workloads. Managing the interface between available hardware resources (often numerous and heterogeneous in nature), software requirements, and user experience is increasingly complex.

[63]  arXiv:2105.03611 [pdf, other]
Title: 360NorVic: 360-Degree Video Classification from Mobile Encrypted Video Traffic
Comments: 7 pages, 15 figures, accepted in Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV 21)
Subjects: Multimedia (cs.MM)

Streaming 360° video demands high bandwidth and low latency, and poses significant challenges to Internet Service Providers (ISPs) and Mobile Network Operators (MNOs). The identification of 360° video traffic can therefore help fixed and mobile carriers optimize their networks and provide better Quality of Experience (QoE) to the user. However, end-to-end encryption of network traffic has made it difficult to distinguish 360° videos from regular videos. As a solution, this paper presents 360NorVic, a near-realtime and offline Machine Learning (ML) classification engine to distinguish 360° videos from regular videos when streamed from mobile devices. We collect packet and flow level data for over 800 video traces from YouTube & Facebook accounting for 200 unique videos under varying streaming conditions. Our results show that for near-realtime and offline classification at packet level, average accuracy exceeds 95%, and that for flow level, 360NorVic achieves more than 92% average accuracy. Finally, we pilot our solution in the commercial network of a large MNO, showing the feasibility and effectiveness of 360NorVic in production settings.

[64]  arXiv:2105.03616 [pdf, other]
Title: Interpretable Mixture Density Estimation by use of Differentiable Tree-module
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In order to develop reliable services using machine learning, it is important to understand the uncertainty of the model outputs. Often the probability distribution that the prediction target follows has a complex shape, and a mixture distribution is assumed to model this uncertainty. Since the output of mixture density estimation is complicated, its interpretability becomes important when considering its use in real services. In this paper, we propose a method for mixture density estimation that utilizes an interpretable tree structure. Further, a fast inference procedure based on a time-invariant information cache achieves both high speed and interpretability.

[65]  arXiv:2105.03619 [pdf, ps, other]
Title: Quantum Synchronizable Codes on Sextic Cyclotomy
Comments: Quantum Synchronizable, Sextic cyclotomy, Cyclic code
Subjects: Cryptography and Security (cs.CR)

Quantum synchronizable codes are a kind of quantum error-correcting code that can correct not only the effects of quantum noise on qubits but also misalignment in block synchronization. In this paper, the quantum synchronizable codes constructed are CSS quantum error-correcting codes whose synchronization capabilities reach the upper bound. We use cyclic codes obtained from sextic cyclotomic classes to construct two classes of quantum synchronizable codes. Moreover, these quantum synchronizable codes possess good error-correcting capability against bit errors and phase errors, since the cyclic codes we use are optimal or almost optimal.

[66]  arXiv:2105.03620 [pdf, other]
Title: ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting
Comments: 16 pages. Code is at: this https URL arXiv admin note: text overlap with arXiv:2002.10200
Subjects: Computer Vision and Pattern Recognition (cs.CV)

End-to-end text spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention owing to its simplicity and the complementary nature of the two tasks. It remains an open problem, especially when processing arbitrarily-shaped text instances. Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation. 2) We design a novel BezierAlign layer for extracting accurate convolution features of text instances of arbitrary shape, significantly improving the precision of recognition over previous methods. 3) Different from previous methods, which often suffer from complex post-processing and sensitive hyper-parameters, our ABCNet v2 maintains a simple pipeline with non-maximum suppression (NMS) as the only post-processing step. 4) As the performance of text recognition closely depends on feature alignment, ABCNet v2 further adopts a simple yet effective coordinate convolution to encode the position of the convolutional filters, which leads to a considerable improvement with negligible computation overhead. Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the-art performance while maintaining very high efficiency.

[67]  arXiv:2105.03626 [pdf, ps, other]
Title: SuMo: A Mutation Testing Strategy for Solidity Smart Contracts
Subjects: Software Engineering (cs.SE)

Smart Contracts are software programs that are deployed and executed within a blockchain infrastructure. Due to their immutable nature, directly resulting from the specific characteristics of the deploying infrastructure, smart contracts must be thoroughly tested before their release. Testing is one of the main activities that can help to improve the reliability of a smart contract, so as to possibly prevent considerable loss of valuable assets. It is therefore important to provide the testers with tools that permit them to assess the activity they performed. Mutation testing is a powerful approach for assessing the fault-detection capability of a test suite. In this paper, we propose SuMo, a novel mutation testing tool for Ethereum Smart Contracts. SuMo implements a set of 44 mutation operators that were designed starting from the latest Solidity documentation, and from well-known mutation testing tools. These operators make it possible to simulate a wide variety of faults that can be made by smart contract developers. The set of operators was designed to limit the generation of stillborn mutants, which slow down the mutation testing process and limit the usability of the tool. We report a first evaluation of SuMo on open-source projects for which test suites were available. The results are encouraging, and they suggest that SuMo can effectively help developers to deliver more reliable smart contracts.

[68]  arXiv:2105.03627 [pdf, other]
Title: Improving Cross-Lingual Reading Comprehension with Self-Training
Comments: 8 pages, 4 figures
Subjects: Computation and Language (cs.CL)

Substantial improvements have been made in machine reading comprehension, where the machine answers questions based on a given context. Current state-of-the-art models even surpass human performance on several benchmarks. However, their abilities in the cross-lingual scenario are still to be explored. Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension. In this paper, we further utilized unlabeled data to improve the performance. The model is first supervised-trained on source language corpus, and then self-trained with unlabeled target language data. The experiment results showed improvements for all languages, and we also analyzed how self-training benefits cross-lingual reading comprehension in qualitative aspects.
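
The two-stage recipe in this abstract (supervised training on labeled source data, then self-training on unlabeled target data) can be sketched generically. The example below is a toy under loud assumptions: a 1-D nearest-centroid classifier stands in for the reading comprehension model, and the confidence rule (`min_margin`), the data, and all function names are hypothetical. It shows the shape of a pseudo-labeling loop, not the authors' procedure.

```python
def fit(data):
    """Return the per-class mean of 1-D features from (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, margin): nearest class and its lead over the runner-up."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled, unlabeled, rounds=3, min_margin=1.0):
    """Supervised fit on labeled data, then repeated pseudo-labeling of the pool."""
    train = list(labeled)
    pool = list(unlabeled)
    model = fit(train)  # stage 1: supervised training on the source data
    for _ in range(rounds):
        confident, rest = [], []
        for x in pool:
            label, margin = predict(model, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from
        train.extend(confident)  # stage 2: add confident pseudo-labels
        pool = [x for x, _ in rest]
        model = fit(train)
    return model


source = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]  # labeled "source" data
target = [2.0, 3.0, 7.0, 8.0, 5.2]                  # unlabeled "target" data
model = self_train(source, target)
```

In this toy run the ambiguous point 5.2 never clears the margin and is left unlabeled, which illustrates why a confidence filter is commonly used in self-training.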

[69]  arXiv:2105.03630 [pdf, other]
Title: A Phase Theory of MIMO LTI Systems
Subjects: Systems and Control (eess.SY)

In this paper, we introduce a definition of phase response for a class of multi-input multi-output (MIMO) linear time-invariant (LTI) systems whose frequency responses are (semi-)sectorial at all frequencies. The newly defined phase concept subsumes the well-known notions of positive real systems and negative imaginary systems. We formulate a small phase theorem for feedback stability, which complements the celebrated small gain theorem. The small phase theorem lays the foundation of a phase theory of MIMO systems. We also discuss time-domain interpretations of phase-bounded systems via both energy signal analysis and power signal analysis. In addition, a sectored real lemma is derived for the computation of MIMO phases, which serves as a natural counterpart of the bounded real lemma.

[70]  arXiv:2105.03631 [pdf, other]
Title: Coded Alternating Least Squares for Straggler Mitigation in Distributed Recommendations
Comments: 11 pages
Subjects: Information Theory (cs.IT)

Matrix factorization is an important representation learning algorithm, used for example in recommender systems, where a large matrix can be factorized into the product of two low dimensional matrices termed latent representations. This paper investigates the problem of matrix factorization in distributed computing systems with stragglers, those compute nodes that are slow to return computation results. A computation procedure, called coded Alternating Least Squares (ALS), is proposed for mitigating the effect of stragglers in such systems. The coded ALS algorithm iteratively computes two low dimensional latent matrices by solving various linear equations, with the Entangled Polynomial Code (EPC) as a building block. We theoretically characterize the maximum number of stragglers that the algorithm can tolerate (or the recovery threshold) in relation to the redundancy of coding (or the code rate). In addition, we theoretically show the computation complexity for the coded ALS algorithm and conduct numerical experiments to validate our design.
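
As background for this abstract, plain (uncoded, single-machine) alternating least squares can be sketched for the simplest case of a rank-1 factorization, where each least-squares subproblem has a closed form. The sketch assumes a fully observed matrix with no regularization, and it contains none of the coding or straggler handling the paper proposes; the function name and test matrix are illustrative only.

```python
def als_rank1(M, iters=20):
    """Approximate M by the outer product u v^T via alternating least squares."""
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iters):
        # Fix v, solve min_u ||M - u v^T||^2: u_i = (M v)_i / (v . v).
        vv = sum(x * x for x in v)
        u = [sum(M[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        # Fix u, solve min_v ||M - u v^T||^2: v_j = (M^T u)_j / (u . u).
        uu = sum(x * x for x in u)
        v = [sum(M[i][j] * u[i] for i in range(rows)) / uu for j in range(cols)]
    return u, v


# An exactly rank-1 matrix: the outer product of [2, 3] and [1, 2].
M = [[2.0, 4.0], [3.0, 6.0]]
u, v = als_rank1(M)
err = max(abs(M[i][j] - u[i] * v[j]) for i in range(2) for j in range(2))
```

Each half-step is a set of matrix-vector products, which is the kind of linear computation that coded schemes distribute across workers so that slow nodes can be tolerated.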

[71]  arXiv:2105.03632 [pdf, other]
Title: CASIA-Face-Africa: A Large-scale African Face Image Database
Comments: This paper has been accepted for publication in the journal IEEE TIFS
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face recognition is a popular and well-studied area with wide applications in our society. However, racial bias has been shown to be inherent in most State Of The Art (SOTA) face recognition systems. Many investigative studies on face recognition algorithms have reported higher false positive rates for African subject cohorts than for other cohorts. The lack of large-scale African face image databases in the public domain is one of the main restrictions in studying the racial bias problem of face recognition. To this end, we collect a face image database, namely CASIA-Face-Africa, which contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A group of evaluation protocols is constructed according to different applications, tasks, partitions and scenarios. The performance of SOTA face recognition algorithms without re-training is reported as a baseline. The proposed database along with its face landmark annotations, evaluation protocols and preliminary results form a good benchmark to study the essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, face image generation, etc. The database can be downloaded from our this http URL

[72]  arXiv:2105.03636 [pdf, other]
Title: RISe of Flight: RIS-Empowered UAV Communications for Robust and Reliable Air-to-Ground Networks
Comments: Submitted for journal publication
Subjects: Information Theory (cs.IT)

Next generation mobile networks need to expand towards uncharted territories in order to enable the digital transformation of society. In this context, aerial devices such as unmanned aerial vehicles (UAVs) are expected to address this gap in hard-to-reach locations. However, limited battery-life is an obstacle for the successful spread of such solutions. Reconfigurable intelligent surfaces (RISs) represent a promising solution addressing this challenge since on-board passive and lightweight controllable devices can efficiently reflect the signal propagation from the ground BSs towards specific target areas. In this paper, we focus on air-to-ground networks where UAVs equipped with RIS can fly over selected areas to provide connectivity. In particular, we study how to optimally compensate flight effects and propose RiFe as well as its practical implementation Fair-RiFe that automatically configure RIS parameters accounting for undesired UAV oscillations due to adverse atmospheric conditions. Our results show that both algorithms provide robustness and reliability while outperforming state-of-the-art solutions in the multiple conditions studied.

[73]  arXiv:2105.03638 [pdf, ps, other]
Title: Fast Neighborhood Rendezvous
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In the rendezvous problem, two computing entities (called \emph{agents}) located at different vertices in a graph have to meet at the same vertex. In this paper, we consider the synchronous \emph{neighborhood rendezvous problem}, where the agents are initially located at two adjacent vertices. While this problem can be trivially solved in $O(\Delta)$ rounds ($\Delta$ is the maximum degree of the graph), it is highly challenging to reveal whether that problem can be solved in $o(\Delta)$ rounds, even assuming the rich computational capability of agents. The only known result is that the time complexity of $O(\sqrt{n})$ rounds is achievable if the graph is complete and agents are probabilistic, asymmetric, and can use whiteboards placed at vertices. Our main contribution is to clarify the situation (with respect to computational models and graph classes) admitting such a sublinear-time rendezvous algorithm. More precisely, we present two algorithms achieving fast rendezvous additionally assuming bounded minimum degree, unique vertex identifier, accessibility to neighborhood IDs, and randomization. The first algorithm runs within $\tilde{O}(\sqrt{n\Delta/\delta} + n/\delta)$ rounds for graphs of the minimum degree larger than $\sqrt{n}$, where $n$ is the number of vertices in the graph, and $\delta$ is the minimum degree of the graph. The second algorithm assumes that the largest vertex ID is $O(n)$, and achieves $\tilde{O}\left( \frac{n}{\sqrt{\delta}} \right)$-round time complexity without using whiteboards. These algorithms attain $o(\Delta)$-round complexity in the case of $\delta = {\omega}(\sqrt{n} \log n)$ and $\delta = \omega(n^{2/3} \log^{4/3} n)$ respectively.

[74]  arXiv:2105.03640 [pdf, other]
Title: On Guaranteed Optimal Robust Explanations for NLP Models
Comments: 12 pages (7+5 Appendix). Accepted as long-paper at IJCAI 2021
Journal-ref: IJCAI 2021
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We build on abduction-based explanations for machine learning and develop a method for computing local explanations for neural network models in natural language processing (NLP). Our explanations comprise a subset of the words of the input text that satisfies two key features: optimality w.r.t. a user-defined cost function, such as the length of explanation, and robustness, in that they ensure prediction invariance for any bounded perturbation in the embedding space of the left out words. We present two solution algorithms, respectively based on implicit hitting sets and maximum universal subsets, introducing a number of algorithmic improvements to speed up convergence of hard instances. We show how our method can be configured with different perturbation sets in the embedded space and used to detect bias in predictions by enforcing include/exclude constraints on biased terms, as well as to enhance existing heuristic-based NLP explanation frameworks such as Anchors. We evaluate our framework on three widely used sentiment analysis tasks and texts of up to 100 words from SST, Twitter and IMDB datasets, demonstrating the effectiveness of the derived explanations.

[75]  arXiv:2105.03641 [pdf, other]
Title: Neural Text Generation with Part-of-Speech Guided Softmax
Comments: Main text: 8 pages, 2 figures, 8 tables. Supplementary Information: 2 pages, 7 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Neural text generation models are likely to suffer from the low-diversity problem. Various decoding strategies and training-based methods have been proposed to promote diversity only by exploiting contextual features, but rarely do they consider incorporating syntactic structure clues. In this work, we propose using linguistic annotation, i.e., part-of-speech (POS), to guide the text generation. In detail, we introduce POS Guided Softmax (POSG-Softmax) to explicitly model two posterior probabilities: (i) next-POS, and (ii) next-token from the vocabulary of the target POS. A POS guided sampling strategy is further proposed to address the low-diversity problem by enriching the diversity of POS. Extensive experiments and human evaluations demonstrate that, compared with existing state-of-the-art methods, our proposed methods can generate more diverse text while maintaining comparable quality.
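
The two-stage factorization described in this abstract (first a distribution over the next POS tag, then a distribution over tokens within that tag's vocabulary) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy scores, and the choice to marginalize as p(token) = Σ p(pos) · p(token | pos) are assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pos_guided_distribution(pos_scores, token_scores_by_pos):
    """Combine p(pos) and p(token | pos) into a marginal p(token)."""
    tags = list(pos_scores)
    p_pos = dict(zip(tags, softmax([pos_scores[t] for t in tags])))
    p_token = {}
    for tag in tags:
        tokens = list(token_scores_by_pos[tag])
        probs = softmax([token_scores_by_pos[tag][w] for w in tokens])
        for word, p in zip(tokens, probs):
            p_token[word] = p_token.get(word, 0.0) + p_pos[tag] * p
    return p_pos, p_token


def sample_token(pos_scores, token_scores_by_pos, rng):
    """Sample a POS tag first, then a token restricted to that tag's vocabulary."""
    tags = list(pos_scores)
    tag = rng.choices(tags, weights=softmax([pos_scores[t] for t in tags]))[0]
    tokens = list(token_scores_by_pos[tag])
    weights = softmax([token_scores_by_pos[tag][w] for w in tokens])
    return tag, rng.choices(tokens, weights=weights)[0]


# Hypothetical scores for one decoding step (not taken from the paper).
pos_scores = {"NOUN": 1.0, "VERB": 0.5}
token_scores_by_pos = {
    "NOUN": {"cat": 2.0, "dog": 1.0},
    "VERB": {"runs": 0.3, "sleeps": 0.1},
}
p_pos, p_token = pos_guided_distribution(pos_scores, token_scores_by_pos)
tag, token = sample_token(pos_scores, token_scores_by_pos, random.Random(0))
```

Sampling the tag before the token is what lets a decoding strategy diversify the POS sequence independently of the word choice.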

[76]  arXiv:2105.03642 [pdf, other]
Title: MIMO Terahertz Quantum Key Distribution
Comments: Submitted to IEEE Communications Letters
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Signal Processing (eess.SP); Quantum Physics (quant-ph)

We propose a multiple-input multiple-output (MIMO) quantum key distribution (QKD) scheme for improving the secret key rates and increasing the maximum transmission distance for terahertz (THz) frequency range applications operating at room temperature. We propose a transmit beamforming and receive combining scheme that converts the rank-$r$ MIMO channel between Alice and Bob into $r$ parallel lossy quantum channels whose transmittances depend on the non-zero singular values of the MIMO channel. The MIMO transmission scheme provides a multiplexing gain of $r$, along with a beamforming and array gain equal to the product of the number of transmit and receive antennas. This improves the secret key rate and extends the maximum transmission distance. Our simulation results show that multiple antennas are necessary to overcome the high free-space path loss at THz frequencies. Positive key rates are achievable in the $10-30$ THz frequency range that can be used for both indoor and outdoor QKD applications for beyond fifth generation ultra-secure wireless communications systems.

[77]  arXiv:2105.03647 [pdf, other]
Title: A Novel Triplet Sampling Method for Multi-Label Remote Sensing Image Search and Retrieval
Comments: The paper is under review. Our code is available online at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Learning the similarity between remote sensing (RS) images forms the foundation for content-based RS image retrieval (CBIR). Recently, deep metric learning approaches that map the semantic similarity of images into an embedding space have become very popular in RS. A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar (negative) images to a reference image called an anchor. Choosing triplets is a difficult task, particularly for multi-label RS CBIR, where each training image is annotated by multiple class labels. To address this problem, in this paper we propose a novel triplet sampling method in the framework of deep neural networks (DNNs) defined for multi-label RS CBIR problems. The proposed method selects a small set of the most representative and informative triplets based on two main steps. In the first step, a set of anchors that are diverse to each other in the embedding space is selected from the current mini-batch using an iterative algorithm. In the second step, different sets of positive and negative images are chosen for each anchor by evaluating relevancy, hardness, and diversity of the images among each other based on a novel ranking strategy. Experimental results obtained on two multi-label benchmark archives show that the selection of the most informative and representative triplets in the context of DNNs results in: i) reducing the computational complexity of the training phase of the DNNs without any significant loss on the performance; and ii) an increase in learning speed since informative triplets allow fast convergence. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/image-retrieval-from-triplets.
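
To make the role of an (anchor, positive, negative) triplet concrete, the sketch below computes the standard margin-based triplet loss on toy embeddings. This is the generic loss commonly used in deep metric learning, not the sampling method proposed in the paper; the squared Euclidean distance, the margin of 1.0, and the example vectors are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (in squared distance).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 0.0]   # far from the anchor, contributes no loss
hard_negative = [0.5, 0.0]   # closer than the positive, contributes loss

easy_loss = triplet_loss(anchor, positive, easy_negative)  # 0.0
hard_loss = triplet_loss(anchor, positive, hard_negative)  # 1.75
```

Easy triplets yield zero loss and therefore no gradient, which is one reason the choice of informative triplets matters for training cost and convergence speed.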

[78]  arXiv:2105.03649 [pdf, other]
Title: In-Hardware Learning of Multilayer Spiking Neural Networks on a Neuromorphic Processor
Comments: 6 pages, 5 figures, accepted for Design Automation Conference (DAC) 2021
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

Although widely used in machine learning, backpropagation cannot directly be applied to SNN training and is not feasible on a neuromorphic processor that emulates biological neurons and synapses. This work presents a spike-based backpropagation algorithm with biologically plausible local update rules and adapts it to fit the constraints of neuromorphic hardware. The algorithm is implemented on the Intel Loihi chip, enabling low-power in-hardware supervised online learning of multilayered SNNs for mobile applications. We test this implementation on the MNIST, Fashion-MNIST, CIFAR-10 and MSTAR datasets with promising performance and energy efficiency, and demonstrate the possibility of incremental online learning with the implementation.

[79]  arXiv:2105.03650 [pdf, other]
Title: How To Train Your Program
Authors: David Tolpin
Comments: submitted to PROBPROG11
Subjects: Machine Learning (cs.LG)

We present a Bayesian approach to machine learning with probabilistic programs. In our approach, training on available data is implemented as inference on a hierarchical model. The posterior distribution of model parameters is then used to \textit{stochastically condition} a complementary model, such that inference on new data yields the same posterior distribution of latent parameters corresponding to the new data as inference on a hierarchical model on the combination of both previously available and new data, at a lower computation cost. We frame the approach as a design pattern of probabilistic programming referred to herein as `stump and fungus', and illustrate realization of the pattern on a didactic case study.

[80]  arXiv:2105.03654 [pdf, other]
Title: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning
Comments: Accepted to ACL 2021, submission version, 12 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent advances in Named Entity Recognition (NER) show that document-level contexts can significantly improve model performance. In many application scenarios, however, such contexts are not available. In this paper, we propose to find external contexts of a sentence by retrieving and selecting a set of semantically relevant texts through a search engine, with the original sentence as the query. We find empirically that the contextual representations computed on the retrieval-based input view, constructed through the concatenation of a sentence and its external contexts, can achieve significantly improved performance compared to the original input view based only on the sentence. Furthermore, we can improve the model performance of both input views by Cooperative Learning, a training method that encourages the two input views to produce similar contextual representations or output label distributions. Experiments show that our approach can achieve new state-of-the-art performance on 8 NER data sets across 5 domains.

[81]  arXiv:2105.03655 [pdf, other]
Title: FlingBot: The Unreasonable Effectiveness of Dynamic Manipulation for Cloth Unfolding
Authors: Huy Ha, Shuran Song
Comments: 9 pages, 6 figures
Subjects: Robotics (cs.RO)

High-velocity dynamic actions (e.g., fling or throw) play a crucial role in our everyday interaction with deformable objects by improving our efficiency and effectively expanding our physical reach range. Yet, most prior works have tackled cloth manipulation using exclusively single-arm quasi-static actions, which requires a large number of interactions for challenging initial cloth configurations and strictly limits the maximum cloth size by the robot's reach range. In this work, we demonstrate the effectiveness of dynamic flinging actions for cloth unfolding. We propose a self-supervised learning framework, FlingBot, that learns how to unfold a piece of fabric from arbitrary initial configurations using a pick, stretch, and fling primitive for a dual-arm setup from visual observations. The final system achieves over 80\% coverage within 3 actions on novel cloths, can unfold cloths larger than the system's reach range, and generalizes to T-shirts despite being trained on only rectangular cloths. We also finetuned FlingBot on a real-world dual-arm robot platform, where it increased the cloth coverage 3.6 times more than the quasi-static baseline did. The simplicity of FlingBot combined with its superior performance over quasi-static baselines demonstrates the effectiveness of dynamic actions for deformable object manipulation. The project video is available at https://youtu.be/T4tDy5y_6ZM.

[82]  arXiv:2105.03659 [pdf, other]
Title: Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text
Comments: 10 pages, 4 figures
Subjects: Computation and Language (cs.CL)

Logical reasoning of text requires understanding critical logical information in the text and performing inference over it. Large-scale pre-trained models for logical reasoning mainly focus on word-level semantics of text while struggling to capture symbolic logic. In this paper, we propose to understand logical symbols and expressions in the text to arrive at the answer. Based on such logical information, we not only put forward a context extension framework but also propose a data augmentation algorithm. The former extends the context to cover implicit logical expressions following logical equivalence laws. The latter augments literally similar but logically different instances to better capture logical information, especially logical negative and conditional relationships. We conduct experiments on the ReClor dataset. The results show that our method achieves state-of-the-art performance, and that both the logic-driven context extension framework and the data augmentation algorithm help improve the accuracy. Moreover, our multi-model ensemble system is the first to surpass human performance on both the EASY set and the HARD set of ReClor.

[83]  arXiv:2105.03663 [pdf, other]
Title: On Linear Interpolation in the Latent Space of Deep Generative Models
Comments: For BibTex and Poster: this https URL
Subjects: Machine Learning (cs.LG)

The underlying geometrical structure of the latent space in deep generative models is in most cases not Euclidean, which may lead to biases when comparing interpolation capabilities of two models. Smoothness and plausibility of linear interpolations in latent space are associated with the quality of the underlying generative model. In this paper, we show that not all such interpolations are comparable as they can deviate arbitrarily from the shortest interpolation curve given by the geodesic. This deviation is revealed by computing curve lengths with the pull-back metric of the generative model, finding shorter curves than the straight line between endpoints, and measuring a non-zero relative length improvement on this straight line. This leads to a strategy to compare linear interpolations across two generative models. We also show the effect and importance of choosing an appropriate output space for computing shorter curves. For this computation we derive an extension of the pull-back metric.

[84]  arXiv:2105.03664 [pdf, other]
Title: D2S: Document-to-Slide Generation Via Query-Based Text Summarization
Comments: accepted at NAACL 2021
Subjects: Computation and Language (cs.CL)

Presentations are critical for communication in all areas of our lives, yet the creation of slide decks is often tedious and time-consuming. There has been limited research aiming to automate the document-to-slides generation process, and all of it faces a critical challenge: no publicly available dataset for training and benchmarking. In this work, we first contribute a new dataset, SciDuet, consisting of pairs of papers and their corresponding slide decks from recent years' NLP and ML conferences (e.g., ACL). Secondly, we present D2S, a novel system that tackles the document-to-slides task with a two-step approach: 1) Use slide titles to retrieve relevant and engaging text, figures, and tables; 2) Summarize the retrieved context into bullet points with long-form question answering. Our evaluation suggests that long-form QA outperforms state-of-the-art summarization baselines on both automated ROUGE metrics and qualitative human evaluation.

[85]  arXiv:2105.03669 [pdf, other]
Title: Chameleon: A Semi-AutoML framework targeting quick and scalable development and deployment of production-ready ML systems for SMEs
Subjects: Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Developing, scaling, and deploying modern Machine Learning solutions remains challenging for small- and middle-sized enterprises (SMEs). This is due to the high entry barrier of building and maintaining a dedicated IT team as well as the difficulties of real-world data (RWD) compared to standard benchmark data. To address this challenge, we discuss the implementation and concepts of Chameleon, a semi-AutoML framework. The goal of Chameleon is fast and scalable development and deployment of production-ready machine learning systems into the workflow of SMEs. We first discuss the RWD challenges faced by SMEs. We then outline the central part of the framework, which is a model and loss-function zoo with RWD-relevant defaults. Subsequently, we present how one can use a templatable framework in order to automate the experiment iteration cycle, as well as close the gap between development and deployment. Finally, we touch on our testing framework component, which allows us to investigate common model failure modes and support best practices of model deployment governance.

[86]  arXiv:2105.03671 [pdf, other]
Title: The Tags Are Alright: Robust Large-Scale RFID Clone Detection Through Federated Data-Augmented Radio Fingerprinting
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

Millions of RFID tags are pervasively used all around the globe to inexpensively identify a wide variety of everyday-use objects. One of the key issues of RFID is that tags cannot use energy-hungry cryptography. For this reason, radio fingerprinting (RFP) is a compelling approach that leverages the unique imperfections in the tag's wireless circuitry to achieve large-scale RFID clone detection. Recent work, however, has unveiled that time-varying channel conditions can significantly decrease the accuracy of the RFP process. We propose the first large-scale investigation into RFP of RFID tags with dynamic channel conditions. Specifically, we perform a massive data collection campaign on a testbed composed of 200 off-the-shelf identical RFID tags and a software-defined radio (SDR) tag reader. We collect data with different tag-reader distances in an over-the-air configuration. To emulate implanted RFID tags, we also collect data with two different kinds of porcine meat inserted between the tag and the reader. We use this rich dataset to train and test several convolutional neural network (CNN)-based classifiers in a variety of channel conditions. Our investigation reveals that training and testing on different channel conditions drastically degrades the classifier's accuracy. For this reason, we propose a novel training framework based on federated machine learning (FML) and data augmentation (DA) to boost the accuracy. Extensive experimental results indicate that (i) our FML approach improves accuracy by up to 48%; (ii) our DA approach improves the FML performance by up to 31%. To the best of our knowledge, this is the first paper experimentally demonstrating the efficacy of FML and DA on a large device population. We are sharing with the research community our fully-labeled 200-GB RFID waveform dataset, the entirety of our code and trained models.

[87]  arXiv:2105.03677 [pdf, other]
Title: Active Terahertz Imaging Dataset for Concealed Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Concealed object detection in Terahertz imaging is an urgent need for public security and counter-terrorism. In this paper, we provide a public dataset for evaluating multi-object detection algorithms in active Terahertz imaging with a resolution of 5 mm by 5 mm. To the best of our knowledge, this is the first public Terahertz imaging dataset prepared to evaluate object detection algorithms. Object detection on this dataset is much more difficult than on standard public object detection datasets due to its inferior imaging quality. Facing the problems of imbalanced samples and hard training samples in object detection, we evaluate four popular detectors on this dataset: YOLOv3, YOLOv4, FRCN-OHEM, and RetinaNet. Experimental results indicate that RetinaNet achieves the highest mAP. In addition, we demonstrate that hiding objects in different parts of the human body affects detection accuracy. The dataset is available at https://github.com/LingLIx/THz_Dataset.

[88]  arXiv:2105.03680 [pdf, other]
Title: A Crossover That Matches Diverse Parents Together in Evolutionary Algorithms
Comments: Accepted to GECCO 2021
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Crossover and mutation are the two main operators that lead to new solutions in evolutionary approaches. In this article, a new method of performing the crossover phase is presented. The problem of choice is evolutionary decision tree construction. The method aims at finding individuals that complement each other; hence we say that they are diversely specialized. We propose a way of calculating the so-called complementary fitness. In several empirical experiments, we evaluate the efficacy of the proposed method in four variants and compare it to a fitness-rank-based approach. One variant emerges clearly as the best approach, whereas the remaining ones are below the baseline.

[89]  arXiv:2105.03681 [pdf, ps, other]
Title: A Simple yet Universal Strategy for Online Convex Optimization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Recently, several universal methods have been proposed for online convex optimization that attain minimax rates for multiple types of convex functions simultaneously. However, they need to design and optimize one surrogate loss for each type of function, which makes it difficult to exploit the structure of the problem and utilize the vast amount of existing algorithms. In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations. The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the \emph{linearized} losses to aggregate predictions from experts. Specifically, we choose Adapt-ML-Prod to track the best expert, because it has a second-order bound and can be used to leverage strong convexity and exponential concavity. In this way, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds. Furthermore, our strategy inherits the theoretical guarantee of any expert designed for strongly convex functions and exponentially concave functions, up to a double logarithmic factor. For general convex functions, it maintains the minimax optimality and also achieves a small-loss bound.

[90]  arXiv:2105.03682 [pdf, other]
Title: Enhancing ensemble learning and transfer learning in multimodal data analysis by adaptive dimensionality reduction
Comments: 18 pages, 10 figures, submitted to Pattern Recognition
Subjects: Machine Learning (cs.LG)

Modern data analytics take advantage of ensemble learning and transfer learning approaches to tackle some of the most relevant issues in data analysis, such as lack of labeled data to use to train the analysis models, sparsity of the information, and unbalanced distributions of the records. Nonetheless, when applied to multimodal datasets (i.e., datasets acquired by means of multiple sensing techniques or strategies), the state-of-the-art methods for ensemble learning and transfer learning might show some limitations. In fact, in multimodal data analysis, not all observations would show the same level of reliability or information quality, nor a homogeneous distribution of errors and uncertainties. This condition might undermine the classic assumptions ensemble learning and transfer learning methods rely on. In this work, we propose an adaptive approach for dimensionality reduction to overcome this issue. By means of a graph theory-based approach, the most relevant features across variable size subsets of the considered datasets are identified. This information is then used to set up ensemble learning and transfer learning architectures. We test our approach on multimodal datasets acquired in diverse research fields (remote sensing, brain-computer interfaces, photovoltaic energy). Experimental results show the validity and the robustness of our approach, which is able to outperform state-of-the-art techniques.

[91]  arXiv:2105.03686 [pdf, other]
Title: Long Short-Term Temporal Meta-learning in Online Recommendation
Comments: 8 pages
Subjects: Information Retrieval (cs.IR)

An effective online recommendation system should jointly capture user long-term and short-term preferences in both user internal and external behaviors. However, it is challenging to conduct fast adaptations to variable new topics while making full use of all information in large-scale systems, due to the online efficiency limitation and complexity of real-world systems. To address this, we propose a novel Long Short-Term Temporal Meta-learning framework (LSTTM) for online recommendation, which captures user preferences from a global long-term graph and an internal short-term graph. To improve online learning for short-term interests, we propose a temporal MAML method with asynchronous online updating for fast adaptation, which regards recommendations at different time periods as different tasks. In experiments, LSTTM achieves significant improvements on both offline and online evaluations. LSTTM has also been deployed on a widely-used online system, affecting millions of users. The idea of temporal MAML can be easily transferred to other models and temporal tasks.

[92]  arXiv:2105.03687 [pdf, other]
Title: Covariance Matrix Adaptation Evolution Strategy Assisted by Principal Component Analysis
Authors: Yangjie Mei
Comments: 13 pages, 4 figures
Subjects: Neural and Evolutionary Computing (cs.NE)

Over the past decades, many methods have advanced considerably thanks to the development of technology. Evolutionary Algorithms are widely used as a heuristic method. However, the computational budget increases exponentially as the number of dimensions increases. In this paper, we use the dimensionality reduction method Principal Component Analysis (PCA) to reduce the dimension during the iterations of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), an effective Evolutionary Algorithm for numerical optimization that is useful for many kinds of problems. We assess the performance of our new method in terms of convergence rate on multi-modal problems from the Black-Box Optimization Benchmarking (BBOB) problem set, and we also use the COmparing Continuous Optimizers (COCO) framework to evaluate how the new method performs and to compare it with other algorithms.

[93]  arXiv:2105.03688 [pdf, other]
Title: HamNet: Conformation-Guided Molecular Representation with Hamiltonian Neural Networks
Comments: in ICLR-2021 (poster)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Chemical Physics (physics.chem-ph)

Well-designed molecular representations (fingerprints) are vital to combine medical chemistry and deep learning. Whereas incorporating 3D geometry of molecules (i.e. conformations) in their representations seems beneficial, current 3D algorithms are still in their infancy. In this paper, we propose a novel molecular representation algorithm which preserves 3D conformations of molecules with a Molecular Hamiltonian Network (HamNet). In HamNet, implicit positions and momentums of atoms in a molecule interact in the Hamiltonian Engine following the discretized Hamiltonian equations. These implicit coordinates are supervised with real conformations with translation- & rotation-invariant losses, and further used as inputs to the Fingerprint Generator, a message-passing neural network. Experiments show that the Hamiltonian Engine preserves molecular conformations well, and that the fingerprints generated by HamNet achieve state-of-the-art performance on MoleculeNet, a standard molecular machine learning benchmark.

[94]  arXiv:2105.03689 [pdf, other]
Title: Self-Supervised Adversarial Example Detection by Disentangled Representation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Deep learning models are known to be vulnerable to adversarial examples that are elaborately designed for malicious purposes and are imperceptible to the human perceptual system. Autoencoders, when trained solely over benign examples, have been widely used for (self-supervised) adversarial detection based on the assumption that adversarial examples yield larger reconstruction error. However, because the autoencoder sees no adversarial examples during training and generalizes too strongly, this assumption does not always hold true in practice. To alleviate this problem, we explore detecting adversarial examples by disentangled representations of images under the autoencoder structure. By disentangling input images as class features and semantic features, we train an autoencoder, assisted by a discriminator network, over both correctly paired class/semantic features and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples. This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder. Compared with state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements (AUC, FPR, TPR) for most of the 30 attack settings, which span different datasets (MNIST, Fashion-MNIST and CIFAR-10), different adversarial attack methods (FGSM, BIM, PGD, DeepFool, and CW) and different victim models (8-layer CNN and 16-layer VGG). Ideally, AUC is $1$ and our method achieves $0.99+$ on CIFAR-10 for all attacks. Notably, different from other autoencoder-based detectors, our method can provide resistance to the adaptive adversary.

[95]  arXiv:2105.03692 [pdf, other]
Title: Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

A recent line of work has shown that deep networks are highly susceptible to backdoor data poisoning attacks. Specifically, by injecting a small amount of malicious data into the training distribution, an adversary gains the ability to control the model's behavior during inference. In this work, we propose an iterative training procedure for removing poisoned data from the training set. Our approach consists of two steps. We first train an ensemble of weak learners to automatically discover distinct subpopulations in the training set. We then leverage a boosting framework to recover the clean data. Empirically, our method successfully defends against several state-of-the-art backdoor attacks, including both clean and dirty label attacks. We also present results from an independent third-party evaluation including a recent \textit{adaptive} poisoning adversary. The results indicate our approach is competitive with existing defenses against backdoor attacks on deep neural networks, and significantly outperforms the state-of-the-art in several scenarios.

[96]  arXiv:2105.03693 [pdf, other]
Title: On the discrepancy of set systems definable in sparse graph classes
Subjects: Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO); Combinatorics (math.CO); Logic (math.LO)

Discrepancy is a natural measure for the inherent complexity of set systems with many applications in mathematics and computer science. The discrepancy of a set system $(U,\mathscr S)$ is the minimum over all mappings $\chi\colon U\rightarrow\{-1,1\}$ of $\max_{S\in\mathscr S}\bigl|\sum_{v\in S}\chi(v)\bigr|$. We study the discrepancy of set systems that are first-order definable in sparse graph classes. We prove that all the set systems definable in a monotone class $\mathscr C$ have bounded discrepancy if and only if $\mathscr C$ has bounded expansion, and that they have hereditary discrepancy at most $|U|^{c}$ (for some~$c<1/2$) if and only if $\mathscr C$ is nowhere dense. However, if $\mathscr C$ is somewhere dense, then for every positive integer $d$ there is a set system of $d$-tuples definable in $\mathscr C$ with discrepancy $\Omega(|U|^{1/2})$.
From the algorithmic point of view, we prove that if $\mathscr C$ is a class of graphs with bounded expansion and $\phi(\bar x;\bar y)$ is a first-order formula, then for each input graph $G\in\mathscr C$, a mapping $\chi:V(G)^{|\bar x|}\rightarrow\{-1,1\}$ witnessing the boundedness of the discrepancy of the set-system defined by~$\phi$ can be computed in $\mathcal O(|G|^{|\bar x|})$ time. We also deduce that for such set-systems, when $|\bar x|=1$, $\varepsilon$-nets of size $\mathcal{O}(1/\varepsilon)$ can be computed in time $\mathcal{O}(|G|\,\log |G|)$ and $\varepsilon$-approximations of size $\mathcal{O}(1/\varepsilon)$ can be computed in polynomial time.
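
The definition of discrepancy above can be checked on tiny examples by brute force. The following is a minimal sketch, not the algorithm from the paper: it enumerates all $2^{|U|}$ colourings, so it is only usable for very small universes. The function name `discrepancy` is chosen here for illustration.

```python
from itertools import product

def discrepancy(universe, sets):
    """Brute-force discrepancy of the set system (universe, sets).

    Returns the minimum over all colourings chi: U -> {-1, +1}
    of max over S in sets of |sum_{v in S} chi(v)|.
    """
    universe = list(universe)
    best = None
    for signs in product((-1, 1), repeat=len(universe)):
        chi = dict(zip(universe, signs))
        # Worst imbalance of this colouring over all sets (0 if there are no sets).
        worst = max((abs(sum(chi[v] for v in s)) for s in sets), default=0)
        if best is None or worst < best:
            best = worst
    return best
```

For instance, the three edges of a triangle, `[{0, 1}, {1, 2}, {0, 2}]` on `range(3)`, have discrepancy 2, because any two-colouring of three points gives two of them the same colour.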

[97]  arXiv:2105.03695 [pdf, ps, other]
Title: LPVcore: MATLAB Toolbox for LPV Modelling, Identification and Control
Subjects: Systems and Control (eess.SY)

This paper describes the LPVcore software package for MATLAB developed to model, simulate, estimate and control systems via linear parameter-varying (LPV) input-output (IO), state-space (SS) and linear fractional (LFR) representations. In the LPVcore toolbox, basis affine parameter-varying matrix functions are implemented to enable users to represent LPV systems in a global setting, i.e., for time-varying scheduling trajectories. This is a key difference compared to other software suites that use a grid or only LFR-based representations. The paper contains an overview of functions in the toolbox to simulate and identify IO, SS and LFR representations. Based on various prediction-error minimization methods, a comprehensive example is given on the identification of a DC motor with an unbalanced disc, demonstrating the capabilities of the toolbox. The software and examples are available on www.lpvcore.net.

[98]  arXiv:2105.03701 [pdf, other]
Title: Business Entity Matching with Siamese Graph Convolutional Networks
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent developments in machine learning and in particular deep learning have opened the way to more general and efficient solutions to data-integration tasks. In this paper, we demonstrate an approach that allows modeling and integrating entities by leveraging their relations and contextual information. This is achieved by combining siamese and graph neural networks to effectively propagate information between connected entities and support high scalability. We evaluated our approach on the task of integrating data about business entities, demonstrating that it outperforms both traditional rule-based systems and other deep learning approaches.

[99]  arXiv:2105.03702 [pdf, ps, other]
Title: On a conjecture on APN permutations
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

The single trivariate representation proposed in [C. Beierle, C. Carlet, G. Leander, L. Perrin, A Further Study of Quadratic APN Permutations in Dimension Nine, arXiv:2104.08008] of the two sporadic quadratic APN permutations in dimension 9 found by Beierle and Leander is further investigated. In particular, using tools from algebraic geometry over finite fields, we prove that such a family does not contain any other APN permutation for larger dimensions.

[100]  arXiv:2105.03703 [pdf, other]
Title: Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics
Comments: ICML 2021
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Probability (math.PR)

Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at initialization has an infinite-width limit for a large class of architectures including modern staples such as ResNet and Transformers. However, their analysis does not apply to training. Here, we show the same neural networks (in the so-called NTK parametrization) during training follow a kernel gradient descent dynamics in function space, where the kernel is the infinite-width NTK. This completes the proof of the *architectural universality* of NTK behavior. To achieve this result, we apply the Tensor Programs technique: Write the entire SGD dynamics inside a Tensor Program and analyze it via the Master Theorem. To facilitate this proof, we develop a graphical notation for Tensor Programs.

[101]  arXiv:2105.03705 [pdf, other]
Title: Understanding Neural Networks with Logarithm Determinant Entropy Estimator
Comments: 15 pages, 22 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Understanding the informative behaviour of deep neural networks is challenged by misused estimators and the complexity of network structure, which leads to inconsistent observations and diversified interpretation. Here we propose the LogDet estimator -- a reliable matrix-based entropy estimator that approximates Shannon differential entropy. We construct informative measurements based on the LogDet estimator, verify our method with comparable experiments and utilize it to analyse neural network behaviour. Our results demonstrate that the LogDet estimator overcomes the drawbacks that emerge from highly diverse and degenerate distributions and is thus reliable for estimating entropy in neural networks. The network analysis results also reveal a functional distinction between shallow and deeper layers, which can help understand the compression phenomenon in the information bottleneck theory of neural networks.

[102]  arXiv:2105.03708 [pdf, ps, other]
Title: All Together Now: Teachers as Research Partners in the Design of Search Technology for the Classroom
Comments: In KidRec '21: 5th International and Interdisciplinary Perspectives on Children & Recommender and Information Retrieval Systems (KidRec) Search and Recommendation Technology through the Lens of a Teacher- Co-located with ACM IDC 2021; June 26, 2021; Online Event
Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

In the classroom environment, search tools are the means for students to access Web resources. The perspectives of students, researchers, and industry practitioners lead the ongoing research debate in this area. In this article, we argue in favor of incorporating a new voice into this debate: teachers. We showcase the value of involving teachers in all aspects related to the design of search tools for the classroom; from the beginning till the end. Driven by our research experience designing, developing, and evaluating new tools to support children's information discovery in the classroom, we share insights on the role of the experts-in-the-loop, i.e., teachers who provide the connection between search tools and students. And yes, in our case, always involving a teacher as a research partner.

[103]  arXiv:2105.03710 [pdf, other]
Title: Falling Through the Gaps: Neural Architectures as Models of Morphological Rule Learning
Authors: Deniz Beser
Subjects: Computation and Language (cs.CL)

Recent advances in neural architectures have revived the problem of morphological rule learning. We evaluate the Transformer as a model of morphological rule learning and compare it with Recurrent Neural Networks (RNN) on English, German, and Russian. We bring to the fore a hitherto overlooked problem, the morphological gaps, where the expected inflection of a word is missing. For example, 63 Russian verbs lack a first-person-singular present form such that one cannot comfortably say "*oščušču" ("I feel"). Even English has gaps, such as the past participle of "stride": the function of morphological inflection can be partial. Both neural architectures produce inflections that ought to be missing. Analyses reveal that Transformers recapitulate the statistical distribution of inflections in the training data, similar to RNNs. Models' success on English and German is driven by the fact that rules in these languages can be identified with the majority forms, which is not universal.

[104]  arXiv:2105.03714 [pdf, other]
Title: Protecting Individual Interests across Clusters: Spectral Clustering with Guarantees
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)

Studies related to fairness in machine learning have recently gained traction due to its ever-expanding role in high-stakes decision making. For example, it may be desirable to ensure that all clusters discovered by an algorithm have high gender diversity. Previously, these problems have been studied under a setting where sensitive attributes, with respect to which fairness conditions impose diversity across clusters, are assumed to be observable; hence, protected groups are readily available. Most often, this may not be true, and diversity or individual interests can manifest as an intrinsic or latent feature of a social network. For example, depending on latent sensitive attributes, individuals interact with each other and represent each other's interests, resulting in a network, which we refer to as a representation graph. Motivated by this, we propose an individual fairness criterion for clustering a graph $\mathcal{G}$ that requires each cluster to contain an adequate number of members connected to the individual under a representation graph $\mathcal{R}$. We devise a spectral clustering algorithm to find fair clusters under a given representation graph. We further propose a variant of the stochastic block model and establish our algorithm's weak consistency under this model. Finally, we present experimental results to corroborate our theoretical findings.

[105]  arXiv:2105.03716 [pdf, ps, other]
Title: Continuous representations of intents for dialogue systems
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Intent modelling has become an important part of modern dialogue systems. With the rapid expansion of practical dialogue systems and virtual assistants, such as Amazon Alexa, Apple Siri, and Google Assistant, the interest has only increased. However, until recently the focus has been on detecting a fixed, discrete number of seen intents. Recent years have seen some work on unseen intent detection in the context of zero-shot learning. This paper continues the prior work by proposing a novel model where intents are continuous points placed in a specialist Intent Space, which yields several advantages. First, the continuous representation makes it possible to investigate relationships between the seen intents. Second, it allows any unseen intent to be reliably represented given limited quantities of data. Finally, this paper shows how the proposed model can be augmented with unseen intents without retraining any of the seen ones. Experiments show that the model can reliably add unseen intents with high accuracy while retaining high performance on the seen intents.

[106]  arXiv:2105.03721 [pdf, other]
Title: Team Orienteering Coverage Planning with Uncertain Reward
Subjects: Robotics (cs.RO)

Many municipalities and large organizations have fleets of vehicles that need to be coordinated for tasks such as garbage collection or infrastructure inspection. Motivated by this need, this paper focuses on the common subproblem in which a team of vehicles needs to plan coordinated routes to patrol an area over iterations while minimizing temporally and spatially dependent costs. In particular, at a specific location (e.g., a vertex on a graph), we assume the cost grows linearly in expectation with an unknown rate, and the cost is reset to zero whenever any vehicle visits the vertex (representing the robot servicing the vertex). We formulate this problem in graph terminology and call it Team Orienteering Coverage Planning with Uncertain Reward (TOCPUR). We propose to solve TOCPUR by simultaneously estimating the accumulated cost at every vertex on the graph and solving a novel variant of the Team Orienteering Problem (TOP) iteratively, which we call the Team Orienteering Coverage Problem (TOCP). We provide the first mixed integer programming formulation for the TOCP, as a significant adaptation of the original TOP. We introduce a new benchmark consisting of hundreds of randomly generated graphs for comparing different methods. We show the proposed solution outperforms both the exact TOP solution and a greedy algorithm. In addition, we provide a demo of our method on a team of three physical robots in a real-world environment.

[107]  arXiv:2105.03725 [pdf, other]
Title: DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
Comments: This paper is accepted to SIGMETRICS 2021 and will be presented at the conference in June 2021. Our open source software will be released after the presentation at SIGMETRICS 2021
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement.
With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.

[108]  arXiv:2105.03726 [pdf, other]
Title: Mental Models of Adversarial Machine Learning
Comments: 19 pages, 8 figures, under submission
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Although machine learning (ML) is widely used in practice, little is known about practitioners' actual understanding of potential security challenges. In this work, we close this substantial gap in the literature and contribute a qualitative study focusing on developers' mental models of the ML pipeline and potentially vulnerable components. Studying mental models has helped in other security fields to discover root causes or improve risk communication. Our study reveals four characteristic ranges in mental models of industrial practitioners. The first range concerns the intertwined relationship of adversarial machine learning (AML) and classical security. The second range describes structural and functional components. The third range expresses individual variations of mental models, which are neither explained by the application nor by the educational background of the corresponding subjects. The fourth range corresponds to the varying levels of technical depth, which are however not determined by our subjects' level of knowledge. Our characteristic ranges have implications for the integration of AML into corporate workflows, security enhancing tools for practitioners, and creating appropriate regulatory frameworks for AML.

[109]  arXiv:2105.03731 [pdf, ps, other]
Title: Time integrators for dispersive equations in the long wave regime
Subjects: Numerical Analysis (math.NA)

We introduce a novel class of time integrators for dispersive equations which allow us to reproduce the dynamics of the solution from the classical $ \varepsilon = 1$ up to the long wave limit regime $ \varepsilon \ll 1 $ on the natural time scale of the PDE $t= \mathcal{O}(\frac{1}{\varepsilon})$. Most notably, our new schemes converge with rates of order $\tau \varepsilon$ over long times $t= \frac{1}{\varepsilon}$.

[110]  arXiv:2105.03732 [pdf, ps, other]
Title: Uniformly accurate splitting schemes for the Benjamin-Bona-Mahony equation with dispersive parameter
Subjects: Numerical Analysis (math.NA)

We propose a new class of uniformly accurate splitting methods for the Benjamin-Bona-Mahony equation which converge uniformly in the dispersive parameter $\varepsilon$. The proposed splitting schemes are furthermore asymptotically convergent and preserve the KdV limit. We carry out a rigorous convergence analysis of the splitting schemes exploiting the smoothing properties in the system. This will allow us to establish improved error bounds with gain either in regularity (for non-smooth solutions) or in the dispersive parameter $\varepsilon$. The latter will be interesting in regimes of a small dispersive parameter. We will in particular show that in the classical BBM case $P(\partial_x) = \partial_x$ our Lie splitting does not require any spatial regularity, i.e., first order time convergence holds in $H^{r}$ for solutions in $H^{r}$ without any loss of derivative. This estimate holds uniformly in $\varepsilon$. In regularizing regimes $\varepsilon=\mathcal{O}(1) $ we even gain a derivative with our time discretisation at the cost of losing in terms of $\frac{1}{\varepsilon}$. Numerical experiments underline our theoretical findings.

[111]  arXiv:2105.03733 [pdf, other]
Title: Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model
Authors: Peng Lingwei
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Model-free deep reinforcement learning has achieved great success in many domains, such as video games, recommendation systems and robotic control tasks. In continuous control tasks, widely used policies with Gaussian distributions result in ineffective exploration of environments and limited performance of algorithms in many cases. In this paper, we propose a density-free off-policy algorithm, Generative Actor-Critic (GAC), using the push-forward model to increase the expressiveness of policies, which also includes an entropy-like technique, the MMD-entropy regularizer, to balance exploration and exploitation. Additionally, we devise an adaptive mechanism to automatically scale this regularizer, which further improves the stability and robustness of GAC. The experimental results show that push-forward policies possess desirable features, such as multi-modality, which can clearly improve the exploration efficiency and asymptotic performance of algorithms.

[112]  arXiv:2105.03736 [pdf, other]
Title: PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Memory based on DRAM Technology
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)

Deep Neural Networks (DNNs) have gained significant interest in the recent past for a plethora of applications such as image and video analytics, language translation, and medical diagnosis. High memory bandwidth is required to keep up with the needs of data-intensive DNN applications when implemented on a von Neumann hardware architecture, as the majority of the data resides in the main memory. Therefore, processing in memory can provide a promising solution for the memory wall bottleneck for ML workloads. In this work, we propose a DRAM-based processing-in-memory (PIM) multiplication primitive coupled with intra-bank accumulation to accelerate matrix vector operations in ML workloads. Moreover, we propose a processing-in-memory DRAM bank architecture, data mapping and dataflow based on the proposed primitive. System evaluations performed on networks like AlexNet, VGG16 and ResNet18 show that the proposed architecture, mapping, and dataflow can provide up to 23x and 6.5x benefits over a GPU and an ideal conventional (non-PIM) baseline architecture with infinite compute bandwidth, respectively.

[113]  arXiv:2105.03743 [pdf, other]
Title: Certified Robustness to Text Adversarial Attacks by Randomized [MASK]
Comments: Accepted by Findings of ACL 2021, Long Paper
Subjects: Computation and Language (cs.CL)

Recently, a few certified defense methods have been developed to provably guarantee the robustness of a text classifier to adversarial synonym substitutions. However, all existing certified defense methods assume that the defenders are informed of how the adversaries generate synonyms, which is not a realistic scenario. In this paper, we propose a certifiably robust defense method that randomly masks a certain proportion of the words in an input text, so that the above unrealistic assumption is no longer necessary. The proposed method can defend against not only word substitution-based attacks, but also character-level perturbations. We can certify the classifications of over 50% of texts to be robust to any perturbation of 5 words on AGNEWS, and 2 words on the SST2 dataset. The experimental results show that our randomized smoothing method significantly outperforms recently proposed defense methods across multiple datasets.
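
The random-masking idea above can be illustrated with a short sketch. It makes several assumptions that the abstract does not state: predictions over masked copies are aggregated by majority vote, the masked fraction is rounded to a whole number of words, and `classify` is a hypothetical stand-in for any base text classifier. It does not compute the robustness certificate.

```python
import random
from collections import Counter

MASK = "[MASK]"

def mask_words(words, rate, rng):
    """Replace a random fraction `rate` of the words with the mask token."""
    n_mask = round(rate * len(words))
    positions = set(rng.sample(range(len(words)), n_mask))
    return [MASK if i in positions else w for i, w in enumerate(words)]

def smoothed_predict(classify, words, rate=0.3, n_samples=25, seed=0):
    """Majority vote of `classify` over randomly masked copies of `words`.

    `classify` is a hypothetical callable mapping a list of tokens to a label.
    """
    rng = random.Random(seed)
    votes = Counter(
        classify(mask_words(words, rate, rng)) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]
```

Intuitively, a perturbation that touches only a few words is often hidden by the mask in many of the sampled copies, which is what makes a vote over masked copies harder to flip.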

[114]  arXiv:2105.03746 [pdf, other]
Title: Contrastive Conditional Transport for Representation Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Contrastive learning (CL) has achieved remarkable success in learning data representations without label supervision. However, the conventional CL loss is sensitive to how many negative samples are included and how they are selected. This paper proposes contrastive conditional transport (CCT) that defines its CL loss over dependent sample-query pairs, which in practice is realized by drawing a random query, randomly selecting positive and negative samples, and contrastively reweighting these samples according to their distances to the query, exerting a greater force to both pull more distant positive samples towards the query and push closer negative samples away from the query. Theoretical analysis shows that this unique contrastive reweighting scheme helps in the representation space to both align the positive samples with the query and reduce the mutual information between the negative sample and query. Extensive large-scale experiments on standard vision tasks show that CCT not only consistently outperforms existing methods on benchmark datasets in contrastive representation learning but also provides interpretable contrastive weights and latent representations. PyTorch code will be provided.

[115]  arXiv:2105.03748 [pdf, other]
Title: Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems
Comments: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), 2021
Subjects: Information Retrieval (cs.IR)

Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we propose the following task: simulating user satisfaction for the evaluation of task-oriented dialogue systems. The purpose of the task is to increase the evaluation power of user simulations and to make the simulation more human-like. To overcome a lack of annotated data, we propose a user satisfaction annotation dataset, USS, that includes 6,800 dialogues sampled from multiple domains, spanning real-world e-commerce dialogues, task-oriented dialogues constructed through Wizard-of-Oz experiments, and movie recommendation dialogues. All user utterances in those dialogues, as well as the dialogues themselves, have been labeled based on a 5-level satisfaction scale. We also share three baseline methods for user satisfaction prediction and action prediction tasks. Experiments conducted on the USS dataset suggest that distributed representations outperform feature-based methods. A model based on hierarchical GRUs achieves the best performance in in-domain user satisfaction prediction, while a BERT-based model has better cross-domain generalization ability.

[116]  arXiv:2105.03753 [pdf, other]
Title: Parameterized Complexity of Feature Selection for Categorical Data Clustering
Comments: 25 pages, full version
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known feature selection methods are heuristics. We study the following mathematical model. We assume that there are some inadvertent (or undesirable) features of the input data that unnecessarily increase the cost of clustering. Consequently, we want to select a subset of the original features from the data such that there is a small-cost clustering on the selected features. More precisely, for given integers $\ell$ (the number of irrelevant features) and $k$ (the number of clusters), budget $B$, and a set of $n$ categorical data points (represented by $m$-dimensional vectors whose elements belong to a finite set of values $\Sigma$), we want to select $m-\ell$ relevant features such that the cost of any optimal $k$-clustering on these features does not exceed $B$. Here the cost of a cluster is the sum of Hamming distances ($\ell_0$-distances) between the selected features of the elements of the cluster and its center. The clustering cost is the total sum of the costs of the clusters. We use the framework of parameterized complexity to identify how the complexity of the problem depends on parameters $k$, $B$, and $|\Sigma|$. Our main result is an algorithm that solves the Feature Selection problem in time $f(k,B,|\Sigma|)\cdot m^{g(k,|\Sigma|)}\cdot n^2$ for some functions $f$ and $g$. In other words, the problem is fixed-parameter tractable parameterized by $B$ when $|\Sigma|$ and $k$ are constants. Our algorithm is based on a solution to a more general problem, Constrained Clustering with Outliers. We also complement our algorithmic findings with complexity lower bounds.
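
The cost function described above is easy to state in code. The sketch below only evaluates the cost of a given clustering on a given feature subset; it is not the parameterized algorithm from the paper. It assumes each cluster centre is chosen optimally, which for Hamming distance means taking the most frequent value in each selected coordinate. The names `cluster_cost` and `clustering_cost` are illustrative.

```python
from collections import Counter

def cluster_cost(points, features):
    """Sum of Hamming distances from the points to an optimal centre,
    counting only the coordinates listed in `features`."""
    if not points:
        return 0
    cost = 0
    for j in features:
        column = [p[j] for p in points]
        # The most frequent value is an optimal centre coordinate;
        # every point that differs from it contributes 1 to the cost.
        most_common_count = Counter(column).most_common(1)[0][1]
        cost += len(column) - most_common_count
    return cost

def clustering_cost(clusters, features):
    """Total cost of a clustering: the sum of the costs of its clusters."""
    return sum(cluster_cost(c, features) for c in clusters)
```

Dropping a noisy feature can lower the cost: for the points `("a","b","x")`, `("a","b","y")`, `("a","c","z")`, the cost on all three features is 3, while on features `[0, 1]` it is 1.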

[117]  arXiv:2105.03756 [pdf, other]
Title: RAIL: A modular framework for Reinforcement-learning-based Adversarial Imitation Learning
Subjects: Machine Learning (cs.LG)

While Adversarial Imitation Learning (AIL) algorithms have recently led to state-of-the-art results on various imitation learning benchmarks, it is unclear as to what impact various design decisions have on performance. To this end, we present here an organizing, modular framework called Reinforcement-learning-based Adversarial Imitation Learning (RAIL) that encompasses and generalizes a popular subclass of existing AIL approaches. Using the view espoused by RAIL, we create two new IfO (Imitation from Observation) algorithms, which we term SAIfO: SAC-based Adversarial Imitation from Observation and SILEM (Skeletal Feature Compensation for Imitation Learning with Embodiment Mismatch). We go into greater depth about SILEM in a separate technical report. In this paper, we focus on SAIfO, evaluating it on a suite of locomotion tasks from OpenAI Gym, and showing that it outperforms contemporaneous RAIL algorithms that perform IfO.

[118]  arXiv:2105.03759 [pdf, other]
Title: Multi-layered planar firefighting
Comments: 22 pages, 5 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

Consider a model of fire spreading through a graph; initially some vertices are burning, and at every given time-step fire spreads from burning vertices to their neighbours. The firefighter problem is a solitaire game in which a player is allowed, at every time-step, to protect some non-burning vertices (by effectively deleting them) in order to contain the fire growth. How many vertices per turn, on average, must be protected in order to stop the fire from spreading infinitely?
Here we consider the problem on $\mathbb{Z}^2\times [h]$ for both nearest neighbour adjacency and strong adjacency. We determine the critical protection rates for these graphs to be $1.5h$ and $3h$, respectively. This establishes the fact that using an optimal two-dimensional strategy for all layers in parallel is asymptotically optimal.
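
One time-step of the fire model on $\mathbb{Z}^2\times [h]$ with nearest neighbour adjacency can be sketched as follows. This is only an illustration of the spreading rule, not of the protection strategies analysed in the paper. Layers are indexed $0,\dots,h-1$ here, and the names `neighbours` and `spread` are chosen for this example.

```python
def neighbours(v, h):
    """Nearest-neighbour adjacency on Z^2 x [h], with layers 0..h-1."""
    x, y, z = v
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        yield (x + dx, y + dy, z)
    for dz in (1, -1):
        if 0 <= z + dz < h:
            yield (x, y, z + dz)

def spread(burning, protected, h):
    """Return the burning set after one time-step.

    Fire moves from every burning vertex to each neighbour
    that is not protected; burning vertices stay burning.
    """
    new_burning = set(burning)
    for v in burning:
        for w in neighbours(v, h):
            if w not in protected:
                new_burning.add(w)
    return new_burning
```

Starting from a single burning vertex with no protection, one step yields 5 burning vertices for $h=1$ and 6 for $h=2$.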

[119]  arXiv:2105.03760 [pdf, other]
Title: PCA Event-Based Optical Flow for Visual Odometry
Comments: 9 pages, 8 figures, not published yet
Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the advent of neuromorphic vision sensors such as event-based cameras, a paradigm shift is required for most computer vision algorithms. Among these algorithms, optical flow estimation is a prime candidate for this process considering that it is linked to a neuromorphic vision approach. Usage of optical flow is widespread in robotics applications due to its richness and accuracy. We present a Principal Component Analysis (PCA) approach to the problem of event-based optical flow estimation. In this approach, we examine different regularization methods which efficiently enhance the estimation of the optical flow. We show that the best variant of our proposed method, dedicated to the real-time context of visual odometry, is about two times faster than state-of-the-art implementations while significantly improving optical flow accuracy.

[120]  arXiv:2105.03761 [pdf, other]
Title: e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recently, an increasing number of works have introduced models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. Such models are appealing because they can provide human-friendly and comprehensive explanations. However, there is still a lack of unified evaluation approaches for the explanations generated by these models. Moreover, there are currently only a few datasets of NLEs for VL tasks. In this work, we introduce e-ViL, a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks. e-ViL spans four models and three datasets. Both automatic metrics and human evaluation are used to assess model-generated explanations. We also introduce e-SNLI-VE, the largest existing VL dataset with NLEs (over 430k instances). Finally, we propose a new model that combines UNITER, which learns joint embeddings of images and text, and GPT-2, a pre-trained language model that is well-suited for text generation. It surpasses the previous state-of-the-art by a large margin across all datasets.

[121]  arXiv:2105.03767 [pdf]
Title: Aerospace Sliding Mode Control Toolbox: Relative Degree Approach with Resource Prospector Lander and Launch Vehicle Case Studies
Authors: S. Kode, Y. Shtessel (Senior Member IEEE), A. Levant (Senior Member IEEE), J. Rakoczy, M. Hannan, J. Orr
Subjects: Systems and Control (eess.SY)

Conventional sliding mode control and observation techniques are widely used in aerospace applications, including aircraft, UAVs, launch vehicles, missile interceptors, and hypersonic missiles. This work is dedicated to creating a MATLAB-based sliding mode controller design and simulation software toolbox that aims to support aerospace vehicle applications. An architecture of the aerospace sliding mode control toolbox (SMC Aero) using the relative degree approach is proposed. The SMC Aero libraries include 1st order sliding mode control (1-SMC), second order sliding mode control (2-SMC), higher order sliding mode (HOSM) control (either fixed gain or adaptive), as well as higher order sliding mode differentiators. The efficacy of the SMC Aero toolbox is confirmed in two case studies: controlling and simulating resource prospector lander (RPL) soft landing on the Moon and launch vehicle (LV) attitude control in ascent mode.

[122]  arXiv:2105.03773 [pdf, ps, other]
Title: Separations for Estimating Large Frequency Moments on Data Streams
Subjects: Data Structures and Algorithms (cs.DS)

We study the classical problem of moment estimation of an underlying vector whose $n$ coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order insertion-only model, then there exist space efficient algorithms with improved dependencies on the approximation parameter $\varepsilon$. In particular, for any real $p > 2$, we first obtain an algorithm for $F_p$ moment estimation using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of memory. Our techniques also give algorithms for $F_p$ moment estimation with $p>2$ on arbitrary order insertion-only and turnstile streams, using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of space and two passes, which is the first optimal multi-pass $F_p$ estimation algorithm up to $\log n$ factors. Finally, we give an improved lower bound of $\Omega\left(\frac{1}{\varepsilon^2}\cdot n^{1-2/p}\right)$ for one-pass insertion-only streams. Our results separate the complexity of this problem both between random and non-random orders, as well as one-pass and multi-pass streams.
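
For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of the exact $F_p$ computation, $F_p = \sum_i |x_i|^p$, for a vector defined by a stream of updates. It is a linear-space baseline for illustration only, not the space-efficient algorithm of the paper; the function name and the `(index, delta)` update format are assumptions made here.

```python
from collections import defaultdict


def exact_frequency_moment(updates, p):
    """Return the exact F_p = sum_i |x_i|^p of the vector built from the stream.

    `updates` is an iterable of (index, delta) pairs (hypothetical format).
    Insertion-only streams use positive deltas; turnstile streams may also
    use negative ones.
    """
    x = defaultdict(int)
    for index, delta in updates:
        x[index] += delta  # apply the update to coordinate `index`
    return sum(abs(value) ** p for value in x.values())


# Insertion-only example: the final vector has x_0 = 3, x_3 = 2, x_7 = 1.
stream = [(0, 1), (3, 1), (0, 1), (7, 1), (0, 1), (3, 1)]
f3 = exact_frequency_moment(stream, 3)
```

This baseline stores every nonzero coordinate, which is exactly the memory cost that streaming algorithms for $F_p$ aim to avoid.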

[123]  arXiv:2105.03775 [pdf, other]
Title: NLP-IIS@UT at SemEval-2021 Task 4: Machine Reading Comprehension using the Long Document Transformer
Comments: 6 pages, 1 figure. Accepted in SemEval2021
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (cs.LG)

This paper presents a technical report of our submission to the 4th task of SemEval-2021, titled: Reading Comprehension of Abstract Meaning. In this task, we want to predict the correct answer based on a question given a context. Usually, contexts are very lengthy and require a large receptive field from the model. Thus, common contextualized language models like BERT miss fine representation and performance due to the limited capacity of the input tokens. To tackle this problem, we used the Longformer model to better process the sequences. Furthermore, we utilized the method proposed in the Longformer benchmark on Wikihop dataset which improved the accuracy on our task data from 23.01% and 22.95% achieved by the baselines for subtask 1 and 2, respectively, to 70.30% and 64.38%.

[124]  arXiv:2105.03778 [pdf, other]
Title: An Exhaustive Study of Using Commercial LTE Network for UAV Communication in Rural Areas
Subjects: Networking and Internet Architecture (cs.NI)

Unmanned aerial vehicles (UAVs) have been increasingly used in a wide range of military and civilian applications such as data collection and monitoring. A reliable network for command and control, communication, and data transfer is crucial, not only for mission purposes but also for safety concerns. The already deployed cellular networks are appropriate candidates for UAV communication given the solid security and wide coverage of these networks. However, the reliability of such networks needs a comprehensive investigation. In this paper, we use the long-term evolution (LTE) network as the infrastructure for drone communication and data transfer, in a rural area. We study the communication characteristics of an LTE-connected drone during low-altitude flights, for different altitudes and UAV speeds. We show that, in such areas, higher altitudes benefit from better signal quality and experience fewer handovers. Higher-speed flights also slightly degrade the communication performance.

[125]  arXiv:2105.03781 [pdf, other]
Title: MetaKernel: Learning Variational Random Features with Limited Labels
Comments: 19 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:2006.06707
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot learning, which we call MetaKernel. Specifically, we propose learning variational random features in a data-driven manner to obtain task-specific kernels by leveraging the shared knowledge provided by related tasks in a meta-learning setting. We treat the random feature basis as the latent variable, which is estimated by variational inference. The shared knowledge from related tasks is incorporated into a context inference of the posterior, which we achieve via a long short-term memory module. To establish more expressive kernels, we deploy conditional normalizing flows based on coupling layers to achieve a richer posterior distribution over random Fourier bases. The resultant kernels are more informative and discriminative, which further improves the few-shot learning. To evaluate our method, we conduct extensive experiments on both few-shot image classification and regression tasks. A thorough ablation study demonstrates the effectiveness of each introduced component in our method. The benchmark results on fourteen datasets demonstrate that MetaKernel consistently delivers at least comparable and often better performance than state-of-the-art alternatives.
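
As background for the random Fourier features mentioned above, here is a minimal sketch of the classical construction with a fixed, randomly sampled basis that approximates an RBF kernel. It does not implement the learned variational basis, the LSTM context inference, or the normalizing flows described in the abstract; the function names, the choice of RBF kernel, and the parameter values are assumptions for illustration.

```python
import math
import random


def sample_rff_basis(dim, num_features, gamma, rng):
    """Sample a basis for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    scale = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma * I)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return weights, offsets


def rff_map(x, weights, offsets):
    """Map x to z(x) so that the dot product z(x) . z(y) approximates k(x, y)."""
    norm = math.sqrt(2.0 / len(weights))
    return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)]


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only as a reference."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(0)
weights, offsets = sample_rff_basis(dim=2, num_features=2000, gamma=0.5, rng=rng)
x, y = [0.5, -1.0], [0.0, -0.5]
approx = sum(a * b for a, b in zip(rff_map(x, weights, offsets),
                                   rff_map(y, weights, offsets)))
exact = rbf_kernel(x, y, gamma=0.5)
```

With a few thousand features, `approx` should land close to `exact`; the approximation error shrinks roughly as the inverse square root of the number of features.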

[126]  arXiv:2105.03782 [pdf, other]
Title: Construction of Sparse Suffix Trees and LCE Indexes in Optimal Time and Space
Comments: 26 pages, 2 figures
Subjects: Data Structures and Algorithms (cs.DS)

The notions of synchronizing and partitioning sets are recently introduced variants of locally consistent parsings with great potential in problem-solving. In this paper we propose a deterministic algorithm that constructs for a given readonly string of length $n$ over the alphabet $\{0,1,\ldots,n^{\mathcal{O}(1)}\}$ a version of $\tau$-partitioning set with size $\mathcal{O}(b)$ and $\tau = \frac{n}{b}$ using $\mathcal{O}(b)$ space and $\mathcal{O}(\frac{1}{\epsilon}n)$ time provided $b \ge n^\epsilon$, for $\epsilon > 0$. As a corollary, for $b \ge n^\epsilon$ and constant $\epsilon > 0$, we obtain linear construction algorithms with $\mathcal{O}(b)$ space on top of the string for two major small-space indexes: a sparse suffix tree, which is a compacted trie built on $b$ chosen suffixes of the string, and a longest common extension (LCE) index, which occupies $\mathcal{O}(b)$ space and allows us to compute the longest common prefix for any pair of substrings in $\mathcal{O}(n/b)$ time. For both, the $\mathcal{O}(b)$ construction storage is asymptotically optimal since the tree itself takes $\mathcal{O}(b)$ space and any LCE index with $\mathcal{O}(n/b)$ query time must occupy at least $\mathcal{O}(b)$ space by a known trade-off (at least for $b \ge \Omega(n / \log n)$). In case of arbitrary $b \ge \Omega(\log^2 n)$, we present construction algorithms for the partitioning set, sparse suffix tree, and LCE index with $\mathcal{O}(n\log_b n)$ running time and $\mathcal{O}(b)$ space, thus also improving the state of the art.
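
To make the longest common extension (LCE) query concrete, the following is a naive reference sketch that compares characters one by one and uses no index. It only illustrates what an LCE index answers, not the construction proposed in the paper; the function name is an assumption, and the query is stated here for a pair of starting positions.

```python
def naive_lce(s, i, j):
    """Return the length of the longest common prefix of s[i:] and s[j:]."""
    length = 0
    while (i + length < len(s) and j + length < len(s)
           and s[i + length] == s[j + length]):
        length += 1  # extend the match by one character
    return length


text = "abracadabra"
lce_0_7 = naive_lce(text, 0, 7)  # compares "abracadabra" with "abra"
```

This scan takes time proportional to the answer and needs no extra space, whereas the index described above trades $\mathcal{O}(b)$ space for $\mathcal{O}(n/b)$ query time.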

[127]  arXiv:2105.03788 [pdf, other]
Title: Dynamic Game Theoretic Neural Optimizer
Comments: Accepted in International Conference on Machine Learning (ICML) 2021 as Oral
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)

The connection between training deep neural networks (DNNs) and optimal control theory (OCT) has attracted considerable attention as a principled tool of algorithmic design. Although a few attempts have been made, they have been limited to architectures where the layer propagation resembles a Markovian dynamical system. This casts doubts on their flexibility to modern networks that heavily rely on non-Markovian dependencies between layers (e.g. skip connections in residual networks). In this work, we propose a novel dynamic game perspective by viewing each layer as a player in a dynamic game characterized by the DNN itself. Through this lens, different classes of optimizers can be seen as matching different types of Nash equilibria, depending on the implicit information structure of each (p)layer. The resulting method, called Dynamic Game Theoretic Neural Optimizer (DGNOpt), not only generalizes OCT-inspired optimizers to a richer network class; it also motivates a new training principle by solving a multi-player cooperative game. DGNOpt shows convergence improvements over existing methods on image classification datasets with residual networks. Our work marries strengths from both OCT and game theory, paving ways to new algorithmic opportunities from robust optimal control and bandit-based optimization.

[128]  arXiv:2105.03789 [pdf, other]
Title: Kudu: An Efficient and Scalable Distributed Graph Pattern Mining Engine
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

This paper proposes Kudu, a general distributed execution engine with a well-defined abstraction that can be integrated with various existing single-machine graph pattern mining (GPM) systems. With this approach, the programming interfaces and codes based on existing GPM systems do not change and Kudu can transparently enable the distributed execution. The key novelty is extendable embedding, which can express pattern enumeration algorithms and enable fine-grained task scheduling. To enable efficient scheduling, we propose a novel BFS-DFS hybrid exploration method that generates sufficient concurrent tasks without incurring high memory consumption. The computation and communication of Kudu can be further optimized with several effective techniques. We implemented two scalable distributed GPM systems by porting Automine and GraphPi on Kudu. Our evaluation shows that Kudu-based systems significantly outperform state-of-the-art graph partition-based GPM systems by up to three orders of magnitude, achieve similar or even better performance compared with the fastest graph replication-based systems, and scale to large datasets with graph partitioning.

[129]  arXiv:2105.03790 [pdf, other]
Title: Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study
Comments: arXiv admin note: text overlap with arXiv:2103.15792, arXiv:1910.11111
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm, such as a DNN. MTL is based on the assumption that the tasks under consideration are related; therefore it exploits shared knowledge for improving performance on each individual task. Tasks are generally considered to be homogeneous, i.e., to refer to the same type of problem. Moreover, MTL is usually based on ground truth annotations with full, or partial overlap across tasks. In this work, we deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems. We explore task-relatedness as a means for co-training, in a weakly-supervised way, tasks that contain little, or even non-overlapping annotations. Task-relatedness is introduced in MTL, either explicitly through prior expert knowledge, or through data-driven studies. We propose a novel distribution matching approach, in which knowledge exchange is enabled between tasks, via matching of their predictions' distributions. Based on this approach, we build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks. We develop case studies for: i) continuous affect estimation, action unit detection, basic emotion recognition; ii) attribute detection, face identification.
We illustrate that co-training via task relatedness alleviates negative transfer. Since FaceBehaviorNet learns features that encapsulate all aspects of facial behavior, we conduct zero-/few-shot learning to perform tasks beyond the ones that it has been trained for, such as compound emotion recognition. By conducting a very large experimental study, utilizing 10 databases, we illustrate that our approach outperforms, by large margins, the state-of-the-art in all tasks and in all databases, even in those that have not been used in its training.

[130]  arXiv:2105.03791 [pdf, other]
Title: Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning
Comments: To appear as long paper in Findings of ACL 2021
Subjects: Computation and Language (cs.CL)

Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to models being pretrained on large datasets, they can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning with each NLI subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an alternative to the commonly used Multi-Layer Perceptron (MLP) classification head. GBDTs have desirable properties such as good performance on dense, numerical features and are effective where the ratio of the number of samples w.r.t the number of features is low. We then introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network. We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model (RoBERTa-large with MNLI pretraining). The FreeGBDT shows a consistent improvement over the MLP classification head.

[131]  arXiv:2105.03793 [pdf, ps, other]
Title: Stability and Generalization of Stochastic Gradient Methods for Minimax Problems
Comments: To appear in ICML 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Many machine learning problems can be formulated as minimax problems such as Generative Adversarial Networks (GANs), AUC maximization and robust estimation, to mention but a few. A substantial number of studies are devoted to the convergence behavior of their stochastic gradient-type algorithms. In contrast, there is relatively little work on their generalization, i.e., how the learning models built from training examples would behave on test examples. In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems under both convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability. We establish a quantitative connection between stability and several generalization measures both in expectation and with high probability. For the convex-concave setting, our stability analysis shows that stochastic gradient descent ascent attains optimal generalization bounds for both smooth and nonsmooth minimax problems. We also establish generalization bounds for both weakly-convex-weakly-concave and gradient-dominated problems.
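
The stochastic gradient descent ascent update mentioned above can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the paper's analysis or experimental setup: the objective $f(x, y) = \tfrac{1}{2}x^2 + xy - \tfrac{1}{2}y^2$, the additive Gaussian gradient noise, and all names and parameter values are chosen here for demonstration.

```python
import random


def sgda(grad_x, grad_y, x, y, step, iterations, noise_std=0.0, seed=0):
    """Simultaneous gradient descent on x and gradient ascent on y.

    Gaussian noise with standard deviation `noise_std` is added to each
    gradient to mimic stochastic gradients (an assumption of this sketch).
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        gx = grad_x(x, y) + rng.gauss(0.0, noise_std)
        gy = grad_y(x, y) + rng.gauss(0.0, noise_std)
        x, y = x - step * gx, y + step * gy  # descend in x, ascend in y
    return x, y


# Toy objective f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2,
# strongly convex in x and strongly concave in y, with saddle point (0, 0).
def grad_x(x, y):
    return x + y


def grad_y(x, y):
    return x - y


x_final, y_final = sgda(grad_x, grad_y, x=1.0, y=1.0, step=0.1, iterations=200)
```

With `noise_std=0.0` the iterates contract toward the saddle point at the origin; with positive noise they instead fluctuate around it at a scale that depends on the step size.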

[132]  arXiv:2105.03797 [pdf, other]
Title: AnomalyHop: An SSL-based Image Anomaly Localization Method
Comments: 5 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

An image anomaly localization method based on the successive subspace learning (SSL) framework, called AnomalyHop, is proposed in this work. AnomalyHop consists of three modules: 1) feature extraction via successive subspace learning (SSL), 2) normality feature distributions modeling via Gaussian models, and 3) anomaly map generation and fusion. Compared with state-of-the-art image anomaly localization methods based on deep neural networks (DNNs), AnomalyHop is mathematically transparent, easy to train, and fast in its inference speed. Besides, its area under the ROC curve (ROC-AUC) performance on the MVTec AD dataset is 95.9%, which is among the best of several benchmarking methods. Our codes are publicly available on GitHub.

[133]  arXiv:2105.03799 [pdf]
Title: Human Gait State Prediction Using Cellular Automata and Classification Using ELM
Comments: Machine Intelligence and Signal Analysis conference. Published in book Advances in Intelligent Systems and Computing, vol 748. Springer, Singapore. arXiv admin note: substantial text overlap with arXiv:1710.06548
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

In this research article, we report periodic cellular automata rules for predicting different gait states, and we classify the gait data using an extreme learning machine (ELM). This research is the first attempt to use cellular automata to understand the complexity of bipedal walking. Human gait is nonlinear, its configuration varies throughout the gait cycle, and the passive joint located at the unilateral foot-ground contact causes the dynamic descriptions and control laws to vary from phase to phase, all of which makes the bipedal walk states difficult to predict. We have designed cellular automata rules that predict the next gait state of bipedal steps based on the previous two neighbour states. We have designed these rules for normal walking. The state prediction will help to correctly design the bipedal walk. The normal walk depends on the next two states and has a total of 8 states. We consider the current and previous states to predict the next state. We have therefore formulated 16 cellular automata rules, 8 for each leg. The priority order is maintained using the fact that if the right leg is in the swing phase, then the left leg is in the stance phase. To validate the model, we classified the gait data using ELM [1] and achieved an accuracy of 60%. We have explored the trajectories and compared them with other gait trajectories. Finally, we present the error analysis for different joints.

[134]  arXiv:2105.03800 [pdf, other]
Title: Fine-Grained $ε$-Margin Closed-Form Stabilization of Parametric Hawkes Processes
Authors: Rafael Lima
Comments: Presented as a RobustML workshop paper at ICLR 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Hawkes Processes have undergone increasing popularity as default tools for modeling self- and mutually exciting interactions of discrete events in continuous-time event streams. A Maximum Likelihood Estimation (MLE) unconstrained optimization procedure over parametrically assumed forms of the triggering kernels of the corresponding intensity function is a widespread cost-effective modeling strategy, particularly suitable for data with few and/or short sequences. However, the MLE optimization lacks guarantees, except for strong assumptions on the parameters of the triggering kernels, and may lead to instability of the resulting parameters. In the present work, we show how a simple stabilization procedure improves the performance of the MLE optimization without these overly restrictive assumptions. This stabilized version of the MLE is shown to outperform traditional methods over sequences of several different lengths.
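
As a concrete instance of a parametric triggering kernel, the sketch below evaluates the intensity of a univariate Hawkes process with an exponential kernel, $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$, together with its branching ratio $\alpha/\beta$, which must be below 1 for the process to be stationary. The exponential kernel and all names are assumptions for illustration; the abstract does not specify the kernel family, and the paper's stabilization procedure is not reproduced here.

```python
import math


def exp_hawkes_intensity(t, history, mu, alpha, beta):
    """Intensity lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - t_i))
                    for t_i in history if t_i < t)


def branching_ratio(alpha, beta):
    """Integral of the exponential kernel over [0, inf); stationarity requires < 1."""
    return alpha / beta


events = [0.0, 0.4, 0.9]
rate_now = exp_hawkes_intensity(1.0, events, mu=0.5, alpha=0.8, beta=1.0)
is_stable = branching_ratio(alpha=0.8, beta=1.0) < 1.0
```

An unconstrained MLE fit can return parameters with a branching ratio of 1 or more, which is one way the instability mentioned in the abstract can show up in practice.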

[135]  arXiv:2105.03801 [pdf, other]
Title: Long-Span Dependencies in Transformer-based Summarization Systems
Comments: ACL 2021 (accepted version)
Subjects: Computation and Language (cs.CL)

Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks including document summarization. Typically these systems are trained by fine-tuning a large pre-trained model to the target task. One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows. Thus, for long document summarization, it can be challenging to train or fine-tune these models. In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization using two methods: local self-attention; and explicit content selection. These approaches are compared on a range of network configurations. Experiments are carried out on standard long-span summarization tasks, including Spotify Podcast, arXiv, and PubMed datasets. We demonstrate that by combining these methods, we can achieve state-of-the-art results on all three tasks in the ROUGE scores. Moreover, without a large-scale GPU card, our approach can achieve comparable or better results than existing approaches.

[136]  arXiv:2105.03804 [pdf, other]
Title: Slash or burn: Power line and vegetation classification for wildfire prevention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Electric utilities are struggling to manage increasing wildfire risk in a hotter and drier climate. Utility transmission and distribution lines regularly ignite destructive fires when they make contact with surrounding vegetation. Trimming vegetation to maintain the separation from utility assets is as critical to safety as it is difficult. Each utility has tens of thousands of linear miles to manage, poor knowledge of where those assets are located, and no way to prioritize trimming. Feature-enhanced convolutional neural networks (CNNs) have proven effective in this problem space. Histograms of oriented gradients (HOG) and Hough transforms are used to increase the salience of linear structures like power lines and poles. Data is frequently taken from drone or satellite footage, but Google Street View offers an even more scalable and lower cost solution. This paper uses $1,320$ images scraped from Street View, transfer learning on popular CNNs, and feature engineering to place images in one of three classes: (1) no utility systems, (2) utility systems with no overgrown vegetation, or (3) utility systems with overgrown vegetation. The CNN output thus yields a prioritized vegetation management system and creates a geotagged map of utility assets as a byproduct. Test set accuracy reached $80.15\%$ using VGG11 with a trained first layer and classifier, and a model ensemble correctly classified $88.88\%$ of images with risky vegetation overgrowth.

[137]  arXiv:2105.03807 [pdf]
Title: Estimation of 3D Human Pose Using Prior Knowledge
Comments: letter
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Estimating three-dimensional human poses from the positions of two-dimensional joints has shown promising results. However, using two-dimensional joint coordinates as input loses more information than image-based approaches and results in ambiguity. In order to overcome this problem, we combine bone length and camera parameters with two-dimensional joint coordinates for input. This combination is more discriminative than the two-dimensional joint coordinates in that it can improve the accuracy of the model's prediction depth and alleviate the ambiguity that comes from projecting three-dimensional coordinates into two-dimensional space. Furthermore, we introduce direction constraints which can better measure the difference between the ground truth and the output of the proposed model. The experimental results on the H36M show that the method performed better than other state-of-the-art three-dimensional human pose estimation approaches.

[138]  arXiv:2105.03811 [pdf, other]
Title: Click-Through Rate Prediction Using Graph Neural Networks and Online Learning
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recommendation systems have been extensively studied in the literature and are ubiquitous in online advertisement, shopping industry/e-commerce, query suggestions in search engines, and friend recommendation in social networks. Moreover, restaurant/music/product/movie/news/app recommendations are only a few of the applications of a recommender system. Even a small percentage improvement in CTR prediction accuracy has been reported to add millions of dollars of revenue to the advertisement industry. Click-Through-Rate (CTR) prediction is a special case of a recommender system in which the goal is predicting whether or not a user is going to click on a recommended item. A content-based recommendation approach takes into account the past history of the user's behavior, i.e. the recommended products and the user's reaction to them. So, a personalized model that recommends the right item to the right user at the right time is the key to building such a model. On the other hand, the so-called collaborative filtering approach incorporates the click history of the users who are very similar to a particular user, thereby helping the recommender to come up with a more confident prediction for that particular user by leveraging the wider knowledge of users who share their taste in a connected network of users. In this project, we are interested in building a CTR predictor using Graph Neural Networks complemented by an online learning algorithm that models such dynamic interactions. By framing the problem as a binary classification task, we have evaluated this system both on the offline models (GNN, Deep Factorization Machines) with test-AUC of 0.7417 and on the online learning model with test-AUC of 0.7585 using a sub-sampled version of Criteo public dataset consisting of 10,000 data points.

[139]  arXiv:2105.03812 [pdf, other]
Title: Analysis and Mitigations of Reverse Engineering Attacks on Local Feature Descriptors
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As autonomous driving and augmented reality evolve, a practical concern is data privacy. In particular, these applications rely on localization based on user images. The widely adopted technology uses local feature descriptors, which are derived from the images, and it was long thought that they could not be reverted. However, recent work has demonstrated that under certain conditions reverse engineering attacks are possible and allow an adversary to reconstruct RGB images. This poses a potential risk to user privacy. We take this a step further and model potential adversaries using a privacy threat model. Subsequently, we show under controlled conditions a reverse engineering attack on sparse feature maps and analyze the vulnerability of popular descriptors including FREAK, SIFT and SOSNet. Finally, we evaluate potential mitigation techniques that select a subset of descriptors to carefully balance privacy reconstruction risk while preserving image matching accuracy; our results show that similar accuracy can be obtained when revealing less information.

[140]  arXiv:2105.03813 [pdf, other]
Title: Adaptive and Risk-Aware Target Tracking with Heterogeneous Robot Teams
Comments: Submitted to the International Conference on Intelligent Robots and Systems 2021. 9 pages
Subjects: Robotics (cs.RO)

We consider a scenario where a team of robots with heterogeneous sensors must track a set of hostile targets which induce sensory failures on the robots. In particular, the likelihood of failures depends on the proximity between the targets and the robots. We propose a control framework that implicitly addresses the competing objectives of performance maximization and sensor preservation (which impacts the future performance of the team). Our framework consists of a predictive component -- which accounts for the risk of being detected by the target, and a reactive component -- which maximizes the performance of the team regardless of the failures that have already occurred. Based on a measure of the abundance of sensors in the team, our framework can generate aggressive and risk-averse robot configurations to track the targets. Crucially, the heterogeneous sensing capabilities of the robots are explicitly considered in each step, allowing for a more expressive risk-performance trade-off. Simulated experiments with induced sensor failures demonstrate the efficacy of the proposed approach.

[141]  arXiv:2105.03814 [pdf, other]
Title: Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture
Comments: This paper is accepted to SIGMETRICS 2021 and will be presented at the conference in June 2021. Our open source software will be released after the presentation at SIGMETRICS 2021
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of main memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM).
Recent research explores different forms of PIM architectures, motivated by the emergence of new 3D-stacked memory technologies that integrate memory with a logic layer where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture.
This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM, a benchmark suite of 16 workloads from different application domains (e.g., linear algebra, databases, graph processing, neural networks, bioinformatics).

[142]  arXiv:2105.03815 [pdf, other]
Title: Knowledge-based Review Generation by Coherence Enhanced Text Planning
Comments: Accepted by SIGIR 2021 (Long Paper)
Subjects: Computation and Language (cs.CL)

As a natural language generation task, it is challenging to generate informative and coherent review text. In order to enhance the informativeness of the generated text, existing solutions typically learn to copy entities or triples from knowledge graphs (KGs). However, they lack overall consideration to select and arrange the incorporated knowledge, which tends to cause text incoherence.
To address the above issue, we focus on improving entity-centric coherence of the generated reviews by leveraging the semantic structure of KGs. In this paper, we propose a novel Coherence Enhanced Text Planning model (CETP) based on knowledge graphs (KGs) to improve both global and local coherence for review generation. The proposed model learns a two-level text plan for generating a document: (1) the document plan is modeled as a sequence of sentence plans in order, and (2) the sentence plan is modeled as an entity-based subgraph from KG. Local coherence can be naturally enforced by KG subgraphs through intra-sentence correlations between entities. For global coherence, we design a hierarchical self-attentive architecture with both subgraph- and node-level attention to enhance the correlations between subgraphs. To our knowledge, we are the first to utilize a KG-based text planning model to enhance text coherence for review generation. Extensive experiments on three datasets confirm the effectiveness of our model on improving the content coherence of generated texts.

[143]  arXiv:2105.03817 [pdf, ps, other]
Title: TrTr: Visual Tracking with Transformer
Comments: 11 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Template-based discriminative trackers are currently the dominant tracking methods due to their robustness and accuracy, and the Siamese-network-based methods that depend on a cross-correlation operation between features extracted from template and search images show state-of-the-art tracking performance. However, a general cross-correlation operation can only capture relationships between local patches in two feature maps. In this paper, we propose a novel tracker network based on a powerful attention mechanism, the Transformer encoder-decoder architecture, to gain global and rich contextual interdependencies. In this new architecture, features of the template image are processed by a self-attention module in the encoder part to learn strong context information, which is then sent to the decoder part to compute cross-attention with the search image features processed by another self-attention module. In addition, we design the classification and regression heads using the output of the Transformer to localize the target based on a shape-agnostic anchor. We extensively evaluate our tracker, TrTr, on the VOT2018, VOT2019, OTB-100, UAV, NfS, TrackingNet, and LaSOT benchmarks, and our method performs favorably against state-of-the-art algorithms. Training code and pretrained models are available at https://github.com/tongtybj/TrTr.

[144]  arXiv:2105.03818 [pdf, other]
Title: Heterogeneous Risk Minimization
Comments: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. (ICML2021)
Subjects: Machine Learning (cs.LG)

Machine learning algorithms with empirical risk minimization usually suffer from poor generalization performance due to the greedy exploitation of correlations among the training data, which are not stable under distributional shifts. Recently, some invariant learning methods for out-of-distribution (OOD) generalization have been proposed by leveraging multiple training environments to find invariant relationships. However, modern datasets are frequently assembled by merging data from multiple sources without explicit source labels. The resultant unobserved heterogeneity renders many invariant learning methods inapplicable. In this paper, we propose the Heterogeneous Risk Minimization (HRM) framework to achieve joint learning of the latent heterogeneity among the data and the invariant relationships, which leads to stable prediction despite distributional shifts. We theoretically characterize the roles of the environment labels in invariant learning and justify our newly proposed HRM framework. Extensive experimental results validate the effectiveness of our HRM framework.

[145]  arXiv:2105.03819 [pdf]
Title: Evaluating Deep Neural Network Ensembles by Majority Voting cum Meta-Learning scheme
Comments: Included in Proceedings of 3rd ICSCSP 2020
Subjects: Machine Learning (cs.LG)

Deep Neural Networks (DNNs) are prone to overfitting and hence have high variance. Overfitted networks do not perform well on new data instances. So instead of using a single DNN as a classifier, we propose an ensemble of seven independent DNN learners, varying only the input to these DNNs while keeping their architecture and intrinsic properties the same. To induce variety in the training input, for each of the seven DNNs, one-seventh of the data is deleted and replenished by bootstrap sampling from the remaining samples. We propose a novel technique for combining the predictions of the DNN learners in the ensemble. Our method is called pre-filtering by majority voting coupled with stacked meta-learner, and it performs a two-step confidence check for the predictions before assigning the final class labels. All the algorithms in this paper have been tested on five benchmark datasets, namely Human Activity Recognition (HAR), Gas sensor array drift, Isolet, Spambase and Internet advertisements. Our ensemble approach achieves higher accuracy than a single DNN and the average individual accuracies of DNNs in the ensemble, as well as the baseline approaches of plurality voting and meta-learning.
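The data-variation step and the vote-then-defer idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the contiguous fold layout, the `min_agreement` threshold, and the `meta_learner` callable are all hypothetical, and the abstract does not specify how the two-step confidence check is actually carried out.

```python
import random
from collections import Counter

def make_training_sets(data, n_learners=7, seed=0):
    """Build one training set per learner: drop one fold, refill by bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    sets = []
    for i in range(n_learners):
        # Drop the i-th contiguous fold (roughly one-seventh of the data).
        lo, hi = i * n // n_learners, (i + 1) * n // n_learners
        kept = data[:lo] + data[hi:]
        # Restore the original size by sampling with replacement from what is left.
        refill = [rng.choice(kept) for _ in range(n - len(kept))]
        sets.append(kept + refill)
    return sets

def combine(votes, meta_learner, min_agreement):
    """Accept the most common vote if enough learners agree, else defer.

    `min_agreement` and `meta_learner` are hypothetical stand-ins for the
    pre-filter and the stacked meta-learner named in the abstract.
    """
    label, count = Counter(votes).most_common(1)[0]
    if count >= min_agreement:
        return label
    return meta_learner(votes)
```

In this reading, the meta-learner is consulted only for samples on which the base learners disagree too much for a plain vote to be trusted.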

[146]  arXiv:2105.03821 [pdf, other]
Title: Exploiting Path Information for Anchor Based Graph Neural Network
Comments: 10 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Learning node representations that incorporate information from graph structure benefits a wide range of tasks on graphs. The majority of existing graph neural networks (GNNs) have limited power in capturing position information for a given node. The idea of positioning nodes with selected anchors has been exploited, yet existing methods mainly rely on explicit labeling of distance information. Here we propose Graph Inference Representation (GIR), an anchor-based GNN that encodes path information related to anchors for each node. The ability to obtain position-aware embeddings is theoretically and experimentally investigated for GIRs and their core variants. Further, the complementary characteristics of GIR and typical GNN embeddings are demonstrated. We show that GIRs achieve better results in position-aware scenarios, and that fusing GIR embeddings can improve the results of GNNs.

[147]  arXiv:2105.03822 [pdf, other]
Title: RBNN: Memory-Efficient Reconfigurable Deep Binary Neural Network with IP Protection for Internet of Things
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Though deep neural network models exhibit outstanding performance for various applications, their large model size and extensive floating-point operations render deployment on mobile computing platforms a major challenge, and, in particular, on Internet of Things devices. One appealing solution is model quantization that reduces the model size and uses integer operations commonly supported by microcontrollers. To this end, a 1-bit quantized DNN model or deep binary neural network maximizes the memory efficiency, where each parameter in a BNN model has only 1 bit. In this paper, we propose a reconfigurable BNN (RBNN) to further amplify the memory efficiency for resource-constrained IoT devices. Generally, the RBNN can be reconfigured on demand to achieve any one of M (M>1) distinct tasks with the same parameter set, thus only a single task determines the memory requirements. In other words, the memory utilization is improved by M times. Our extensive experiments corroborate that up to seven commonly used tasks can co-exist (the value of M can be larger). These tasks with a varying number of classes have no or negligible accuracy drop-off on three binarized popular DNN architectures including VGG, ResNet, and ReActNet. The tasks span across different domains, e.g., computer vision and audio domains validated herein, with the prerequisite that the model architecture can serve those cross-domain tasks. To protect the intellectual property of an RBNN model, the reconfiguration can be controlled by both a user key and a device-unique root key generated by the intrinsic hardware fingerprint. By doing so, an RBNN model can only be used per paid user per authorized device, thus benefiting both the user and the model provider.
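The memory argument above is simple arithmetic, sketched below. The parameter count is an illustrative number rather than a figure from the paper, and the sketch ignores any storage needed for keys or reconfiguration logic, which the abstract does not quantify.

```python
def bnn_bytes(n_params):
    """Storage for n_params 1-bit weights, rounded up to whole bytes."""
    return (n_params + 7) // 8

def separate_models_bytes(n_params, m_tasks):
    """Storage if each of the M tasks kept its own binary network."""
    return m_tasks * bnn_bytes(n_params)

n_params, m_tasks = 1_000_000, 7      # illustrative values only
shared = bnn_bytes(n_params)          # one parameter set reused for all tasks
separate = separate_models_bytes(n_params, m_tasks)
ratio = separate / shared             # equals M when every task has the same size
```

With these numbers, one shared parameter set needs 125,000 bytes, whereas seven separate binary networks of the same size would need seven times that.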

[148]  arXiv:2105.03824 [pdf, other]
Title: FNet: Mixing Tokens with Fourier Transforms
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We show that Transformer encoder architectures can be massively sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens. These linear transformations, along with simple nonlinearities in feed-forward layers, are sufficient to model semantic relationships in several text classification tasks. Perhaps most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder with a standard, unparameterized Fourier Transform achieves 92% of the accuracy of BERT on the GLUE benchmark, but pre-trains and runs up to seven times faster on GPUs and twice as fast on TPUs. The resulting model, which we name FNet, scales very efficiently to long inputs, matching the accuracy of the most accurate "efficient" Transformers on the Long Range Arena benchmark, but training and running faster across all sequence lengths on GPUs and relatively shorter sequence lengths on TPUs. Finally, FNet has a light memory footprint and is particularly efficient at smaller model sizes: for a fixed speed and accuracy budget, small FNet models outperform Transformer counterparts.
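As a rough illustration of token mixing with an unparameterized Fourier transform, the sketch below applies a naive discrete Fourier transform along the hidden dimension and then along the sequence dimension, keeping only the real part. Applying the transform along both dimensions and discarding the imaginary part are assumptions made for this sketch, since the abstract does not spell out those details. The function names are hypothetical, and a real model would use an FFT library rather than this O(n^2) loop.

```python
import cmath

def dft(values):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def fourier_mix(x):
    """Mix a (seq_len x hidden) list of token vectors with a 2-D DFT.

    There are no learned weights: every output entry depends on every
    input entry, which is what mixes information across tokens.
    """
    rows = [dft(row) for row in x]                  # transform along hidden dim
    cols = [dft(list(col)) for col in zip(*rows)]   # transform along sequence dim
    # cols is indexed [hidden][seq]; transpose back and keep the real part.
    return [[cols[h][s].real for h in range(len(cols))] for s in range(len(x))]

mixed = fourier_mix([[1.0, 2.0], [3.0, 4.0]])
```

Because the transform has no parameters, the only trainable parts of such an encoder block would be the feed-forward layers mentioned in the abstract.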

[149]  arXiv:2105.03826 [pdf]
Title: A Hybrid Model for Combining Neural Image Caption and k-Nearest Neighbor Approach for Image Captioning
Comments: Included in Proceedings of 3rd ICSCSP 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A hybrid model is proposed that integrates two popular image captioning methods to generate a text-based summary describing the contents of the image. The two image captioning models are the Neural Image Caption (NIC) and the k-nearest neighbor approach. These are trained individually on the training set. We extract a set of five features from the validation set for evaluating the results of the two models, which in turn is used to train a logistic regression classifier. The BLEU-4 scores of the two models are compared to generate the binary-valued ground truth for the logistic regression classifier. For the test set, the input images are first passed separately through the two models to generate the individual captions. The five-dimensional feature set extracted from the two models is passed to the logistic regression classifier to decide on the final caption, which is the better of the two captions generated by the models. Our implementation of the k-nearest neighbor model achieves a BLEU-4 score of 15.95 and the NIC model achieves a BLEU-4 score of 16.01 on the benchmark Flickr8k dataset. The proposed hybrid model is able to achieve a BLEU-4 score of 18.20, proving the validity of our approach.

[150]  arXiv:2105.03827 [pdf, other]
Title: Good Practices and A Strong Baseline for Traffic Anomaly Detection
Comments: We rank $1^{st}$ in the CVPR 2021 NVIDIA AI CITY Challenge for Traffic Anomaly detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The detection of traffic anomalies is a critical component of the intelligent city transportation management system. Previous works have proposed a variety of notable insights and taken a step forward in this field; however, dealing with the complex traffic environment remains a challenge. Moreover, the lack of high-quality data and the complexity of the traffic scene motivate us to study this problem from a hand-crafted perspective. In this paper, we propose a straightforward and efficient framework that includes pre-processing, a dynamic tracking module, and post-processing. With video stabilization, background modeling, and vehicle detection, the pre-processing phase aims to generate candidate anomalies. The dynamic tracking module seeks and locates the start time of anomalies by utilizing vehicle motion patterns and spatiotemporal status. Finally, we use post-processing to fine-tune the temporal boundary of anomalies. Not surprisingly, our proposed framework was ranked $1^{st}$ in the NVIDIA AI CITY 2021 leaderboard for traffic anomaly detection. The code is available at: https://github.com/Endeavour10020/AICity2021-Anomaly-Detection .

[151]  arXiv:2105.03828 [pdf, other]
Title: Impacts of Privately Owned Electric Vehicles on Distribution System Resilience: A Multi-agent Optimization Approach
Subjects: Systems and Control (eess.SY)

We investigate the effects of private electric vehicles (EVs) on the resilience of distribution systems after disruptions. We propose a framework of network-based multi-agent optimization problems with equilibrium constraints (N-MOPEC) to consider the decentralized decision making of stakeholders in transportation and energy systems. To solve the high-dimensional non-convex problem, we develop an efficient computational algorithm based on exact convex reformulation. Numerical studies are conducted to illustrate the effectiveness of our modeling and computational approach and to draw policy insights. The proposed modeling and computational strategies could provide a solid foundation for the future study of power system resilience with private EVs in coupled transportation and power networks.

[152]  arXiv:2105.03830 [pdf, other]
Title: Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Rain is a common natural phenomenon. Taking images in the rain, however, often results in degraded image quality and thus compromises the performance of many computer vision systems. Most existing de-rain algorithms use only a single input image and aim to recover a clean image. Few works have exploited stereo images. Moreover, even for single image based monocular deraining, many current methods fail to complete the task satisfactorily because they mostly rely on per pixel loss functions and ignore semantic information. In this paper, we present a Paired Rain Removal Network (PRRNet), which exploits both stereo images and semantic information. Specifically, we develop a Semantic-Aware Deraining Module (SADM) which solves both tasks of semantic segmentation and deraining of scenes, and a Semantic-Fusion Network (SFNet) and a View-Fusion Network (VFNet) which fuse semantic information and multi-view information respectively. In addition, we also introduce an Enhanced Paired Rain Removal Network (EPRRNet) which exploits a semantic prior to remove rain streaks from stereo images. We first use a coarse deraining network to reduce the rain streaks on the input images, and then adopt a pre-trained semantic segmentation network to extract semantic features from the coarse derained image. Finally, a parallel stereo deraining network fuses semantic and multi-view information to restore finer results. We also propose new stereo based rainy datasets for benchmarking. Experiments on both monocular and the newly proposed stereo rainy datasets demonstrate that the proposed method achieves state-of-the-art performance.

[153]  arXiv:2105.03831 [pdf, ps, other]
Title: Super Solutions of the Model RB
Authors: Guangyan Zhou, Wei Xu
Comments: 8 pages
Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)

A super solution is a special type of generalized solution with a certain degree of robustness and stability. In this paper we consider the $(1,1)$-super solutions of the model RB. Using the first moment method, we establish a "threshold" such that as the constraint density crosses this value, the expected number of $(1,1)$-super solutions goes from $0$ to infinity.

[154]  arXiv:2105.03832 [pdf, other]
Title: Dataset and Performance Comparison of Deep Learning Architectures for Plum Detection and Robotic Harvesting
Comments: 20 pages, 8 figures, 2 tables. Associated dataset at this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Many automated operations in agriculture, such as weeding and plant counting, require robust and accurate object detectors. Robotic fruit harvesting is one of these, and is an important technology to address the increasing labour shortages and uncertainty suffered by tree crop growers. An eye-in-hand sensing setup is commonly used in harvesting systems and provides benefits to sensing accuracy and flexibility. However, as the hand and camera move from viewing the entire trellis to picking a specific fruit, large changes in lighting, colour, obscuration and exposure occur. Object detection algorithms used in harvesting should be robust to these challenges, but few datasets for assessing this currently exist. In this work, two new datasets are gathered during day and night operation of an actual robotic plum harvesting system. A range of current generation deep learning object detectors are benchmarked against these. Additionally, two methods for fusing depth and image information are tested for their impact on detector performance. Significant differences between day and night accuracy of different detectors are found, transfer learning is identified as essential in all cases, and depth information fusion is assessed as only marginally effective. The dataset and benchmark models are made available online.

[155]  arXiv:2105.03833 [pdf, other]
Title: Euclidean Distance-Optimal Post-Processing of Grid-Based Paths
Subjects: Robotics (cs.RO)

Paths planned over grids can often be suboptimal in a Euclidean space and contain a large number of unnecessary turns. Consequently, researchers have looked into post-processing techniques to improve the paths after they are planned. In this paper, we propose a novel post-processing technique, called Homotopic Visibility Graph Planning (HVG), which differentiates itself from existing post-processing methods in that it is guaranteed to shorten the path such that it is at least as short as the provably shortest path that lies within the same topological class as the initially computed path. We propose the algorithm, provide proofs and compare it experimentally against other post-processing methods and any-angle planning algorithms.

[156]  arXiv:2105.03834 [pdf, other]
Title: Learning Image Attacks toward Vision Guided Autonomous Vehicles
Subjects: Robotics (cs.RO); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

While adversarial neural networks have been shown successful for static image attacks, very few approaches have been developed for attacking online image streams while taking into account the underlying physical dynamics of autonomous vehicles, their mission, and environment. This paper presents an online adversarial machine learning framework that can effectively misguide autonomous vehicles' missions. In the existing image attack methods devised toward autonomous vehicles, optimization steps are repeated for every image frame. This framework removes the need for fully converged optimization at every frame to realize image attacks in real-time. Using reinforcement learning, a generative neural network is trained over a set of image frames to obtain an attack policy that is more robust to dynamic and uncertain environments. A state estimator is introduced for processing image streams to reduce the attack policy's sensitivity to physical variables such as unknown position and velocity. A simulation study is provided to validate the results.

[157]  arXiv:2105.03835 [pdf, other]
Title: Segmenting Hybrid Trajectories using Latent ODEs
Subjects: Machine Learning (cs.LG)

Smooth dynamics interrupted by discontinuities are known as hybrid systems and arise commonly in nature. Latent ODEs allow for powerful representation of irregularly sampled time series but are not designed to capture trajectories arising from hybrid systems. Here, we propose the Latent Segmented ODE (LatSegODE), which uses Latent ODEs to perform reconstruction and changepoint detection within hybrid trajectories featuring jump discontinuities and switching dynamical modes. Where it is possible to train a Latent ODE on the smooth dynamical flows between discontinuities, we apply the pruned exact linear time (PELT) algorithm to detect changepoints where latent dynamics restart, thereby maximizing the joint probability of a piece-wise continuous latent dynamical representation. We propose usage of the marginal likelihood as a score function for PELT, circumventing the need for model complexity-based penalization. The LatSegODE outperforms baselines in reconstructive and segmentation tasks including synthetic data sets of sine waves, Lotka Volterra dynamics, and UCI Character Trajectories.

[158]  arXiv:2105.03838 [pdf, other]
Title: HyperHyperNetworks for the Design of Antenna Arrays
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)

We present deep learning methods for the design of arrays and single instances of small antennas. Each design instance is conditioned on a target radiation pattern and is required to conform to specific spatial dimensions and to include, as part of its metallic structure, a set of predetermined locations. The solution, in the case of a single antenna, is based on a composite neural network that combines a simulation network, a hypernetwork, and a refinement network. In the design of the antenna array, we add an additional design level and employ a hypernetwork within a hypernetwork. The learning objective is based on measuring the similarity of the obtained radiation pattern to the desired one. Our experiments demonstrate that our approach is able to design novel antennas and antenna arrays that are compliant with the design requirements, considerably better than the baseline methods. We compare the solutions obtained by our method to existing designs and demonstrate a high level of overlap. When designing the antenna array of a cellular phone, the obtained solution displays improved properties over the existing one.

[159]  arXiv:2105.03839 [pdf, other]
Title: News Kaleidoscope: Visual Investigation of Coverage Diversity in News Event Reporting
Subjects: Human-Computer Interaction (cs.HC)

We develop a visual analytics system, NewsKaleidoscope, to investigate how news reporting of events varies. NewsKaleidoscope combines several backend text language processing techniques with a coordinated visualization interface tailored for visualization non-expert users. To robustly evaluate NewsKaleidoscope, we conduct a trio of user studies. (1) A usability study with news novices assesses the overall system and the specific insights promoted for journalism-agnostic users. (2) A follow-up study with news experts assesses the insights promoted for journalism-savvy users. (3) Based on identified system limitations in these two studies, we amend the NewsKaleidoscope design and conduct a third study to validate these improvements. Results indicate that, for both news novices and experts, NewsKaleidoscope supports an effective, task-driven workflow for analyzing the diversity of news coverage about events, though journalism expertise has a significant influence on the user insights and takeaways. Our insights while developing and evaluating NewsKaleidoscope can aid future interface designs that combine visualization with natural language processing to analyze coverage diversity in news event reporting.

[160]  arXiv:2105.03841 [pdf, other]
Title: The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification
Comments: arXiv admin note: text overlap with arXiv:1911.12008
Journal-ref: ECML PKDD 2020: Machine Learning and Knowledge Discovery in Databases, pages 660-676, 2020
Subjects: Machine Learning (cs.LG)

Using bag of words representations of time series is a popular approach to time series classification. These algorithms involve approximating and discretising windows over a series to form words, then forming a count of words over a given dictionary. Classifiers are constructed on the resulting histograms of word counts. A 2017 evaluation of a range of time series classifiers found the bag of symbolic-fourier approximation symbols (BOSS) ensemble the best of the dictionary based classifiers. It forms one of the components of hierarchical vote collective of transformation-based ensembles (HIVE-COTE), which represents the current state of the art. Since then, several new dictionary based algorithms have been proposed that are more accurate or more scalable (or both) than BOSS. We propose a further extension of these dictionary based classifiers that combines the best elements of the others with a novel approach to constructing ensemble members based on an adaptive Gaussian process model of the parameter space. We demonstrate that the temporal dictionary ensemble (TDE) is more accurate than other dictionary based approaches. Furthermore, unlike the other classifiers, if we replace BOSS in HIVE-COTE with TDE, HIVE-COTE is significantly more accurate. We also show this new version of HIVE-COTE is significantly more accurate than the current best deep learning approach, a recently proposed hybrid tree ensemble and a recently introduced competitive classifier making use of highly randomised convolutional kernels. This advance represents a new state of the art for time series classification.
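The general bag-of-words pipeline described in the first sentences (slide a window, approximate and discretise it into a word, count words) can be sketched as below. Note the substitution: this sketch uses piecewise means with fixed breakpoints as the approximation step, which is simpler than the symbolic Fourier approximation used by BOSS-style classifiers, and it is not TDE itself. The function names and parameters are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def window_to_word(window, word_len, breakpoints):
    """Approximate a window by segment means, then map each mean to a letter."""
    seg = len(window) // word_len  # assumes len(window) is a multiple of word_len
    letters = []
    for k in range(word_len):
        chunk = window[k * seg:(k + 1) * seg]
        mean = sum(chunk) / len(chunk)
        # The letter index is the number of breakpoints less than or equal to the mean.
        letters.append(ALPHABET[bisect_right(breakpoints, mean)])
    return "".join(letters)

def word_histogram(series, window_len, word_len, breakpoints):
    """Count the words produced by every sliding window over the series."""
    words = (
        window_to_word(series[i:i + window_len], word_len, breakpoints)
        for i in range(len(series) - window_len + 1)
    )
    return Counter(words)

hist = word_histogram([0, 0, 1, 1, 0, 0], window_len=4, word_len=2, breakpoints=[0.5])
```

A classifier would then be trained on such histograms, one per series, rather than on the raw values.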

[161]  arXiv:2105.03842 [pdf, other]
Title: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than original ASR outputs. Previous works usually use a sequence-to-sequence model to correct an ASR output sentence autoregressively, which causes large latency and cannot be deployed in online ASR services. A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate. In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. In training, FastCorrect aligns each source token from an ASR output sentence to the target tokens from the corresponding ground-truth sentence based on the edit distance between the source and target sentences, and extracts the number of target tokens corresponding to each source token during edition/correction, which is then used to train a length predictor and to adjust the source tokens to match the length of the target sentence for parallel generation. In inference, the token number predicted by the length predictor is used to adjust the source tokens for target sequence generation. Experiments on the public AISHELL-1 dataset and an internal industrial-scale ASR dataset show the effectiveness of FastCorrect for ASR error correction: 1) it speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model; and 2) it outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
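The training-time alignment step can be illustrated with a small sketch that computes an edit-distance table, traces back one optimal path, and records how many target tokens each source token maps to (0 for a deletion, 1 for a match or substitution, 2 or more when inserted tokens are attached to it). This is a simplified illustration, not the authors' procedure: how ties between equally short paths are broken and which neighbour receives an inserted token are assumptions here, and `target_counts` is a hypothetical name.

```python
def target_counts(source, target):
    """For each source token, count the target tokens aligned to it."""
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = dist[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    counts = [0] * m
    i, j = m, n
    while i > 0 or j > 0:
        diag_ok = (
            i > 0 and j > 0
            and dist[i][j] == dist[i - 1][j - 1] + (source[i - 1] != target[j - 1])
        )
        if diag_ok:
            counts[i - 1] += 1          # match or substitution: one target token
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1                       # deletion: source token maps to nothing
        else:
            if m > 0:
                counts[max(i - 1, 0)] += 1   # insertion: attach to a neighbour
            j -= 1
    return counts
```

For example, `target_counts(["a", "b", "c"], ["a", "c"])` gives `[1, 0, 1]` and `target_counts(["a", "c"], ["a", "b", "c"])` gives `[2, 1]`. Counts of this kind are what a length predictor could be trained to output, so that the source sequence can be shrunk or expanded before parallel decoding.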

[162]  arXiv:2105.03844 [pdf, other]
Title: Reinforcement Learning with Expert Trajectory For Quantitative Trading
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Trading and Market Microstructure (q-fin.TR)

In recent years, quantitative investment methods combined with artificial intelligence have attracted more and more attention from investors and researchers. Existing related methods based on supervised learning are not very suitable for learning problems with long-term goals and delayed rewards in real futures trading. In this paper, therefore, we model the price prediction problem as a Markov decision process (MDP), and optimize it by reinforcement learning with expert trajectory. In the proposed method, we employ more than 100 short-term alpha factors instead of the price, volume and several technical factors used in existing methods to describe the states of the MDP. Furthermore, unlike DQN (deep Q-learning) and BC (behavior cloning) in related methods, we introduce expert experience in the training stage, and consider both the expert-environment interaction and the agent-environment interaction to design the temporal difference error so that the agents are more adaptable to the inevitable noise in financial data. Experimental results evaluated on share price index futures in China, including IF (CSI 300) and IC (CSI 500), show the advantages of the proposed method compared with three typical technical analysis methods and two deep learning based methods.

[163]  arXiv:2105.03851 [pdf]
Title: Employing Agent Beliefs during Fault Diagnosis for IEC 61499 Industrial Cyber-Physical Systems
Comments: Conference paper, 6 pages, 6 figures
Journal-ref: Proceedings of the 46th Annual Conference of the IEEE Industrial Electronics Society (IECON2020). IEEE Computer Society Press, pp.2189-2194
Subjects: Software Engineering (cs.SE)

We have come to rely on industrial-scale cyber-physical systems more and more to manage tasks and machinery in safety-critical situations. Efficient, reliable fault identification and management has become a critical factor in the design of these increasingly sophisticated and complex devices. Teams of co-operating software agents are one way to coordinate the flow of diagnostic information gathered during fault-finding. By wielding domain knowledge of the software architecture used to construct the system, agents build and refine their beliefs about the location and root cause of faults. This paper examines how agents constructed within the GORITE Multi-Agent Framework create and refine their beliefs. We demonstrate three different belief structures implemented within our Fault Diagnostic Engine, showing how each supports a distinct aspect of the agent's reasoning. Using domain knowledge of the IEC 61499 Function Block architecture, agents are able to examine and rigorously evaluate both individual components and entire subsystems.

[164]  arXiv:2105.03852 [pdf, other]
Title: Towards Dynamic Feature Selection with Attention to Assist Banking Customers in Establishing a New Business
Subjects: Machine Learning (cs.LG)

Establishing a new business may involve knowledge acquisition in various areas, from personal to business and marketing sources. This task is challenging as it requires examining various data islands to uncover hidden patterns and unknown correlations such as purchasing behavior, consumer buying signals, and demographic and socioeconomic attributes of different locations. This paper introduces a novel framework for extracting and identifying important features from banking and non-banking data sources to address this challenge. We present an attention-based supervised feature selection approach to select the important and relevant features that contribute most to the customer's query regarding establishing a new business. We report on an experiment conducted on an openly available dataset created from the Kaggle and UCI machine learning repositories.

[165]  arXiv:2105.03855 [pdf]
Title: GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers
Comments: 20 pages, 6 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Classification of imbalanced data is a common problem in data mining. Imbalanced data substantially affects the performance of standard classification models. Data-level approaches mainly use oversampling methods, such as the synthetic minority oversampling technique (SMOTE), to solve the problem. However, since methods such as SMOTE generate instances by linear interpolation, the synthetic data space may look like a polygon. Also, the oversampling methods generate outliers of the minority class. In this paper, we propose the Gaussian based minority oversampling technique (GMOTE), which takes a statistical perspective on imbalanced datasets. To avoid linear interpolation and to account for outliers, the proposed method generates instances with a Gaussian Mixture Model. Motivated by the clustering-based multivariate Gaussian outlier score (CMGOS), we propose to adapt the tail probability of instances through the Mahalanobis distance to account for local outliers. The experiment was carried out on a representative set of benchmark datasets. The performance of GMOTE is compared with that of other methods such as SMOTE. When GMOTE is combined with a classification and regression tree (CART) or a support vector machine (SVM), it shows better accuracy and F1-score. Experimental results demonstrate the robust performance.
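
The linear interpolation that the abstract attributes to SMOTE can be illustrated with a minimal sketch. This is not the GMOTE method itself; the function name `smote_sample` and the toy points are hypothetical, and neighbor selection is omitted. It only shows why interpolated samples are confined to line segments between existing minority instances.

```python
import random

def smote_sample(x, neighbor, rng):
    """Return one synthetic point on the segment between x and neighbor.

    This is the interpolation step only; choosing the neighbor
    (normally one of the k nearest minority instances) is omitted.
    """
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
a = [0.0, 0.0]                  # hypothetical minority instance
b = [2.0, 4.0]                  # hypothetical minority neighbor
synthetic = [smote_sample(a, b, rng) for _ in range(5)]

# Every synthetic point lies on the line through a and b (here y = 2x),
# which is why the sampled region is bounded by straight edges.
for point in synthetic:
    print(point)
```

Sampling from a fitted Gaussian mixture instead, as the abstract describes, would not be restricted to these segments.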

[166]  arXiv:2105.03856 [pdf, ps, other]
Title: The D-plus Discriminant and Complexity of Root Clustering
Subjects: Symbolic Computation (cs.SC)

Let $p(x)$ be an integer polynomial with $m\ge 2$ distinct roots $\alpha_1,\ldots,\alpha_m$ whose multiplicities are $\boldsymbol{\mu}=(\mu_1,\ldots,\mu_m)$. We define the D-plus discriminant of $p(x)$ to be $D^+(p):= \prod_{1\le i<j\le m}(\alpha_i-\alpha_j)^{\mu_i+\mu_j}$. Unlike the classical discriminant, $D^+(p)$ never vanishes. We first prove a conjecture that $D^+(p)$ is a $\boldsymbol{\mu}$-symmetric function of its roots $\alpha_1,\ldots,\alpha_m$. Our main result gives an explicit formula for $D^+(p)$, as a rational function of its coefficients. A basic tool used by our proof is the "symbolic Poisson resultant". The D-plus discriminant first arose in the complexity analysis of a root clustering algorithm from Becker et al. (ISSAC 2016). The bit-complexity of this algorithm is proportional to a quantity $\log(|D^+(p)|^{-1})$. As an application of our main result, we give an explicit upper bound on this quantity in terms of the degree of $p$ and its leading coefficient.
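
As a small worked illustration of the definition above, the following sketch evaluates $D^+(p)$ directly from the roots and multiplicities. The function name `d_plus` is hypothetical, and the sketch assumes rational roots so that exact arithmetic can be used; it does not implement the paper's formula in terms of coefficients.

```python
from fractions import Fraction
from itertools import combinations

def d_plus(roots, mults):
    """Product over i < j of (alpha_i - alpha_j)^(mu_i + mu_j).

    roots: distinct rational roots alpha_1..alpha_m
    mults: their multiplicities mu_1..mu_m
    """
    result = Fraction(1)
    for i, j in combinations(range(len(roots)), 2):
        diff = Fraction(roots[i]) - Fraction(roots[j])
        result *= diff ** (mults[i] + mults[j])
    return result

# p(x) = (x - 1)^2 (x - 3): the classical discriminant vanishes because
# of the double root, but D^+ = (1 - 3)^(2 + 1) = -8 does not.
value = d_plus([1, 3], [2, 1])
print(value)  # -8

# With all multiplicities equal to 1, each factor is squared:
# (0-1)^2 * (0-2)^2 * (1-2)^2 = 4.
simple = d_plus([0, 1, 2], [1, 1, 1])
print(simple)  # 4
```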

[167]  arXiv:2105.03857 [pdf, other]
Title: Seismic Fault Segmentation via 3D-CNN Training by a Few 2D Slices Labels
Comments: 22 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph)

Detecting faults in seismic data is a crucial and challenging step for seismic structural interpretation, reservoir characterization and well placement. Some recent works regard fault detection as an image segmentation task. Image segmentation requires a large number of labels, especially for 3D seismic data, which has a complex structure and a lot of noise. Its annotation therefore requires expert experience and a huge workload, and wrong or missing labels degrade the segmentation performance of the model. In this study, we present a new binary cross-entropy loss and smooth L1 loss ($\lambda$-BCE and $\lambda$-smooth L1) to effectively train a 3D-CNN by sampling some 2D slices from 3D seismic data, so that the model can learn the segmentation of 3D seismic data from a few 2D slices. In order to fully extract information from limited and low-dimensional data and suppress seismic noise, we propose an attention module that can be used for active supervision training (Active Attention Module, AAM) and is embedded in the network to participate in the differentiation and optimization of the model. During training, the attention heatmap target is generated from the original binary label and supervises the attention module through the $\lambda$-smooth L1 loss. Qualitative experiments show that our method can extract 3D seismic features from a few labeled 2D slices of real data to segment a complete fault volume. Visual comparison suggests that the segmentation quality reaches the state of the art. Quantitative experiments on synthetic data prove the effectiveness of our training method and attention module. Experiments show that, with our method, labeling at least one 2D slice every 30 frames (3.3% of the original label) allows the model to achieve a segmentation performance similar to that obtained with a full 3D label.

[168]  arXiv:2105.03858 [pdf, other]
Title: Location-Based Timing Advance Estimation for 5G Integrated LEO Satellite Communications
Subjects: Information Theory (cs.IT)

Integrated satellite-terrestrial communications networks aim to exploit both the satellite and the ground mobile communications, thus providing genuine ubiquitous coverage. For 5G integrated low earth orbit (LEO) satellite communication systems, the timing advance (TA) is required to be estimated in the initial random access procedure in order to facilitate the uplink frame alignment among different users. However, due to the inherent characteristics of LEO satellite communication systems, e.g., wide beam coverage and long propagation delays, the existing 5G terrestrial uplink TA scheme is not applicable in the satellite networks. In this paper, we investigate location-based TA estimation for 5G integrated LEO satellite communication systems. We obtain the time difference of arrival (TDOA) and frequency difference of arrival (FDOA) measurements in the downlink timing and frequency synchronization phase, which are made from the satellite at different time instants. We propose to take these measurements for either UE geolocation or ephemeris estimation, thus calculating the TA value. The estimation is then formulated as a quadratic optimization problem whose globally optimal solution can be obtained by a quadratic penalty algorithm. To reduce the computational complexity, we further propose an alternative approximation method based on iteratively performing a linearization procedure on the quadratic equality constraints. Numerical results show that the proposed methods can approach the constrained Cramer-Rao lower bound (CRLB) of the TA estimation and thus assure uplink frame alignment for different users.
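
To make the link between location and timing advance concrete, here is a minimal sketch under strong simplifying assumptions: the UE and satellite positions are already known in a common Cartesian frame, the TA is taken as the round-trip propagation delay over the service link only, and atmospheric, feeder-link and processing delays are ignored. The function name `timing_advance` and the 600 km geometry are hypothetical and are not taken from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def timing_advance(ue_pos, sat_pos):
    """Round-trip propagation delay (seconds) between UE and satellite.

    ue_pos, sat_pos: (x, y, z) coordinates in metres in the same frame.
    """
    distance = math.dist(ue_pos, sat_pos)
    return 2.0 * distance / SPEED_OF_LIGHT

# Hypothetical geometry: satellite 600 km directly above the UE.
ue = (0.0, 0.0, 0.0)
sat = (0.0, 0.0, 600_000.0)
ta_seconds = timing_advance(ue, sat)
print(f"TA is roughly {ta_seconds * 1e3:.3f} ms")
```

A delay of several milliseconds is far larger than typical terrestrial cell delays, which is consistent with the abstract's remark that the terrestrial TA scheme does not carry over directly.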

[169]  arXiv:2105.03859 [pdf, ps, other]
Title: RRCD: Redirección de Registros Basada en Compresión de Datos para Tolerar Fallos Permanentes en una GPU
Comments: 10 pages, in Spanish, 6 figures, to be submitted to Jornadas SARTECO 2021
Subjects: Hardware Architecture (cs.AR)

The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files. However, at these operating voltages, the reliability of the circuit is compromised. This work aims to tolerate permanent faults from process variations in large GPU register files operating below the safe supply voltage limit. To do so, this paper proposes a microarchitectural patching technique, DC-Patch, exploiting the inherent data redundancy of applications to compress registers at run-time with neither compiler assistance nor instruction set modifications. Instead of disabling an entire faulty register file entry, DC-Patch leverages the reliable cells within a faulty entry to store compressed register values. Experimental results show that, with more than a third of faulty register entries, DC-Patch ensures a reliable operation of the register file and reduces the energy consumption by 47% with respect to a conventional register file working at nominal supply voltage. The energy savings are 21% compared to a voltage noise smoothing scheme operating at the safe supply voltage limit. These benefits are obtained with less than 2 and 6% impact on the system performance and area, respectively.

[170]  arXiv:2105.03864 [pdf, other]
Title: Quick NAT: High performance NAT system on commodity platforms
Journal-ref: 2017 IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN)
Subjects: Networking and Internet Architecture (cs.NI)

The NAT gateway is an important network system in today's IPv4 networks, translating private IPv4 addresses to public addresses. However, traditional NAT systems based on Linux Netfilter cannot achieve the high network throughput required in modern settings such as data centers. To address this challenge, we improve the network performance of the NAT system in three ways. First, we leverage DPDK to enable polling and zero-copy delivery, so as to reduce the cost of interrupts and packet copies. Second, we enable multiple CPU cores to process packets in parallel and use a lock-free hash table to minimize contention between CPU cores. Third, we use hash search instead of sequential search when looking up the NAT rule table. Evaluation shows that our Quick NAT system significantly improves the performance of NAT on commodity platforms.
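
The third point, hash search versus sequential search of the rule table, can be sketched in a few lines. This is only an illustration of the lookup strategy, not the Quick NAT implementation (which builds on DPDK and a lock-free hash table); the `FlowKey` fields, the rule contents and the function names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

class FlowKey(NamedTuple):
    """Hypothetical lookup key for a private-side flow."""
    src_ip: str
    src_port: int
    protocol: str

Translation = Tuple[str, int]  # (public IP, public port)

# Hypothetical rule table with 200 entries.
rules_list = [
    (FlowKey(f"10.0.0.{i}", 5000 + i, "tcp"), ("203.0.113.1", 40000 + i))
    for i in range(1, 201)
]

def lookup_sequential(key: FlowKey) -> Optional[Translation]:
    """Scan the rules one by one: cost grows with the table size."""
    for rule_key, translation in rules_list:
        if rule_key == key:
            return translation
    return None

# Hash-based table: expected constant-time lookup per packet.
rules_table = dict(rules_list)

def lookup_hashed(key: FlowKey) -> Optional[Translation]:
    return rules_table.get(key)

key = FlowKey("10.0.0.150", 5150, "tcp")
print(lookup_sequential(key), lookup_hashed(key))
```

Both functions return the same translation; the difference is that the sequential scan inspects up to every rule per packet, while the hashed lookup does not depend on the number of rules in the average case.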

[171]  arXiv:2105.03867 [pdf, other]
Title: Improving Cost Learning for JPEG Steganography by Exploiting JPEG Domain Knowledge
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Although significant progress in automatic learning of steganographic cost has been achieved recently, existing methods designed for spatial images do not transfer well to JPEG images, which are more common media in daily life. The difficulties of migration mostly lie in the unique and complicated JPEG characteristics caused by the 8x8 DCT mode structure. To address the issue, in this paper we extend an existing automatic cost learning scheme to JPEG. The proposed scheme, called JEC-RL (JPEG Embedding Cost with Reinforcement Learning), is explicitly tailored to the JPEG DCT structure. It works with the embedding action sampling mechanism under reinforcement learning, where a policy network learns the optimal embedding policies by maximizing the rewards provided by an environment network. The policy network is constructed following a domain-transition design paradigm, in which three modules are proposed: pixel-level texture complexity evaluation, DCT feature extraction, and mode-wise rearrangement. These modules operate in series, gradually extracting useful features from a decompressed JPEG image and converting them into embedding policies for DCT elements, while simultaneously considering JPEG characteristics including inter-block and intra-block correlations. The environment network is designed in a gradient-oriented way to provide stable reward values by using a wide architecture equipped with a fixed preprocessing layer with 8x8 DCT basis filters. Extensive experiments and ablation studies demonstrate that the proposed method can achieve good security performance for JPEG images against both advanced feature based and modern CNN based steganalyzers.

[172]  arXiv:2105.03868 [pdf, other]
Title: Non-Recursive Graph Convolutional Networks
Comments: 5 pages, 2 figures. Accepted to ICASSP 2021
Subjects: Machine Learning (cs.LG)

Graph Convolutional Networks (GCNs) are powerful models for node representation learning tasks. However, the node representation in existing GCN models is usually generated by performing recursive neighborhood aggregation across multiple graph convolutional layers with certain sampling methods, which may lead to redundant feature mixing, needless information loss, and extensive computations. Therefore, in this paper, we propose a novel architecture named Non-Recursive Graph Convolutional Network (NRGCN) to improve both the training efficiency and the learning performance of GCNs in the context of node classification. Specifically, NRGCN proposes to represent different hops of neighbors for each node based on inner-layer aggregation and layer-independent sampling. In this way, each node can be directly represented by concatenating the information extracted independently from each hop of its neighbors thereby avoiding the recursive neighborhood expansion across layers. Moreover, the layer-independent sampling and aggregation can be precomputed before the model training, thus the training process can be accelerated considerably. Extensive experiments on benchmark datasets verify that our NRGCN outperforms the state-of-the-art GCN models, in terms of the node classification performance and reliability.

[173]  arXiv:2105.03869 [pdf, other]
Title: Trajectory Prediction for Autonomous Driving with Topometric Map
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

State-of-the-art autonomous driving systems rely on high definition (HD) maps for localization and navigation. However, building and maintaining HD maps is time-consuming and expensive. Furthermore, HD maps assume a structured environment, such as the existence of major roads and lanes, which is not present in rural areas. In this work, we propose an end-to-end transformer network based approach for map-less autonomous driving. The proposed model takes raw LiDAR data and a noisy topometric map as input and produces a precise local trajectory for navigation. We demonstrate the effectiveness of our method on real-world driving data, including both urban and rural areas. The experimental results show that the proposed method outperforms state-of-the-art multimodal methods and is robust to perturbations of the topometric map. The code of the proposed method is publicly available at https://github.com/Jiaolong/trajectory-prediction.

[174]  arXiv:2105.03874 [pdf, other]
Title: Sparse power methods for large-scale higher-order PageRank problems
Authors: Jun Huang, Gang Wu
Subjects: Numerical Analysis (math.NA)

A commonly used technique for the higher-order PageRank problem is the power method, which is computationally intractable for large-scale problems. The recently proposed truncated power method offers another way to solve this problem; however, its accuracy and efficiency can be poor in practical computations. In this work, we revisit the higher-order PageRank problem and consider how to solve it efficiently. The contributions of this work are as follows. First, we accelerate the truncated power method for higher-order PageRank. In the improved version, it is necessary neither to form and store the vectors arising from the dangling states, nor to store an auxiliary matrix. Second, we propose a truncated power method with partial updating to further reduce the overhead, in which one only needs to update some important columns of the approximation in each iteration. On the other hand, the truncated power method solves a modified higher-order PageRank model for convenience, which is not mathematically equivalent to the original one. Thus, the third contribution of this work is a sparse power method with partial updating for the original higher-order PageRank problem. The convergence of all the proposed methods is discussed. Numerical experiments on large and sparse real-world and synthetic data sets are performed. The numerical results show the superiority of our new algorithms over some state-of-the-art ones for large and sparse higher-order PageRank problems.
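
For orientation, the sketch below shows the plain power method on the classical first-order PageRank model. This is a deliberate simplification: the higher-order problem in the abstract works with transitions that depend on more than the current state, and none of the truncated, partially updated or sparse variants proposed in the paper are reproduced here. The function name, the damping factor 0.85 and the toy graph are assumptions.

```python
def pagerank_power(P, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for first-order PageRank.

    P[i][j] is the probability of moving from node j to node i, so each
    column sums to 1 (no dangling nodes in this sketch).
    """
    n = len(P)
    x = [1.0 / n] * n                      # start from the uniform vector
    for _ in range(max_iter):
        new_x = [
            alpha * sum(P[i][j] * x[j] for j in range(n)) + (1.0 - alpha) / n
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:                   # stop when the iterate settles
            break
    return x

# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0 (column-stochastic matrix).
P = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
ranks = pagerank_power(P)
print(ranks)
```

Each iteration here costs one dense matrix-vector product; in the higher-order setting the state space grows with the order, which is the cost the abstract describes as intractable at scale.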

[175]  arXiv:2105.03875 [pdf, ps, other]
Title: Bounding Information Leakage in Machine Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Machine Learning services are being deployed in a large range of applications that make it easy for an adversary, using the algorithm and/or the model, to gain access to sensitive data. This paper investigates fundamental bounds on information leakage. First, we identify and bound the success rate of the worst-case membership inference attack, connecting it to the generalization error of the target model. Second, we study the question of how much sensitive information is stored by the algorithm about the training set and we derive bounds on the mutual information between the sensitive attributes and model parameters. Although our contributions are mostly of theoretical nature, the bounds and involved concepts are of practical relevance. Inspired by our theoretical analysis, we study linear regression and DNN models to illustrate how these bounds can be used to assess the privacy guarantees of ML models.

[176]  arXiv:2105.03876 [pdf, other]
Title: Selective Probabilistic Classifier Based on Hypothesis Testing
Comments: Accepted in EUVIP 2021 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we propose a simple yet effective method to deal with the violation of the Closed-World Assumption for a classifier. Previous works tend to apply a threshold either on the classification scores or on the loss function to reject the inputs that violate the assumption. However, these methods cannot achieve the low False Positive Ratio (FPR) required in safety applications. The proposed method is a rejection option based on hypothesis testing with probabilistic networks. With probabilistic networks, it is possible to estimate the distribution of outcomes instead of a single output. By applying a Z-test to the mean and standard deviation for each class, the proposed method can estimate the statistical significance of the network certainty and reject uncertain outputs. The proposed method was evaluated on different configurations of the COCO and CIFAR datasets. Its performance is compared with that of the Softmax Response, which is a known top-performing method. It is shown that the proposed method can achieve a broader range of operation and cover a lower FPR than the alternative.
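
One way such a rejection rule could look is sketched below. The abstract does not specify the exact test statistic, so this is an assumed form: a one-sided two-sample Z-test comparing the sampled scores of the top class against those of the runner-up class. The function name `accept_prediction`, the significance level and the sample values are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def accept_prediction(top_scores, runner_up_scores, alpha=0.05):
    """Accept the top class only if it beats the runner-up significantly.

    top_scores, runner_up_scores: scores for the two classes collected
    from repeated stochastic forward passes of a probabilistic network.
    """
    n1, n2 = len(top_scores), len(runner_up_scores)
    std_err = math.sqrt(
        stdev(top_scores) ** 2 / n1 + stdev(runner_up_scores) ** 2 / n2
    )
    difference = mean(top_scores) - mean(runner_up_scores)
    if std_err == 0.0:
        return difference > 0.0            # no spread: compare means only
    z = difference / std_err
    z_critical = NormalDist().inv_cdf(1.0 - alpha)  # one-sided threshold
    return z > z_critical

# Confident case: the two score distributions are well separated.
confident = accept_prediction(
    [0.90, 0.88, 0.92, 0.91, 0.89],
    [0.05, 0.07, 0.04, 0.06, 0.05],
)
# Uncertain case: the distributions overlap, so the output is rejected.
uncertain = accept_prediction(
    [0.55, 0.40, 0.60, 0.35, 0.50],
    [0.45, 0.60, 0.40, 0.65, 0.50],
)
print(confident, uncertain)
```

Lowering `alpha` makes the rule stricter, rejecting more inputs in exchange for fewer false positives.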

[177]  arXiv:2105.03877 [pdf]
Title: Non-iterative Optimization Algorithm for Active Distribution Grids Considering Uncertainty of Feeder Parameters
Authors: J. Wu, M. Liu, W. Lu, K. Xie, M. Xie
Comments: 9 pages, 10 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Systems and Control (eess.SY)

To cope with fast-fluctuating distributed energy resources (DERs) and uncontrolled loads, this paper formulates a time-varying optimization problem for distribution grids with DERs and develops a novel non-iterative algorithm to track the optimal solutions. Different from existing methods, the proposed approach does not require iterations during the sampling interval. It only needs to perform a single one-step calculation at each interval to obtain the evolution of the optimal trajectory, which demonstrates fast calculation and online-tracking capability with an asymptotically vanishing error. Specifically, the designed approach contains two terms: a prediction term tracking the change in the optimal solution based on the time-varying nature of system power, and a correction term pushing the solution toward the optimum based on Newton's method. Moreover, the proposed algorithm can be applied in the absence of an accurate network model by leveraging voltage measurements to identify the true voltage sensitivity parameters. Simulations for an illustrative distribution network are provided to validate the approach.

[178]  arXiv:2105.03879 [pdf, other]
Title: Directional Convergence Analysis under Spherically Symmetric Distribution
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the fundamental problem of learning linear predictors (i.e., separable datasets with zero margin) using neural networks with gradient flow or gradient descent. Under the assumption of a spherically symmetric data distribution, we show directional convergence guarantees with exact convergence rates for two-layer non-linear networks with only two hidden nodes, and for (deep) linear networks. Moreover, in contrast to previous works, our analysis follows the dynamics from the initialization and requires neither an initial loss constraint nor a perfect classification constraint. We also point out and study the challenges in further strengthening and generalizing our results.

[179]  arXiv:2105.03883 [pdf, other]
Title: Perturbative expansion of the fundamental equation of online user dynamics for describing changes in eigenfrequencies
Comments: 16 pages, 16 figures, submitted to IEEE Access
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

The oscillation model has been proposed as a theoretical framework for describing user dynamics in online social networks. This model can represent the user dynamics generated by a particular network structure and allows its causal relationships to be explicitly described. In this paper, by applying perturbation theory to the fundamental equation of the oscillation model, we confirm that we can explicitly trace, at least in principle, the changes in user dynamics associated with changes in the network structure. Specifically, we formulate perturbative expansions up to infinite order by drawing on inferences from regularities found in perturbative expansions; the accuracy of perturbative expansions of finite order is evaluated by numerical experiments.

[180]  arXiv:2105.03887 [pdf, other]
Title: Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
Subjects: Computation and Language (cs.CL)

Legal artificial intelligence (LegalAI) aims to benefit legal systems with the technology of artificial intelligence, especially natural language processing (NLP). Recently, inspired by the success of pre-trained language models (PLMs) in the generic domain, many LegalAI researchers have devoted their efforts to applying PLMs to legal tasks. However, utilizing PLMs to address legal tasks is still challenging, as legal documents usually consist of thousands of tokens, which is far longer than the length that mainstream PLMs can process. In this paper, we release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents. We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering. The experimental results demonstrate that our model can achieve promising improvements on tasks with long documents as inputs.

[181]  arXiv:2105.03889 [pdf, other]
Title: Conformer: Local Features Coupling Global Representations for Visual Recognition
Comments: Submitted to ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Within Convolutional Neural Networks (CNNs), the convolution operations are good at extracting local features but have difficulty capturing global representations. Within visual transformers, the cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer is rooted in the Feature Coupling Unit (FCU), which fuses local features and global representations under different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAPs for object detection and instance segmentation, respectively, demonstrating great potential as a general backbone network. Code is available at https://github.com/pengzhiliang/Conformer.

[182]  arXiv:2105.03891 [pdf, other]
Title: Interaction Detection Between Vehicles and Vulnerable Road Users: A Deep Generative Approach with Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Intersections where vehicles are permitted to turn and interact with vulnerable road users (VRUs) like pedestrians and cyclists are among the most challenging locations for automated and accurate recognition of road users' behavior. In this paper, we propose a deep conditional generative model for interaction detection at such locations. It aims to automatically analyze massive video data about the continuity of road users' behavior. This task is essential for many intelligent transportation systems such as traffic safety control and self-driving cars that depend on the understanding of road users' locomotion. A Conditional Variational Auto-Encoder based model with Gaussian latent variables is trained to encode road users' behavior and perform probabilistic and diverse predictions of interactions. The model takes as input the information of road users' type, position and motion automatically extracted by a deep learning object detector and optical flow from videos, and generates frame-wise probabilities that represent the dynamics of interactions between a turning vehicle and any VRUs involved. The model's efficacy was validated by testing on real-world datasets acquired from two different intersections. It achieved an F1-score above 0.96 at a right-turn intersection in Germany and 0.89 at a left-turn intersection in Japan, both with very busy traffic flows.

[183]  arXiv:2105.03897 [pdf, other]
Title: Binarized Weight Error Networks With a Transition Regularization Term
Comments: Submitted to ICIP 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel binarized weight network (BT) for a resource-efficient neural structure. The proposed model estimates a binary representation of weights by taking into account the approximation error with an additional term. This model increases representation capacity and stability, particularly for shallow networks, while the computation load is theoretically reduced. In addition, a novel regularization term is introduced that is suitable for all threshold-based binary precision networks. This term penalizes the trainable parameters that are far from the thresholds at which binary transitions occur. This step promotes a swift modification for binary-precision responses at train time. Experiments are carried out for two sets of tasks: visual classification and visual inverse problems. Benchmarks on the Cifar10, SVHN, Fashion, ImageNet2012, Set5, Set14, Urban and BSD100 datasets show that our method outperforms all counterparts with binary precision.

[184]  arXiv:2105.03901 [pdf, ps, other]
Title: Feedback Gains for Gaussian Massive Multiple-Access Channels
Authors: Gerhard Kramer
Comments: Submitted to the 2021 IEEE Information Theory Workshop
Subjects: Information Theory (cs.IT)

Feedback is shown to increase the sum-rate capacity of K-user Gaussian multiple-access channels by at most a factor of approximately 1.54, improving Thomas' doubling bound (1987). The new bound is the best possible in the sense that it can be approached as closely as desired for a massive number of users. Moreover, feedback provides unbounded power gain in K for a fixed transmit power per user.

[185]  arXiv:2105.03902 [pdf, other]
Title: Learning Gradient Fields for Molecular Conformation Generation
Comments: ICML 2021, Long talk
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Biomolecules (q-bio.BM)

We study a fundamental problem in computational chemistry known as molecular conformation generation, trying to predict stable 3D structures from 2D molecular graphs. Existing machine learning approaches usually first predict distances between atoms and then generate a 3D structure satisfying the distances, where noise in predicted distances may induce extra errors during 3D coordinate generation. Inspired by the traditional force field methods for molecular dynamics simulation, in this paper, we propose a novel approach called ConfGF by directly estimating the gradient fields of the log density of atomic coordinates. The estimated gradient fields allow directly generating stable conformations via Langevin dynamics. However, the problem is very challenging as the gradient fields are roto-translation equivariant. We notice that estimating the gradient fields of atomic coordinates can be translated to estimating the gradient fields of interatomic distances, and hence develop a novel algorithm based on recent score-based generative models to effectively estimate these gradients. Experimental results across multiple tasks show that ConfGF outperforms previous state-of-the-art baselines by a significant margin.

[186]  arXiv:2105.03906 [pdf, other]
Title: TextAdaIN: Fine-Grained AdaIN for Robust Text Recognition
Comments: 12 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Leveraging the characteristics of convolutional layers, image classifiers are extremely effective. However, recent works have exposed that in many cases they immoderately rely on global image statistics that are easy to manipulate while preserving image semantics. In text recognition, we reveal that it is rather the local image statistics which the networks overly depend on. Motivated by this, we suggest an approach to regulate the reliance on local statistics that improves overall text recognition performance.
Our method, termed TextAdaIN, creates local distortions in the feature map which prevent the network from overfitting to the local statistics. It does so by deliberately mismatching fine-grained feature statistics between samples in a mini-batch. Despite TextAdaIN's simplicity, extensive experiments show its effectiveness compared to other, more complicated methods. TextAdaIN achieves state-of-the-art results on standard handwritten text recognition benchmarks. Additionally, it generalizes to multiple architectures and to the domain of scene text recognition. Furthermore, we demonstrate that integrating TextAdaIN improves robustness towards image corruptions.

[187]  arXiv:2105.03907 [pdf, ps, other]
Title: Generative Mechanisms: The mechanisms that implement codes
Authors: David Ellerman
Comments: arXiv admin note: text overlap with arXiv:1410.4501
Subjects: Information Theory (cs.IT)

The purpose of this paper is to abstractly describe the notion of a generative mechanism that implements a code and to provide a number of examples including the DNA-RNA machinery that implements the genetic code, Chomsky's Principles & Parameters model of a child acquiring a specific grammar given `chunks' of linguistic experience (which play the role of the received code), and embryonic development where positional information in the developing embryo plays the role of the received code. A generative mechanism is distinguished from a selectionist mechanism that has heretofore played an important role in biological modeling (e.g., Darwinian evolution and the immune system).

[188]  arXiv:2105.03909 [pdf]
Title: Diagnosable-by-Design Model-Driven Development for IEC 61499 Industrial Cyber-Physical Systems
Comments: Conference paper, 6 pages, 7 figures, 1 table
Journal-ref: Proceedings of the 46th Annual Conference of the IEEE Industrial Electronics Society (IECON2020). IEEE Computer Society Press, pp.2183-2188
Subjects: Software Engineering (cs.SE)

Integrating the design and creation of fault identification and diagnostic capabilities into Model-Driven Development methodologies is one approach to enhancing the resilience of Industrial Cyber-Physical Systems. We present a Fault Diagnostic Engine designed to recognise and diagnose faults in IEC 61499 Function Block Applications. Using diagnostic agents that interact directly with the target application, we demonstrate fault monitoring and analysis techniques as well as failure scenario intervention. By designing and building fault diagnostic resources during early phases of Model-Driven Development, both iterative testing and long-term fault management capabilities can be created. While applying and refining appropriate model artifacts, we demonstrate that the concurrent development of function blocks alongside fault management capabilities is both feasible and worthwhile.

[189]  arXiv:2105.03917 [pdf, other]
Title: Combining Time-Dependent Force Perturbations in Robot-Assisted Surgery Training
Subjects: Robotics (cs.RO)

Teleoperated robot-assisted minimally-invasive surgery (RAMIS) offers many advantages over open surgery. However, there are still no guidelines for training skills in RAMIS. Motor learning theories have the potential to improve the design of RAMIS training but they are based on simple movements that do not resemble the complex movements required in surgery. To fill this gap, we designed an experiment to investigate the effect of time-dependent force perturbations on the learning of a pattern-cutting surgical task. Thirty participants took part in the experiment: (1) a control group that trained without perturbations, and (2) a 1Hz group that trained with 1Hz periodic force perturbations that pushed each participant's hand inwards and outwards in the radial direction. We monitored their learning using four objective metrics and found that participants in the 1Hz group learned how to overcome the perturbations and improved their performances during training without impairing their performances after the perturbations were removed. Our results present an important step toward understanding the effect of adding perturbations to RAMIS training protocols and improving RAMIS training for the benefit of surgeons and patients.

[190]  arXiv:2105.03918 [pdf, other]
Title: Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics
Comments: Proceedings of the 38th International Conference on Machine Learning, 2021
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Democratization of machine learning requires architectures that automatically adapt to new problems. Neural Differential Equations (NDEs) have emerged as a popular modeling framework by removing the need for ML practitioners to choose the number of layers in a recurrent model. While we can control the computational cost by choosing the number of layers in standard architectures, in NDEs the number of neural network evaluations for a forward pass can depend on the number of steps of the adaptive ODE solver. But, can we force the NDE to learn the version with the least steps while not increasing the training cost? Current strategies to overcome slow prediction require high order automatic differentiation, leading to significantly higher training time. We describe a novel regularization method that uses the internal cost heuristics of adaptive differential equation solvers combined with discrete adjoint sensitivities to guide the training process towards learning NDEs that are easier to solve. This approach opens up the blackbox numerical analysis behind the differential equation solver's algorithm and directly uses its local error estimates and stiffness heuristics as cheap and accurate cost estimates. We incorporate our method without any change in the underlying NDE framework and show that our method extends beyond Ordinary Differential Equations to accommodate Neural Stochastic Differential Equations. We demonstrate how our approach can halve the prediction time and, unlike other methods which can increase the training time by an order of magnitude, we demonstrate similar reduction in training times. Together this showcases how the knowledge embedded within state-of-the-art equation solvers can be used to enhance machine learning.

[191]  arXiv:2105.03923 [pdf, other]
Title: CASA-B: A Unified Framework of Model-Free Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Building on the breakthrough of reinforcement learning, this paper introduces a unified framework of model-free reinforcement learning, CASA-B, Critic AS an Actor with Bandits Vote Algorithm. CASA-B is an actor-critic framework that estimates state-value, state-action-value and policy. An expectation-correct Doubly Robust Trace is introduced to learn state-value and state-action-value, whose convergence properties are guaranteed. We prove that CASA-B integrates a consistent path for the policy evaluation and the policy improvement. The policy evaluation is equivalent to a compensational policy improvement, which alleviates the function approximation error, and is also equivalent to an entropy-regularized policy improvement, which prevents the policy from collapsing to a suboptimal solution. Building on this design, we find that the entropies of the behavior policies and the target policy are disentangled. Based on this observation, we propose a progressive closed-form entropy control mechanism, which explicitly controls the behavior policies' entropy to an arbitrary range. Our experiments show that CASA-B is highly sample efficient and achieves state-of-the-art results on the Arcade Learning Environment. Our mean Human Normalized Score is 6456.63% and our median Human Normalized Score is 477.17%, under 200M training scale.

[192]  arXiv:2105.03925 [pdf, other]
Title: On the Distribution of the Information Density of Gaussian Random Vectors: Explicit Formulas and Tight Approximations
Subjects: Information Theory (cs.IT); Probability (math.PR)

Based on the canonical correlation analysis we derive series representations of the probability density function (PDF) and the cumulative distribution function (CDF) of the information density of arbitrary Gaussian random vectors. Using the series representations we give closed-form expressions of the PDF and CDF for important special cases and derive tight approximations for the general case. Furthermore, we discuss the (in)validity of Gaussian approximations of the information density.

[193]  arXiv:2105.03928 [pdf, other]
Title: Which transformer architecture fits my data? A vocabulary bottleneck in self-attention
Comments: ICML 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

After their successful debut in natural language processing, Transformer architectures are now becoming the de-facto standard in many domains. An obstacle for their deployment over new modalities is the architectural configuration: the optimal depth-to-width ratio has been shown to dramatically vary across data types (e.g., $10$x larger over images than over language). We theoretically predict the existence of an embedding rank bottleneck that limits the contribution of self-attention width to the Transformer expressivity. We thus directly tie the input vocabulary size and rank to the optimal depth-to-width ratio, since a small vocabulary size or rank dictates an added advantage of depth over width. We empirically demonstrate the existence of this bottleneck and its implications on the depth-to-width interplay of Transformer architectures, linking the architecture variability across domains to the often glossed-over usage of different vocabulary sizes or embedding ranks in different domains. As an additional benefit, our rank bottlenecking framework allows us to identify size redundancies of $25\%-50\%$ in leading NLP models such as ALBERT and T5.

[194]  arXiv:2105.03930 [pdf, other]
Title: Arbitrary high-order linear structure-preserving schemes for the regularized long-wave equation
Comments: 22 pages, 39 figures
Subjects: Numerical Analysis (math.NA)

In this paper, a class of arbitrarily high-order linear momentum-preserving and energy-preserving schemes are proposed, respectively, for solving the regularized long-wave equation. For the momentum-preserving scheme, our key ideas mainly follow the extrapolation/prediction-correction technique and symplectic Runge-Kutta (RK) methods in time combined with the standard Fourier pseudo-spectral method in space. We show that it is uniquely solvable, unconditionally stable and can exactly preserve the momentum of the system. Subsequently, based on the energy quadratization approach and the analogous linearized idea used in the construction of the linear momentum-preserving scheme, the energy-preserving scheme is presented and it is proven to preserve both the discrete mass and quadratic energy. Numerical results are presented to demonstrate the accuracy and efficiency of the schemes.

[195]  arXiv:2105.03931 [pdf, other]
Title: Automated Decision-based Adversarial Attacks
Comments: 16 pages, 6 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Deep learning models are vulnerable to adversarial examples, which can fool a target classifier by imposing imperceptible perturbations onto natural examples. In this work, we consider the practical and challenging decision-based black-box adversarial setting, where the attacker can only acquire the final classification labels by querying the target model without access to the model's details. Under this setting, existing works often rely on heuristics and exhibit unsatisfactory performance. To better understand the rationality of these heuristics and the limitations of existing methods, we propose to automatically discover decision-based adversarial attack algorithms. In our approach, we construct a search space using basic mathematical operations as building blocks and develop a random search algorithm to efficiently explore this space by incorporating several pruning techniques and intuitive priors inspired by program synthesis works. Although we use a small and fast model to efficiently evaluate attack algorithms during the search, extensive experiments demonstrate that the discovered algorithms are simple yet query-efficient when transferred to larger normal and defensive models on the CIFAR-10 and ImageNet datasets. They achieve comparable or better performance than the state-of-the-art decision-based attack methods consistently.

[196]  arXiv:2105.03933 [pdf, other]
Title: Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index
Comments: 4 pages, 4 figures
Subjects: Information Retrieval (cs.IR)

An embedding index that enables fast approximate nearest neighbor (ANN) search serves as an indispensable component of state-of-the-art deep retrieval systems. Traditional approaches, which often separate the two steps of embedding learning and index building, incur additional indexing time and decayed retrieval accuracy. In this paper, we propose a novel method called Poeem, which stands for product quantization based embedding index jointly trained with deep retrieval model, to unify the two separate steps within end-to-end training, by utilizing a few techniques including the gradient straight-through estimator, warm start strategy, optimal space decomposition and Givens rotation. Extensive experimental results show that the proposed method not only improves retrieval accuracy significantly but also reduces the indexing time to almost none. We have open sourced our approach for the sake of comparison and reproducibility.
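For readers unfamiliar with product quantization, the encoding step can be sketched as follows. This is a generic illustration of the technique, not the Poeem training procedure: the codebooks here are fixed, hand-written values (in a real system they would be learned), and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vector, codebooks):
    """Split `vector` into len(codebooks) equal sub-vectors and replace each
    one with the index of its nearest centroid in that subspace's codebook."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for j, centroids in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        codes.append(min(range(len(centroids)),
                         key=lambda c: squared_distance(sub, centroids[c])))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating chosen centroids."""
    out = []
    for j, c in enumerate(codes):
        out.extend(codebooks[j][c])
    return out

# Two subspaces of dimension 2, two centroids each (illustrative values).
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [5.0, 5.0]]]
codes = pq_encode([0.9, 1.2, 4.0, 6.0], codebooks)
approx = pq_decode(codes, codebooks)
```

The index stores only the small integer `codes` per item, which is what makes approximate nearest neighbor search over large collections cheap in memory.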

[197]  arXiv:2105.03934 [pdf, other]
Title: Fish Disease Detection Using Image Based Machine Learning Technique in Aquaculture
Comments: 15 pages, 10 figures, 7 tables. Accepted Manuscript. Journal of King Saud University - Computer and Information Sciences
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Fish diseases in aquaculture constitute a significant hazard to food security. Identifying infected fish at an early stage remains challenging due to the dearth of necessary infrastructure. Timely identification of infected fish is an obligatory step to prevent the disease from spreading. In this work, we aim to detect disease in salmon, as salmon aquaculture is the fastest-growing food production system globally, accounting for 70 percent (2.5 million tons) of the market. Combining image processing with a machine learning mechanism, we identify fish infected by various pathogens. This work is divided into two portions. In the first portion, image pre-processing and segmentation are applied to reduce noise and enhance the image, respectively. In the second portion, we extract the relevant features to classify the diseases with the help of the Support Vector Machine (SVM) algorithm with a kernel function. The processed images from the first portion are passed through this SVM model. We then conduct a comprehensive experiment with the proposed combination of techniques on a salmon fish image dataset used to examine fish disease. We carried out this work on a novel dataset, both with and without image augmentation. The results show that our applied SVM performs notably well, with 91.42 and 94.12 percent accuracy with and without augmentation, respectively.

[198]  arXiv:2105.03938 [pdf, other]
Title: Passage Retrieval for Outside-Knowledge Visual Question Answering
Comments: Accepted to SIGIR'21 as a short paper
Subjects: Information Retrieval (cs.IR)

In this work, we address multi-modal information needs that contain text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. This task requires access to outside knowledge, which in our case we define to be a large unstructured passage collection. We first conduct sparse retrieval with BM25 and study expanding the question with object names and image captions. We verify that visual clues play an important role and captions tend to be more informative than object names in sparse retrieval. We then construct a dual-encoder dense retriever, with the query encoder being LXMERT, a multi-modal pre-trained transformer. We further show that dense retrieval significantly outperforms sparse retrieval that uses object expansion. Moreover, dense retrieval matches the performance of sparse retrieval that leverages human-generated captions.

[199]  arXiv:2105.03941 [pdf, other]
Title: Stronger Privacy for Federated Collaborative Filtering with Implicit Feedback
Comments: 9 pages, 5 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA)

Recommender systems are commonly trained on centrally collected user interaction data like views or clicks. This practice, however, raises serious privacy concerns regarding the recommender's collection and handling of potentially sensitive data. Several privacy-aware recommender systems have been proposed in recent literature, but comparatively little attention has been given to systems at the intersection of implicit feedback and privacy. To address this shortcoming, we propose a practical federated recommender system for implicit data under user-level local differential privacy (LDP). The privacy-utility trade-off is controlled by parameters $\epsilon$ and $k$, regulating the per-update privacy budget and the number of $\epsilon$-LDP gradient updates sent by each user, respectively. To further protect the user's privacy, we introduce a proxy network to reduce the fingerprinting surface by anonymizing and shuffling the reports before forwarding them to the recommender. We empirically demonstrate the effectiveness of our framework on the MovieLens dataset, achieving a Hit Ratio at K=10 (HR@10) of up to 0.68 on 50k users with 5k items. Even on the full dataset, we show that it is possible to achieve reasonable utility with HR@10>0.5 without compromising user privacy.
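To make the notion of an $\epsilon$-LDP report concrete, here is a minimal sketch of binary randomized response, a textbook $\epsilon$-LDP mechanism. The abstract does not say which mechanism the paper applies to its gradient updates, so this is only an assumed stand-in; the function names are hypothetical.

```python
import math
import random

def keep_probability(epsilon):
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomized_response(bit, epsilon, rng=random):
    """Report `bit` truthfully with probability e^eps / (1 + e^eps),
    otherwise report the flipped bit."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit

def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of 1-bits from noisy reports
    (requires epsilon > 0 so that the denominator is non-zero)."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(0)
true_bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
estimate = debiased_mean(reports, epsilon=1.0)
```

A smaller $\epsilon$ pushes the keep probability toward 1/2, so each report reveals less about the user while the aggregate estimate becomes noisier.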

[200]  arXiv:2105.03943 [pdf, other]
Title: gComm: An environment for investigating generalization in Grounded Language Acquisition
Comments: Accepted in NAACL 2021 workshop: Visually Grounded Interaction and Language (ViGIL)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

gComm is a step towards developing a robust platform to foster research in grounded language acquisition in a more challenging and realistic setting. It comprises a 2-d grid environment with a set of agents (a stationary speaker and a mobile listener connected via a communication channel) exposed to a continuous array of tasks in a partially observable setting. The key to solving these tasks lies in agents developing linguistic abilities and utilizing them for efficiently exploring the environment. The speaker and listener have access to information provided in different modalities, i.e. the speaker's input is a natural language instruction that contains the target and task specifications and the listener's input is its grid-view. Each must rely on the other to complete the assigned task; however, the only way they can do so is to develop and use some form of communication. gComm provides several tools for studying different forms of communication and assessing their generalization.

[201]  arXiv:2105.03949 [pdf, other]
Title: High-performance symbolic-numerics via multiple dispatch
Subjects: Computation and Language (cs.CL); Mathematical Software (cs.MS); Symbolic Computation (cs.SC)

As mathematical computing becomes more democratized in high-level languages, high-performance symbolic-numeric systems are necessary for domain scientists and engineers to get the best performance out of their machine without deep knowledge of code optimization. Naturally, users need different term types either to have different algebraic properties for them, or to use efficient data structures. To this end, we developed Symbolics.jl, an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs. In this work we detail an underlying abstract term interface which allows for speed without sacrificing generality. We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system without changing the pre-existing term rewriters. We showcase how this can be used to optimize term construction and give a 113x acceleration on general symbolic transformations. Further, we show that such a generic API allows for complementary term-rewriting implementations. We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers. We showcase an e-graph ruleset which minimizes the number of CPU cycles during expression evaluation, and demonstrate how it simplifies a real-world reaction-network simulation to halve the runtime. Additionally, we show a reaction-diffusion partial differential equation solver which is able to be automatically converted into symbolic expressions via multiple dispatch tracing, which is subsequently accelerated and parallelized to give a 157x simulation speedup. Together, this presents Symbolics.jl as a next-generation symbolic-numeric computing environment geared towards modeling and simulation.

[202]  arXiv:2105.03953 [pdf, other]
Title: Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation
Comments: Accepted in Findings of ACL 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The data scarcity in low-resource languages has become a bottleneck to building robust neural machine translation systems. Fine-tuning a multilingual pre-trained model (e.g., mBART (Liu et al., 2020)) on the translation task is a good approach for low-resource languages; however, its performance will be greatly limited when there are unseen languages in the translation pairs. In this paper, we present a continual pre-training (CPT) framework on mBART to effectively adapt it to unseen languages. We first construct noisy mixed-language text from the monolingual corpus of the target language in the translation pair to cover both the source and target languages, and then, we continue pre-training mBART to reconstruct the original monolingual text. Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline, as well as other strong baselines, across all tested low-resource translation pairs containing unseen languages. Furthermore, our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training. The code is available at https://github.com/zliucr/cpt-nmt.

[203]  arXiv:2105.03958 [pdf, other]
Title: Preserving Privacy in Human-Motion Affect Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Human motion is a biomarker used extensively in clinical analysis to monitor the progression of neurological diseases and mood disorders. Since perceptions of emotions are also interleaved with body posture and movements, emotion recognition from human gait can be used to quantitatively monitor mood changes that are often related to neurological disorders. Many existing solutions use shallow machine learning models with raw positional data or manually extracted features to achieve this. However, gait is composed of many highly expressive characteristics that can be used to identify human subjects, and most solutions fail to address this, disregarding the subject's privacy. This work evaluates the effectiveness of existing methods at recognising emotions using both 3D temporal joint signals and manually extracted features. We also show that this data can easily be exploited to expose a subject's identity. To this end, we propose a cross-subject transfer learning technique for training a multi-encoder autoencoder deep neural network to learn disentangled latent representations of human motion features. By disentangling subject biometrics from the gait data, we show that the subject's privacy is preserved while the affect recognition performance outperforms traditional methods.

[204]  arXiv:2105.03962 [pdf, other]
Title: Stochastic Multi-Armed Bandits with Control Variates
Comments: 26 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This paper studies a new variant of the stochastic multi-armed bandits problem, where the learner has access to auxiliary information about the arms. The auxiliary information is correlated with the arm rewards, which we treat as control variates. In many applications, the arm rewards are a function of some exogenous values, whose mean value is known a priori from historical data and hence can be used as control variates. We use the control variates to obtain mean estimates with smaller variance and tighter confidence bounds. We then develop an algorithm named UCB-CV that uses improved estimates. We characterize the regret bounds in terms of the correlation between the rewards and control variates. The experiments on synthetic data validate the performance guarantees of our proposed algorithm.
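The variance-reduction step can be sketched with the classical control-variate estimator for a single arm. This is a generic illustration, not the UCB-CV algorithm itself: it covers only the mean estimate (no confidence bound or arm selection), and the function name is hypothetical.

```python
from statistics import fmean

def control_variate_mean(rewards, controls, control_mean):
    """Estimate the mean reward using a control variate whose true mean
    `control_mean` is known: r_bar - beta * (w_bar - control_mean), with
    beta the sample regression coefficient of rewards on controls."""
    r_bar, w_bar = fmean(rewards), fmean(controls)
    cov = sum((r - r_bar) * (w - w_bar) for r, w in zip(rewards, controls))
    var = sum((w - w_bar) ** 2 for w in controls)
    beta = cov / var if var > 0 else 0.0  # fall back to the plain mean
    return r_bar - beta * (w_bar - control_mean)

# Rewards are exactly 2 * control + 5 here, so the correction is exact.
controls = [1.0, 2.0, 3.0, 6.0]
rewards = [7.0, 9.0, 11.0, 17.0]
estimate = control_variate_mean(rewards, controls, control_mean=4.0)
```

In this toy case the plain sample mean is 11, while the corrected estimate recovers the true mean 2 * 4 + 5 = 13; the stronger the correlation between rewards and control variates, the larger the variance reduction.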

[205]  arXiv:2105.03966 [pdf, other]
Title: Unit Ball Model for Hierarchical Embeddings in Complex Hyperbolic Space
Subjects: Machine Learning (cs.LG)

Learning the representation of data with hierarchical structures in hyperbolic space has attracted increasing attention in recent years. Due to its constant negative curvature, hyperbolic space resembles tree metrics and naturally captures the tree-like properties of hierarchical graphs, which enables hyperbolic embeddings to improve over traditional Euclidean models. However, most graph data, even data with hierarchical structures, are not trees, and they usually do not ubiquitously match the constant curvature property of hyperbolic space. To address this limitation of hyperbolic embeddings, we explore the complex hyperbolic space, which has variable negative curvature, for representation learning. Specifically, we propose to learn the graph embeddings in the unit ball model of the complex hyperbolic space. The unit ball model based embeddings have a more powerful representation capacity to capture a variety of hierarchical graph structures. Through experiments on synthetic and real-world data, we show that our approach improves over the hyperbolic embedding models significantly.

[206]  arXiv:2105.03968 [pdf, ps, other]
Title: Fast $n$-fold Boolean Convolution via Additive Combinatorics
Comments: ICALP 2021, 17 pages
Subjects: Data Structures and Algorithms (cs.DS)

We consider the problem of computing the Boolean convolution (with wraparound) of $n$~vectors of dimension $m$, or, equivalently, the problem of computing the sumset $A_1+A_2+\ldots+A_n$ for $A_1,\ldots,A_n \subseteq \mathbb{Z}_m$. Boolean convolution formalizes the frequent task of combining two subproblems, where the whole problem has a solution of size $k$ if for some $i$ the first subproblem has a solution of size~$i$ and the second subproblem has a solution of size $k-i$. Our problem formalizes a natural generalization, namely combining solutions of $n$ subproblems subject to a modular constraint. This simultaneously generalises Modular Subset Sum and Boolean Convolution (Sumset Computation). Although nearly optimal algorithms are known for special cases of this problem, not even tiny improvements are known for the general case.
We almost resolve the computational complexity of this problem, shaving essentially a factor of $n$ from the running time of previous algorithms. Specifically, we present a \emph{deterministic} algorithm running in \emph{almost} linear time with respect to the input plus output size $k$. We also present a \emph{Las Vegas} algorithm running in \emph{nearly} linear expected time with respect to the input plus output size $k$. Previously, no deterministic or randomized $o(nk)$ algorithm was known.
At the heart of our approach lies a careful usage of Kneser's theorem from Additive Combinatorics, and a new deterministic almost linear output-sensitive algorithm for non-negative sparse convolution. In total, our work builds a solid toolbox that could be of independent interest.
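For reference, the problem itself can be stated as a short naive baseline. This is not the paper's algorithm: it simply folds the sets together one at a time, which is the kind of approach whose running time grows with an extra factor of $n$ that the abstract says is shaved off. The function name is hypothetical.

```python
def sumset_mod(sets, m):
    """Naive computation of A_1 + A_2 + ... + A_n over Z_m: start from {0}
    and add one set at a time, reducing every sum modulo m."""
    reachable = {0}
    for a in sets:
        reachable = {(x + y) % m for x in reachable for y in a}
    return reachable

result = sumset_mod([{0, 1}, {0, 2}], 5)
```

Each element of `result` is a residue that can be written as one choice from every input set, which matches the "combine $n$ subproblems subject to a modular constraint" reading given in the abstract.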

[207]  arXiv:2105.03973 [pdf, other]
Title: Perturbation-based Frequency Domain Linear and Nonlinear Noise Estimation
Comments: 7 Pages
Subjects: Systems and Control (eess.SY)

In this paper, a new method for the separation of noise categories based on Four-Wave Mixing is presented.
The theoretical analysis is grounded in the Gaussian Noise model and verified by split-step simulations. The noise categories react differently to the introduced perturbations; by performing a set of perturbations, the behaviour of the different categories can be separated by means of a least-squares fitting. Given that ASE is independent of the induced perturbations, it is possible to separate noise contributions. The analysis includes constant and variable power perturbations.
The estimation of the noise categories is discussed from two points of view: NSR evolution post-DSP processing, and over the power spectral density in a notched region. The NSR estimation can only be performed at reception, whereas the power spectral density approach can be performed along the optical link if a high resolution Optical Spectrum Analyzer is available.
Additionally, we perform a simple experimental verification using two WaveLogic 3 transceivers for the NSR, successfully estimating the noise contributions.

[208]  arXiv:2105.03979 [pdf, other]
Title: Improving Patent Mining and Relevance Classification using Transformers
Comments: 6th National Conference on Practical Applications of Artificial Intelligence, 2021, Bordeaux, France
Subjects: Computation and Language (cs.CL)

Patent analysis and mining are time-consuming and costly processes for companies, but nevertheless essential if they wish to remain competitive. To cope with the overload induced by numerous patents, the idea is to filter them automatically, passing only a few on to experts to read. This paper reports a successful application of fine-tuning and retraining of pre-trained deep Natural Language Processing models for patent classification. The solution that we propose combines several state-of-the-art treatments to achieve our goal - decrease the workload while preserving recall and precision metrics.

[209]  arXiv:2105.03983 [pdf, other]
Title: Understanding the Role of Affect Dimensions in Detecting Emotions from Tweets: A Multi-task Approach
Comments: 5 pages, Short Paper accepted at SIGIR 2021
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

We propose VADEC, a multi-task framework that exploits the correlation between the categorical and dimensional models of emotion representation for better subjectivity analysis. Focusing primarily on the effective detection of emotions from tweets, we jointly train multi-label emotion classification and multi-dimensional emotion regression, thereby utilizing the inter-relatedness between the tasks. Co-training especially helps in improving the performance of the classification task as we outperform the strongest baselines with 3.4%, 11%, and 3.9% gains in Jaccard Accuracy, Macro-F1, and Micro-F1 scores respectively on the AIT dataset. We also achieve state-of-the-art results with 11.3% gains averaged over six different metrics on the SenWave dataset. For the regression task, VADEC, when trained with SenWave, achieves 7.6% and 16.5% gains in Pearson Correlation scores over the current state-of-the-art on the EMOBANK dataset for the Valence (V) and Dominance (D) affect dimensions respectively. We conclude our work with a case study on COVID-19 tweets posted by Indians that further helps in establishing the efficacy of our proposed solution.

[210]  arXiv:2105.03986 [pdf, other]
Title: Advising Agent for Service-Providing Live-Chat Operators
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Call centers, in which human operators attend clients using textual chat, are very common in modern e-commerce. Training enough skilled operators who are able to provide good service is a challenge. We suggest an algorithm and a method to train and implement an assisting agent that provides on-line advice to operators while they attend clients. The agent is domain-independent and can be introduced to new domains without major efforts in design, training and organizing structured knowledge of the professional discipline. We demonstrate the applicability of the system in an experiment that realizes its full life-cycle on a specific domain and analyze its capabilities.

[211]  arXiv:2105.03994 [pdf, other]
Title: Dispatcher: A Message-Passing Approach To Language Modelling
Authors: Alberto Cetoli
Subjects: Computation and Language (cs.CL)

This paper proposes a message-passing mechanism to address language modelling. A new layer type is introduced that aims to substitute self-attention. The system is shown to be competitive with existing methods: given N tokens, the computational complexity is O(N log N) and the memory complexity is O(N) under reasonable assumptions. In the end, the Dispatcher layer is seen to achieve comparable perplexity to prior results while being more efficient.

[212]  arXiv:2105.04003 [pdf, other]
Title: Efficiency-driven Hardware Optimization for Adversarially Robust Neural Networks
Comments: 6 pages, 8 figures, 3 tables; Accepted in DATE 2021 conference. arXiv admin note: text overlap with arXiv:2008.11298
Journal-ref: 2021 Design, Automation and Test in Europe (DATE) Conference
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)

With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) era, secure hardware implementation of Deep Neural Networks (DNNs) has become imperative. We will focus on how to address adversarial robustness for DNNs through efficiency-driven hardware optimizations. Since memory (specifically, dot-product operations) is a key energy-spending component for DNNs, hardware approaches in the past have focused on optimizing the memory. One such approach is approximate digital CMOS memories with hybrid 6T-8T SRAM cells that enable supply voltage (Vdd) scaling yielding low-power operation, without significantly affecting the performance due to read/write failures incurred in the 6T cells. In this paper, we show how the bit-errors in the 6T cells of hybrid 6T-8T memories minimize the adversarial perturbations in a DNN. Essentially, we find that for different configurations of 8T-6T ratios and scaled Vdd operation, noise incurred in the hybrid memory architectures is bounded within specific limits. This hardware noise can potentially interfere with the creation of adversarial attacks in DNNs, yielding robustness. Another memory optimization approach involves using analog memristive crossbars that perform Matrix-Vector-Multiplications (MVMs) efficiently with low energy and area requirements. However, crossbars generally suffer from intrinsic non-idealities that cause errors in performing MVMs, leading to degradation in the accuracy of the DNNs. We will show how the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization.

[213]  arXiv:2105.04005 [pdf, ps, other]
Title: Delay-Tolerant Constrained OCO with Application to Network Resource Allocation
Comments: 10 pages, 3 figures
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

We consider online convex optimization (OCO) with multi-slot feedback delay, where an agent makes a sequence of online decisions to minimize the accumulation of time-varying convex loss functions, subject to short-term and long-term constraints that are possibly time-varying. The current convex loss function and the long-term constraint function are revealed to the agent only after the decision is made, and they may be delayed for multiple time slots. Existing work on OCO under this general setting has focused on the static regret, which measures the gap of losses between the online decision sequence and an offline benchmark that is fixed over time. In this work, we consider both the static regret and the more practically meaningful dynamic regret, where the benchmark is a time-varying sequence of per-slot optimizers. We propose an efficient algorithm, termed Delay-Tolerant Constrained-OCO (DTC-OCO), which uses a novel constraint penalty with double regularization to tackle the asynchrony between information feedback and decision updates. We derive upper bounds on its dynamic regret, static regret, and constraint violation, proving them to be sublinear under mild conditions. We further apply DTC-OCO to a general network resource allocation problem, which arises in many systems such as data networks and cloud computing. Simulation results demonstrate substantial performance gain of DTC-OCO over the known best alternative.

[214]  arXiv:2105.04009 [pdf, other]
Title: RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification
Subjects: Machine Learning (cs.LG)

Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, the state-of-the-art methods ignore the local joint distribution of the data or correct it as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data-space for synthetic oversampling. The category sub-region for oversampling can be specified as an input parameter to meet domain-specific needs or be automatically selected via cross-validation. Our $5\times2$ cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally out-performs the state-of-the-art resampling methods in terms of AUC and G-mean.
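
The G-mean reported above is commonly defined as the geometric mean of sensitivity (recall on the positive class) and specificity (recall on the negative class). The sketch below assumes that standard definition; the function name `g_mean` and the label convention are illustrative and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the minority (positive) class
    specificity = tn / (tn + fp)  # recall on the majority (negative) class
    return math.sqrt(sensitivity * specificity)
```

Because the two recalls are multiplied, a classifier that ignores the minority class scores 0 regardless of its overall accuracy, which is why the metric is popular for imbalanced data.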

[215]  arXiv:2105.04015 [pdf]
Title: Discomfort: a New Material for Interaction Design
Comments: 8 pages of text + 2 pages refs, 36 references. Accepted paper, 4th Body as Starting Point Workshop, part of ACM CHI2021 conference
Subjects: Human-Computer Interaction (cs.HC)

This paper proposes discomfort as a new material for HCI researchers and designers to consider in any application that helps a person develop a new skill, practice or state. Discomfort is a fundamental precursor of adaptation and adaptation leads to new skill, practice or state. The way in which discomfort is perceived, and when it is experienced, is also often part of a rationale for rejecting or adopting a practice. Engaging effectively with discomfort may lead to increased personal development. We propose incorporating discomfort-as-material into our designs explicitly as a mechanism to make desired adaptations available to more of us, more effectively and more of the time. To explore this possibility, we offer an overview of the physiology and neurology of discomfort in adaptation and propose 3 issues related to incorporating discomfort into design: preparation for discomfort, need for recovery, and value of the practice. We look forward in the Workshop to exploring and developing ideas for specific Discomfortable Designs to insource discomfort as part of positive, resilient adaptation.

[216]  arXiv:2105.04017 [pdf, other]
Title: Concurrent infill topology and shape optimisation of lattice-skin structures
Comments: 17 pages, 13 figures
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)

Lattice-skin structures composed of a thin-shell skin and a lattice infill are widespread in nature and large-scale engineering due to their efficiency and exceptional mechanical properties. Recent advances in additive manufacturing, or 3D printing, make it possible to create lattice-skin structures of almost any size with arbitrary shape and geometric complexity. We propose a novel gradient-based approach to optimising both the shape and infill of lattice-skin structures to improve their efficiency further. The shell is modelled as a Kirchhoff-Love shell and analysed using isogeometric subdivision surfaces, whereas the lattice is modelled as a pin-jointed truss. The lattice consists of many cells, possibly of different sizes, with each containing a small number of struts. We propose a penalisation approach akin to the SIMP (solid isotropic material with penalisation) method for topology optimisation of the lattice. Furthermore, a corresponding sensitivity filter and a lattice extraction technique are introduced to ensure the stability of the optimisation process and to eliminate scattered struts of small cross-sectional areas. The developed topology optimisation technique is suitable for non-periodic, non-uniform lattices. For shape optimisation of both the shell and the lattice, the geometry of the lattice-skin structure is parameterised using the free-form deformation technique. The topology and shape optimisation problems are solved in an iterative, sequential manner. The effectiveness of the proposed approach and the influence of different algorithmic parameters are demonstrated with several numerical examples.

[217]  arXiv:2105.04019 [pdf, other]
Title: Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)

Sorting and ranking supervision is a method for training neural networks end-to-end based on ordering constraints. That is, the ground truth order of sets of samples is known, while their absolute values remain unsupervised. For that, we propose differentiable sorting networks by relaxing their pairwise conditional swap operations. To address the problems of vanishing gradients and extensive blurring that arise with larger numbers of layers, we propose mapping activations to regions with moderate gradients. We consider odd-even as well as bitonic sorting networks, which outperform existing relaxations of the sorting operation. We show that bitonic sorting networks can achieve stable training on large input sets of up to 1024 elements.
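
The idea of relaxing a conditional swap can be sketched as follows. This is a minimal illustration, assuming a logistic relaxation and an odd-even transposition network; the names `soft_swap`, `soft_odd_even_sort` and the `steepness` parameter are hypothetical, and the activation mapping the abstract proposes for deeper networks is not reproduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_swap(a, b, steepness):
    """Relaxed conditional swap: returns a soft (min, max) of a and b."""
    s = sigmoid(steepness * (b - a))  # close to 1 when a < b, close to 0 when a > b
    soft_min = s * a + (1.0 - s) * b
    soft_max = (1.0 - s) * a + s * b
    return soft_min, soft_max


def soft_odd_even_sort(values, steepness=10.0):
    """Odd-even transposition network with n layers of relaxed swaps."""
    x = list(values)
    n = len(x)
    for layer in range(n):
        start = layer % 2  # alternate between pairs (0,1),(2,3),... and (1,2),(3,4),...
        for i in range(start, n - 1, 2):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], steepness)
    return x
```

With a large `steepness` the network behaves like a hard sort; with a small one every output becomes a blend of the inputs, which illustrates the blurring that the abstract says grows with the number of layers.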

[218]  arXiv:2105.04020 [pdf, other]
Title: End-to-End Optical Character Recognition for Bengali Handwritten Words
Comments: Accepted in "The 4th National Computing Colleges Conference"
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Optical character recognition (OCR) is the process of converting analogue documents into digital form using document images. Currently, many commercial and non-commercial OCR systems exist for both handwritten and printed copies in different languages. Despite this, very few works address the recognition of Bengali words, and most of them focus on OCR of printed Bengali characters. This paper introduces an end-to-end OCR system for the Bengali language. The proposed architecture implements an end-to-end strategy that recognises handwritten Bengali words from handwritten word images. We experiment with popular convolutional neural network (CNN) architectures, including DenseNet, Xception, NASNet, and MobileNet, to build the OCR architecture. Further, we experiment with two different recurrent neural network (RNN) methods, LSTM and GRU. We evaluate the proposed architecture using the BanglaWritting dataset, which is a peer-reviewed Bengali handwritten image dataset. The proposed method achieves a character error rate of 0.091 and a word error rate of 0.273 using the DenseNet121 model with a GRU recurrent layer.
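
Character and word error rates are usually computed as an edit distance divided by the reference length. The sketch below assumes that conventional definition; the abstract does not state the exact normalisation used in the paper, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref_seq, hyp_seq):
    """Edit distance normalised by the reference length."""
    if not ref_seq:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_seq, hyp_seq) / len(ref_seq)


def cer(reference, hypothesis):
    """Character error rate: edit distance over characters."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference, hypothesis):
    """Word error rate: edit distance over whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())
```

Under this definition a single wrong character makes the whole word count as an error for WER, so WER is typically noticeably higher than CER, as in the figures reported above.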

[219]  arXiv:2105.04021 [pdf, other]
Title: MS MARCO: Benchmarking Ranking Models in the Large-Data Regime
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field. However, the goal is not simply to identify which run is "best", achieving the top score. The goal is to move the field forward by developing new robust techniques that work in many different settings and are adopted in research and practice. This paper uses the MS MARCO and TREC Deep Learning Track as our case study, comparing it to the case of TREC ad hoc ranking in the 1990s. We show how the design of the evaluation effort can encourage or discourage certain outcomes, and raise questions about internal and external validity of results. We provide some analysis of certain pitfalls, and a statement of best practices for avoiding such pitfalls. We summarize the progress of the effort so far, and describe our desired end state of "robust usefulness", along with steps that might be required to get us there.

[220]  arXiv:2105.04022 [pdf]
Title: Designing a Web Application for Simple and Collaborative Video Annotation That Meets Teaching Routines and Educational Requirements
Comments: 24 pages, 9 figures
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)

Video annotation and analysis is an important activity for teaching with and about audiovisual media artifacts because it helps students to learn how to identify textual and formal connections in media products. But school teachers lack adequate tools for video annotation and analysis in media education that are easy-to-use, integrate into established teaching organization, and support quick collaborative work. To address these challenges, we followed a design-based research approach and conducted qualitative interviews with teachers to develop TRAVIS GO, a web application for simple and collaborative video annotation. TRAVIS GO allows for quick and easy use within established teaching settings. The web application provides basic analytical features in an adaptable work space. Key didactic features include tagging and commenting on posts, sharing and exporting projects, and working in live collaboration. Teachers can create assignments according to grade level, learning subject, and class size. Our work contributes further insights for the CSCW community about how to implement user demands into developing educational tools.

[221]  arXiv:2105.04023 [pdf, other]
Title: Fast and Error-Adaptive Influence Maximization based on Count-Distinct Sketches
Comments: 12 pages. Sent to IEEE Transactions on Knowledge and Data Engineering as a regular paper
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS)

Influence maximization (IM) is the problem of finding a seed vertex set that maximizes the expected number of vertices influenced under a given diffusion model. Due to the NP-hardness of finding an optimal seed set, approximation algorithms are frequently used for IM. In this work, we describe a fast, error-adaptive approach that leverages Count-Distinct sketches and hash-based fused sampling. To estimate the number of influenced vertices throughout a diffusion, we use per-vertex Flajolet-Martin sketches where each sketch corresponds to a sampled subgraph. To efficiently simulate the diffusions, the reach-set cardinalities of a single vertex are stored in memory in a consecutive fashion. This allows the proposed algorithm to estimate the number of influenced vertices in a single step for simulations at once. For a faster IM kernel, we rebuild the sketches in parallel only after observing estimation errors above a given threshold. Our experimental results show that the proposed algorithm yields high-quality seed sets while being up to 119x faster than a state-of-the-art approximation algorithm. In addition, it is up to 62x faster than a sketch-based approach while producing seed sets with 3%-12% better influence scores.
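
A generic Flajolet-Martin count-distinct sketch can be written in a few lines. The version below is a textbook sketch under stated assumptions, not the per-vertex, per-sample layout described in the abstract: the class name `FMSketch`, the use of `hashlib.blake2b` as the hash function, and the choice of 32 registers are all illustrative.

```python
import hashlib

PHI = 0.77351  # correction constant from the original Flajolet-Martin analysis


def _hash64(item, seed):
    """Deterministic 64-bit hash of (seed, item)."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _lowest_set_bit(h):
    """Index of the least-significant 1 bit (63 if h is zero)."""
    if h == 0:
        return 63
    return (h & -h).bit_length() - 1


class FMSketch:
    def __init__(self, num_registers=32):
        self.bitmaps = [0] * num_registers  # one bitmap per hash function

    def add(self, item):
        for seed in range(len(self.bitmaps)):
            self.bitmaps[seed] |= 1 << _lowest_set_bit(_hash64(item, seed))

    def merge(self, other):
        """Sketch of the union of two sets: bitwise OR of the bitmaps."""
        merged = FMSketch(len(self.bitmaps))
        merged.bitmaps = [a | b for a, b in zip(self.bitmaps, other.bitmaps)]
        return merged

    def estimate(self):
        """Average the lowest-zero-bit index over registers, then rescale."""
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while bitmap & (1 << r):
                r += 1
            total += r
        return 2 ** (total / len(self.bitmaps)) / PHI
```

The property that matters for reach-set estimation is that `merge` is exact: the sketch of a union equals the OR of the sketches, so the reach set of a vertex can be assembled from its neighbours' sketches without revisiting the underlying vertices.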

[222]  arXiv:2105.04024 [pdf, other]
Title: DocSCAN: Unsupervised Text Classification via Learning from Neighbors
Subjects: Computation and Language (cs.CL)

We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN). For each document, we obtain semantically informative vectors from a large pre-trained language model. Similar documents have proximate vectors, so neighbors in the representation space tend to share topic labels. Our learnable clustering approach uses pairs of neighboring datapoints as a weak learning signal. The proposed approach learns to assign classes to the whole dataset without provided ground-truth labels. On five topic classification benchmarks, we improve on various unsupervised baselines by a large margin. In datasets with relatively few and balanced outcome classes, DocSCAN approaches the performance of supervised classification. The method fails for other types of classification, such as sentiment analysis, pointing to important conceptual and practical differences between classifying images and texts.

[223]  arXiv:2105.04026 [pdf, other]
Title: The Modern Mathematics of Deep Learning
Comments: This review paper will appear as a book chapter in the book "Theory of Deep Learning" by Cambridge University Press
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.

[224]  arXiv:2105.04027 [pdf, other]
Title: Improving Multi-agent Coordination by Learning to Estimate Contention
Comments: Accepted to the 30th International Joint Conference on Artificial Intelligence (IJCAI-21)
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

We present a multi-agent learning algorithm, ALMA-Learning, for efficient and fair allocations in large-scale systems. We circumvent the traditional pitfalls of multi-agent learning (e.g., the moving target problem, the curse of dimensionality, or the need for mutually consistent actions) by relying on the ALMA heuristic as a coordination mechanism for each stage game. ALMA-Learning is decentralized, observes only own action/reward pairs, requires no inter-agent communication, and achieves near-optimal (<5% loss) and fair coordination in a variety of synthetic scenarios and a real-world meeting scheduling problem. Its lightweight nature and fast learning make ALMA-Learning ideal for on-device deployment.

[225]  arXiv:2105.04030 [pdf, other]
Title: A Bit More Bayesian: Domain-Invariant Learning with Uncertainty
Comments: accepted to ICML 2021
Subjects: Machine Learning (cs.LG)

Domain generalization is challenging due to the domain shift and the uncertainty caused by the inaccessibility of target domain data. In this paper, we address both challenges with a probabilistic framework based on variational Bayesian inference, by incorporating uncertainty into neural network weights. We couple domain invariance in a probabilistic formula with the variational Bayesian inference. This enables us to explore domain-invariant learning in a principled way. Specifically, we derive domain-invariant representations and classifiers, which are jointly established in a two-layer Bayesian neural network. We empirically demonstrate the effectiveness of our proposal on four widely used cross-domain visual recognition benchmarks. Ablation studies validate the synergistic benefits of our Bayesian treatment when jointly learning domain-invariant representations and classifiers for domain generalization. Further, our method consistently delivers state-of-the-art mean accuracy on all benchmarks.

[226]  arXiv:2105.04034 [pdf, other]
Title: NMPC trajectory planner for urban autonomous driving
Comments: 10 pages, 11 figures
Subjects: Robotics (cs.RO)

This paper presents a trajectory planner for autonomous driving based on a Nonlinear Model Predictive Control (NMPC) algorithm that accounts for Pacejka's nonlinear lateral tyre dynamics as well as for zero speed conditions through a novel slip angles calculation. In the NMPC framework, road boundaries and obstacles (both static and moving) are taken into account through the implementation of soft and hard constraints. The numerical solution of the NMPC problem is carried out using the ACADO toolkit coupled with the quadratic programming solver qpOASES. The effectiveness of the proposed NMPC trajectory planner has been tested using CarMaker multibody models. Timing analysis of the simulations shows that the proposed algorithm can be implemented on the real-time control framework of an autonomous vehicle under the assumption of data coming from an upstream estimation block.

[227]  arXiv:2105.04035 [pdf, other]
Title: Knapsack and Subset Sum with Small Items
Subjects: Data Structures and Algorithms (cs.DS)

Knapsack and Subset Sum are fundamental NP-hard problems in combinatorial optimization. Recently there has been a growing interest in understanding the best possible pseudopolynomial running times for these problems with respect to various parameters.
In this paper we focus on the maximum item size $s$ and the maximum item value $v$. We give algorithms that run in time $O(n + s^3)$ and $O(n + v^3)$ for the Knapsack problem, and in time $\tilde{O}(n + s^{5/3})$ for the Subset Sum problem.
Our algorithms work for the more general problem variants with multiplicities, where each input item comes with a (binary encoded) multiplicity, which succinctly describes how many times the item appears in the instance. In these variants $n$ denotes the (possibly much smaller) number of distinct items.
Our results follow from combining and optimizing several diverse lines of research, notably proximity arguments for integer programming due to Eisenbrand and Weismantel (TALG 2019), fast structured $(\min,+)$-convolution by Kellerer and Pferschy (J. Comb. Optim. 2004), and additive combinatorics methods originating from Galil and Margalit (SICOMP 1991).
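
For orientation, the textbook pseudopolynomial baseline for Subset Sum with multiplicities is shown below. It is the classical dynamic program, not the algorithms of the paper: reachable sums are kept in a bitset (a Python integer), and each multiplicity is handled by the standard binary-splitting trick. The function name and the `(size, multiplicity)` input format are assumptions made for illustration.

```python
def subset_sum_with_multiplicities(items, target):
    """Decide whether some multiset of items sums exactly to target.

    items  -- list of (size, multiplicity) pairs with positive integer sizes
    target -- non-negative integer
    """
    mask = (1 << (target + 1)) - 1  # keep only sums in 0..target
    reachable = 1                   # bit k is set iff sum k is reachable
    for size, multiplicity in items:
        # Binary splitting: chunks of 1, 2, 4, ... copies plus a remainder
        # can represent every count from 0 to multiplicity.
        chunk = 1
        remaining = multiplicity
        while remaining > 0:
            take = min(chunk, remaining)
            reachable |= (reachable << (size * take)) & mask
            remaining -= take
            chunk *= 2
    return bool((reachable >> target) & 1)
```

This baseline costs time proportional to the target per chunk, so it depends on the target value, whereas the bounds stated above depend only on the number of distinct items and the maximum item size.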

[228]  arXiv:2105.04036 [pdf, other]
Title: A Novel Map of Knowledge for Science
Authors: Fan Shen
Subjects: Digital Libraries (cs.DL); History and Philosophy of Physics (physics.hist-ph); Physics and Society (physics.soc-ph)

With the expansion of scientific research, the number of scientific research results is increasing. How to summarize these data has become an urgent problem. Knowledge mapping methods have therefore come into being, providing many management and application functions. However, fully understanding the knowledge map remains a problem, especially in the field of sociology. In this paper, a three-dimensional knowledge map is proposed with time, space and number based on category and numericity, which includes all the scientific problems related to numericity across disciplines. Compared with the traditional way, this map is normative, and puts forward general production criteria for labeling and digitization. It is also intuitive and readable: nature, society and formal science are expressed in the same picture. Some social subjects are expressed more vividly than in traditional text-based expressions, and are compatible with the natural science system. Mathematics also shows its importance on the map as a formal science, indicating that it is the key to the development of science. This is not only a preliminary model of a comprehensive scientific worldview, but also a preliminary framework for the connection and cooperation of various disciplines in the future.

[229]  arXiv:2105.04037 [pdf, ps, other]
Title: Graph Attention Networks with Positional Embeddings
Subjects: Machine Learning (cs.LG)

Graph Neural Networks (GNNs) are deep learning methods which provide the current state-of-the-art performance in node classification tasks. GNNs often assume homophily (neighboring nodes having similar features and labels), and therefore may not be at their full potential when dealing with non-homophilic graphs. In this work, we focus on addressing this limitation and enable Graph Attention Networks (GAT), a commonly used variant of GNNs, to explore the structural information within each graph locality. Inspired by the positional encoding in the Transformers, we propose a framework, termed Graph Attentional Networks with Positional Embeddings (GAT-POS), to enhance GATs with positional embeddings which capture structural and positional information of the nodes in the graph. In this framework, the positional embeddings are learned by a model predictive of the graph context, plugged into an enhanced GAT architecture, which is able to leverage both the positional and content information of each node. The model is trained jointly to optimize for the task of node classification as well as the task of predicting graph context. Experimental results show that GAT-POS achieves a remarkable improvement compared to strong GNN baselines and recent structural embedding enhanced GNNs on non-homophilic graphs.

[230]  arXiv:2105.04040 [pdf, other]
Title: Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Convolutional neural networks lack shift equivariance due to the presence of downsampling layers. In image classification, adaptive polyphase downsampling (APS-D) was recently proposed to make CNNs perfectly shift invariant. However, in networks used for image reconstruction tasks, it cannot by itself restore shift equivariance. We address this problem by proposing adaptive polyphase upsampling (APS-U), a non-linear extension of conventional upsampling, which allows CNNs to exhibit perfect shift equivariance. With MRI and CT reconstruction experiments, we show that networks containing APS-D/U layers exhibit state-of-the-art equivariance performance without sacrificing image reconstruction quality. In addition, unlike prior methods like data augmentation and anti-aliasing, the gains in equivariance obtained from APS-D/U also extend to images outside the training distribution.
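
To see why ordinary strided downsampling breaks under shifts and how an adaptive choice of polyphase component can help, consider the one-dimensional sketch below. It covers only the downsampling side, and it assumes that the component with the largest energy is the one selected; the abstract does not spell out the selection rule, and all function names are illustrative.

```python
def polyphase_components(signal, stride=2):
    """Split a signal into its `stride` interleaved sub-signals."""
    return [signal[k::stride] for k in range(stride)]


def adaptive_polyphase_downsample(signal, stride=2):
    """Keep the polyphase component with the largest energy (assumed rule)."""
    components = polyphase_components(signal, stride)
    energies = [sum(v * v for v in comp) for comp in components]
    best = max(range(stride), key=lambda k: energies[k])
    return components[best]


def circular_shift(signal, shift):
    """Rotate a list to the right by `shift` positions."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


x = [0.1, 2.0, -0.3, 1.5, 0.7, -1.2, 0.4, 0.9]
shifted = circular_shift(x, 1)

# Plain stride-2 sampling keeps different samples before and after the shift.
plain, plain_shifted = x[::2], shifted[::2]

# The adaptive choice keeps the same samples, up to a circular shift.
adaptive = adaptive_polyphase_downsample(x)
adaptive_shifted = adaptive_polyphase_downsample(shifted)
```

In this example `adaptive_shifted` is a circular shift of `adaptive`, whereas `plain` and `plain_shifted` contain entirely different samples. The argument relies on circular shifts and on the component energies being distinct.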

[231]  arXiv:2105.04041 [pdf, ps, other]
Title: Lyapunov-Krasovskii functionals for some classes of nonlinear time delay systems
Comments: Submitted for presentation in 2021 Conference on Decision and Control (CDC)
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)

In this contribution, we study a homogeneous class of nonlinear time delay systems with time-varying perturbations. Using the Lyapunov-Krasovskii approach, we introduce a functional that leads to perturbation conditions matching those obtained previously in the Razumikhin framework. The functionals are applied to the estimation of the domain of attraction and of the system solutions. An illustrative example is given.

[232]  arXiv:2105.04043 [pdf, other]
Title: Fast stable finite difference schemes for nonlinear cross-diffusion
Authors: Diogo Lobo
Subjects: Numerical Analysis (math.NA)

The dynamics of cross-diffusion models lead to a high computational complexity for implicit difference schemes, making them unsuitable for tasks when time is of the essence. We propose the use of two operator splitting schemes for nonlinear cross-diffusion processes in order to lower the computational load, and establish their stability properties using discrete $L^2$ energy methods. Furthermore, by attaining a stable factorization of the system matrix as a forward-backward pass, corresponding to the Thomas algorithm for self-diffusion processes, we show that the use of implicit cross-diffusion can be competitive in terms of execution time, widening the range of viable cross-diffusion coefficients for on-the-fly applications.
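
The Thomas algorithm mentioned above is the standard forward-backward pass for tridiagonal systems, such as those arising from an implicit step of a scalar (self-)diffusion equation. The sketch below shows only that classical scalar case, not the cross-diffusion factorization of the paper; the argument layout (three equal-length lists, with `lower[0]` and `upper[-1]` unused) is an assumption made for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) time.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    No pivoting is performed, so the matrix should be diagonally dominant
    (as implicit diffusion matrices are).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tridiagonal_matvec(lower, diag, upper, x):
    """Multiply the tridiagonal matrix by x (used to check the solve)."""
    n = len(diag)
    out = []
    for i in range(n):
        value = diag[i] * x[i]
        if i > 0:
            value += lower[i] * x[i - 1]
        if i < n - 1:
            value += upper[i] * x[i + 1]
        out.append(value)
    return out


# One implicit Euler step of 1D diffusion with r = dt / dx**2 = 0.5:
# (1 + 2r) u_i - r u_{i-1} - r u_{i+1} = u_old_i
r, n = 0.5, 5
lower, diag, upper = [-r] * n, [1 + 2 * r] * n, [-r] * n
u_old = [1.0, 2.0, 3.0, 4.0, 5.0]
u_new = thomas_solve(lower, diag, upper, u_old)
```

Each implicit step costs a single linear-time sweep, which is what makes implicit schemes competitive with explicit ones once such a factorization is available.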

[233]  arXiv:2105.04045 [pdf, other]
Title: Swarm Differential Privacy for Purpose Driven Data-Information-Knowledge-Wisdom Architecture
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Privacy protection has recently attracted the attention of both academics and industries. Society protects individual data privacy through complex legal frameworks. This has become a topic of interest with the increasing applications of data science and artificial intelligence that have created a higher demand for the ubiquitous application of the data. The privacy protection of the broad Data-Information-Knowledge-Wisdom (DIKW) landscape, the next generation of information organization, has not been in the limelight. Next, we will explore DIKW architecture through the applications of popular swarm intelligence and differential privacy. As differential privacy has proved to be an effective data privacy approach, we will look at it from a DIKW domain perspective. Swarm Intelligence could effectively optimize and reduce the number of items in DIKW used in differential privacy, thereby improving both the effectiveness and the efficiency of differential privacy for crossing multiple modals of conceptual DIKW. The proposed approach is validated through the application of personalized data that is based on the open-source IRIS dataset. This experiment demonstrates the efficiency of Swarm Intelligence in reducing computing complexity.

[234]  arXiv:2105.04047 [pdf, other]
Title: Analyzing Online Political Advertisements
Comments: Accepted at ACL Findings 2021
Subjects: Computation and Language (cs.CL)

Online political advertising is a central aspect of modern election campaigning for influencing public opinion. Computational analysis of political ads is of utmost importance in political science to understand characteristics of digital campaigning. It is also important in computational linguistics to study features of political discourse and communication on a large scale. In this work, we present the first computational study on online political ads with the aim to (1) infer the political ideology of an ad sponsor; and (2) identify whether the sponsor is an official political party or a third-party organization. We develop two new large datasets for the two tasks consisting of ads from the U.S. Evaluation results show that our approach that combines textual and visual information from pre-trained neural models outperforms a state-of-the-art method for generic commercial ad classification. Finally, we provide an in-depth analysis of the limitations of our best performing models and a linguistic analysis to study the characteristics of political ad discourse.

[235]  arXiv:2105.04048 [pdf, ps, other]
Title: Complexity-Adaptive Maximum-Likelihood Decoding of Modified $\boldsymbol{G}_N$-Coset Codes
Comments: Submitted to an IEEE conference
Subjects: Information Theory (cs.IT)

A complexity-adaptive tree search algorithm is proposed for $\boldsymbol{G}_N$-coset codes that implements maximum-likelihood (ML) decoding by using a successive decoding schedule. The average complexity is close to that of the successive cancellation (SC) decoding for practical error rates when applied to polar codes and short Reed-Muller (RM) codes, e.g., block lengths up to $N=128$. By modifying the algorithm to limit the worst-case complexity, one obtains a near-ML decoder for longer RM codes and their subcodes. Unlike other bit-flip decoders, no outer code is needed to terminate decoding. The algorithm can thus be applied to modified $\boldsymbol{G}_N$-coset code constructions with dynamic frozen bits. One advantage over sequential decoders is that there is no need to optimize a separate parameter.

[236]  arXiv:2105.04051 [pdf, other]
Title: Aggregating From Multiple Target-Shifted Sources
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Multi-source domain adaptation aims at leveraging the knowledge from multiple tasks for predicting a related target domain. Hence, a crucial aspect is to properly combine different sources based on their relations. In this paper, we analyze the problem of aggregating source domains with different label distributions, where most recent source selection approaches fail. Our proposed algorithm differs from previous approaches in two key ways: the model aggregates multiple sources mainly through the similarity of semantic conditional distribution rather than marginal distribution; the model proposes a unified framework to select relevant sources for three popular scenarios, i.e., domain adaptation with limited labels on the target domain, unsupervised domain adaptation and label partial unsupervised domain adaptation. We evaluate the proposed method through extensive experiments. Empirically, the proposed method significantly outperforms the baselines.

[237]  arXiv:2105.04054 [pdf, other]
Title: Societal Biases in Language Generation: Progress and Challenges
Comments: ACL 2021
Subjects: Computation and Language (cs.CL)

Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how techniques contribute to biases and on progress towards bias analysis and mitigation. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call attention to promising directions for research and the importance of fairness and inclusivity considerations for language generation applications.

[238]  arXiv:2105.04055 [pdf, other]
Title: Scalar auxiliary variable approach for conservative/dissipative partial differential equations with unbounded energy
Subjects: Numerical Analysis (math.NA)

In this paper, we present a novel investigation of the so-called SAV approach, which is a framework to construct linearly implicit geometric numerical integrators for partial differential equations with variational structure. The SAV approach was originally proposed for gradient flows that have lower-bounded nonlinear potentials, such as the Allen-Cahn and Cahn-Hilliard equations, and this assumption on the energy was essential. In this paper, we propose a novel approach to address gradient flows with unbounded energy, such as the KdV equation, by a decomposition of energy functionals. Further, we show that the equation of the SAV approach, which is a system of equations with scalar auxiliary variables, is expressed as another gradient system that inherits the variational structure of the original system. This expression allows us to construct novel higher-order integrators by a certain class of Runge-Kutta methods. We propose second and fourth order schemes for conservative systems in our framework and present several numerical examples.

[239]  arXiv:2105.04057 [pdf, other]
Title: Fast Automated Reasoning over String Diagrams using Multiway Causal Structure
Comments: Submitted to Applied Category Theory 2021. 14 pages, 9 figures
Subjects: Logic in Computer Science (cs.LO); Discrete Mathematics (cs.DM)

We introduce an intuitive algorithmic methodology for enacting automated rewriting of string diagrams within a general double-pushout (DPO) framework, in which the sequence of rewrites is chosen in accordance with the causal structure of the underlying diagrammatic calculus. The combination of the rewriting structure and the causal structure may be elegantly formulated as a weak 2-category equipped with both total and partial monoidal bifunctors, thus providing a categorical semantics for the full multiway evolution causal graph of a generic Wolfram model hypergraph rewriting system. As an illustrative example, we show how a special case of this algorithm enables highly efficient automated simplification of quantum circuits, as represented in the ZX-calculus.

[240]  arXiv:2105.04062 [pdf, other]
Title: Approximate Fréchet Mean for Data Sets of Sparse Graphs
Comments: 28 pages
Subjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces, since graph sets are not Euclidean spaces. A standard approach is to consider the Fréchet mean. In this work, we equip a set of graphs with the pseudometric defined by the $\ell_2$ norm between the eigenvalues of their respective adjacency matrices. Unlike the edit distance, this pseudometric reveals structural changes at multiple scales, and is well adapted to studying various statistical problems on sets of graphs. We describe an algorithm to compute an approximation to the Fréchet mean of a set of undirected unweighted graphs with a fixed size.
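The spectral pseudometric described in this abstract can be illustrated with a minimal sketch. It assumes that the two spectra are sorted before being compared and that both graphs have the same number of vertices; the abstract does not spell out either detail. The helper names `symmetric_eigenvalues` and `spectral_distance` are hypothetical, and the eigenvalues are computed with a textbook cyclic Jacobi iteration, not with the authors' code.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    for _ in range(sweeps):
        # Sum of squared off-diagonal entries; stop once it is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def spectral_distance(adj_a, adj_b):
    """l2 distance between sorted adjacency spectra (assumed pairing)."""
    ev_a = symmetric_eigenvalues(adj_a)
    ev_b = symmetric_eigenvalues(adj_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ev_a, ev_b)))

# Path on three vertices, the same path relabelled, and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
star = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
d_same = spectral_distance(path, star)
d_diff = spectral_distance(path, triangle)
```

Here `path` and `star` are the same graph under a relabelling, so their distance is zero up to rounding. This is why the quantity is only a pseudometric: distinct labelled graphs, and more generally cospectral graphs, can be at distance zero.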

[241]  arXiv:2105.04064 [pdf, other]
Title: Leveraging Structural Information to Improve Point Line Visual-Inertial Odometry
Subjects: Robotics (cs.RO)

Leveraging line features can help to improve the localization accuracy of point-based monocular Visual-Inertial Odometry (VIO) systems, as lines provide additional constraints. Moreover, in an artificial environment, some straight lines are parallel to each other. In this paper, we design a VIO system based on points and straight lines, which divides straight lines into structural straight lines (that is, straight lines parallel to each other) and non-structural straight lines. In addition, unlike the orthogonal representation, which uses four parameters to represent a 3D straight line, we use only two parameters to minimize the representation of both the structural and the non-structural straight lines. Furthermore, we design a straight line matching strategy based on sampling points to improve the efficiency and success rate of straight line matching. The effectiveness of our method is verified on the public EuRoC and TUM VI benchmark datasets and compared with other state-of-the-art algorithms.

[242]  arXiv:2105.04065 [pdf, other]
Title: Voice activity detection in the wild: A data-driven approach using teacher-student training
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1542-1555, 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Voice activity detection is an essential pre-processing component for speech-related tasks such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain frame-level labels from an ASR pipeline by using, e.g., a Hidden Markov model. These ASR models are commonly trained on clean and fully transcribed data, limiting VAD systems to be trained on clean or synthetically noised datasets. Therefore, a major challenge for supervised VAD systems is their generalization towards noisy, real-world data. This work proposes a data-driven teacher-student approach for VAD, which utilizes vast and unconstrained audio data for training. Unlike previous approaches, only weak labels during teacher training are required, enabling the utilization of any real-world, potentially noisy dataset. Our approach firstly trains a teacher model on a source dataset (Audioset) using clip-level supervision. After training, the teacher provides frame-level guidance to a student model on an unlabeled, target dataset. A multitude of student models trained on mid- to large-sized datasets are investigated (Audioset, Voxceleb, NIST SRE). Our approach is then respectively evaluated on clean, artificially noised, and real-world data. We observe significant performance gains in artificially noised and real-world scenarios. Lastly, we compare our approach against other unsupervised and supervised VAD methods, demonstrating our method's superiority.

[243]  arXiv:2105.04066 [pdf, other]
Title: Reconstructive Sequence-Graph Network for Video Summarization
Comments: Accepted by IEEE TPAMI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Exploiting the inner-shot and inter-shot dependencies is essential for key-shot based video summarization. Current approaches are mainly devoted to modeling the video as a frame sequence by recurrent neural networks. However, one potential limitation of the sequence models is that they focus on capturing local neighborhood dependencies while the high-order dependencies in long distance are not fully exploited. In general, the frames in each shot record a certain activity and vary smoothly over time, but the multi-hop relationships occur frequently among shots. In this case, both the local and global dependencies are important for understanding the video content. Motivated by this point, we propose a Reconstructive Sequence-Graph Network (RSGN) to encode the frames and shots as sequence and graph hierarchically, where the frame-level dependencies are encoded by Long Short-Term Memory (LSTM), and the shot-level dependencies are captured by the Graph Convolutional Network (GCN). Then, the videos are summarized by exploiting both the local and global dependencies among shots. Besides, a reconstructor is developed to reward the summary generator, so that the generator can be optimized in an unsupervised manner, which can avert the lack of annotated data in video summarization. Furthermore, under the guidance of reconstruction loss, the predicted summary can better preserve the main video content and shot-level dependencies. Practically, the experimental results on three popular datasets (i.e., SumMe, TVsum and VTW) have demonstrated the superiority of our proposed approach to the summarization task.

[244]  arXiv:2105.04067 [pdf, other]
Title: Neural Graph Matching based Collaborative Filtering
Comments: 10 pages, 6 figures, 4 tables, SIGIR 2021
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

User and item attributes are essential side-information; their interactions (i.e., their co-occurrence in the sample data) can significantly enhance prediction accuracy in various recommender systems. We identify two different types of attribute interactions, inner interactions and cross interactions: inner interactions are those between only user attributes or those between only item attributes; cross interactions are those between user attributes and item attributes. Existing models do not distinguish these two types of attribute interactions, which may not be the most effective way to exploit the information carried by the interactions. To address this drawback, we propose a neural Graph Matching based Collaborative Filtering model (GMCF), which effectively captures the two types of attribute interactions through modeling and aggregating attribute interactions in a graph matching structure for recommendation. In our model, the two essential recommendation procedures, characteristic learning and preference matching, are explicitly conducted through graph learning (based on inner interactions) and node matching (based on cross interactions), respectively. Experimental results show that our model outperforms state-of-the-art models. Further studies verify the effectiveness of GMCF in improving the accuracy of recommendation.

[245]  arXiv:2105.04070 [pdf, other]
Title: Robust Training Using Natural Transformation
Comments: arXiv admin note: text overlap with arXiv:1912.03192, arXiv:2004.02546 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Previous robustness approaches for deep learning models such as data augmentation techniques via data transformation or adversarial training cannot capture real-world variations that preserve the semantics of the input, such as a change in lighting conditions. To bridge this gap, we present NaTra, an adversarial training scheme that is designed to improve the robustness of image classification algorithms. We target attributes of the input images that are independent of the class identification, and manipulate those attributes to mimic real-world natural transformations (NaTra) of the inputs, which are then used to augment the training dataset of the image classifier. Specifically, we apply \textit{Batch Inverse Encoding and Shifting} to map a batch of given images to corresponding disentangled latent codes of well-trained generative models. \textit{Latent Codes Expansion} is used to boost image reconstruction quality through the incorporation of extended feature maps. \textit{Unsupervised Attribute Directing and Manipulation} enables identification of the latent directions that correspond to specific attribute changes, and then produces interpretable manipulations of those attributes, thereby generating natural transformations to the input data. We demonstrate the efficacy of our scheme by utilizing the disentangled latent representations derived from well-trained GANs to mimic transformations of an image that are similar to real-world natural variations (such as lighting conditions or hairstyle), and train models to be invariant to these natural transformations. Extensive experiments show that our method improves generalization of classification models and increases their robustness to various real-world distortions.

[246]  arXiv:2105.04072 [pdf, other]
Title: Meteorological and human mobility data on predicting COVID-19 cases by a novel hybrid decomposition method with anomaly detection analysis: a case study in the capitals of Brazil
Subjects: Machine Learning (cs.LG); Applications (stat.AP)

In 2020, Brazil was the leading country in COVID-19 cases in Latin America, and capital cities were the most severely affected by the outbreak. Climates vary in Brazil due to the territorial extension of the country, its relief, geography, and other factors. Since the most common COVID-19 symptoms are related to the respiratory system, many researchers have studied the correlation between the number of COVID-19 cases and meteorological variables like temperature, humidity, rainfall, etc. Also, due to its high transmission rate, some researchers have analyzed the impact of human mobility on the dynamics of COVID-19 transmission. There is a dearth of literature that considers these two variables when predicting the spread of COVID-19 cases. In this paper, we analyzed the correlation between the number of COVID-19 cases and both human mobility and meteorological data in Brazilian capitals. We found that the correlation between such variables depends on the regions where the cities are located. We employed the variables with a significant correlation with COVID-19 cases to predict the number of COVID-19 infections in all Brazilian capitals and proposed a prediction method combining the Ensemble Empirical Mode Decomposition (EEMD) method with the Autoregressive Integrated Moving Average Exogenous inputs (ARIMAX) method, which we called EEMD-ARIMAX. After analyzing the results, poor predictions were further investigated using a signal processing-based anomaly detection method. Computational tests showed that EEMD-ARIMAX achieved a forecast 26.73% better than ARIMAX. Moreover, an improvement of 30.69% in the average root mean squared error (RMSE) was noticed when applying the EEMD-ARIMAX method to the data normalized after the anomaly detection.

[247]  arXiv:2105.04075 [pdf]
Title: CFPNet-M: A Light-Weight Encoder-Decoder Based Network for Multimodal Biomedical Image Real-Time Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Currently, developments of deep learning techniques are proving instrumental in identifying, classifying, and quantifying patterns in medical images. Segmentation is one of the important applications in medical image analysis. In this regard, U-Net is the predominant approach to medical image segmentation tasks. However, we found that U-Net based models have limitations in several aspects, for example, millions of parameters that consume considerable computation resources and memory, a lack of global information, and missed detection of some difficult objects. Therefore, we applied two modifications to improve the U-Net model: 1) designed and added the dilated channel-wise CNN module, 2) simplified the U shape network. Based on these two modifications, we proposed a novel light-weight architecture -- Channel-wise Feature Pyramid Network for Medicine (CFPNet-M). To evaluate our method, we selected five datasets with different modalities: thermography, electron microscopy, endoscopy, dermoscopy, and digital retinal images. We compared its performance with several models having different parameter scales. This paper also involves our previous studies of DC-UNet and some commonly used light-weight neural networks. We applied the Tanimoto similarity instead of the Jaccard index for gray-level image measurements. By comparison, CFPNet-M achieves comparable segmentation results on all five medical datasets with only 0.65 million parameters, which is about 2% of U-Net, and 8.8 MB memory. Meanwhile, the inference speed can reach 80 FPS on a single RTX 2070Ti GPU with the 256 by 192 pixels input size.
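The substitution of the Tanimoto similarity for the Jaccard index on gray-level images can be illustrated with a short sketch. The abstract does not state the formula; the code below uses the standard continuous Tanimoto coefficient, $T(a,b) = \frac{a \cdot b}{\|a\|^2 + \|b\|^2 - a \cdot b}$, and whether the authors use exactly this form is an assumption. The function name `tanimoto` and the flattened pixel lists are hypothetical.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient between two equal-length pixel lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denom = norm_a + norm_b - dot
    # Two all-zero images are treated as identical (a convention, assumed here).
    return 1.0 if denom == 0 else dot / denom

# On binary masks the coefficient coincides with the Jaccard index:
# intersection = 2 pixels, union = 4 pixels.
binary_score = tanimoto([1, 0, 1, 1], [1, 1, 0, 1])

# On gray-level values it remains well defined, unlike a set-based Jaccard index.
gray_score = tanimoto([0.5, 0.5], [0.5, 0.5])
```

For binary inputs the dot product counts the intersection and the denominator counts the union, which is why the two measures agree in that case.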

[248]  arXiv:2105.04078 [pdf, other]
Title: Self-supervised spectral matching network for hyperspectral target detection
Comments: IGARSS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Hyperspectral target detection is a pixel-level recognition problem. Given a few target samples, it aims to identify the specific target pixels, such as airplanes, vehicles, or ships, from the entire hyperspectral image. In general, the background pixels make up the majority of the image and are complexly distributed. As a result, the datasets are weakly annotated and extremely imbalanced. To address these problems, a spectral mixing based self-supervised paradigm is designed for hyperspectral data to obtain an effective feature representation. The model adopts a spectral similarity based matching network framework. In order to learn more discriminative features, a pair-based loss is adopted to minimize the distance between target pixels while maximizing the distances between target and background. Furthermore, through a background separation step, the complex unlabeled spectra are downsampled into different sub-categories. The experimental results on three real hyperspectral datasets demonstrate that the proposed framework achieves better results compared with the existing detectors.

[249]  arXiv:2105.04079 [pdf, ps, other]
Title: Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method
Comments: 5 pages, 3 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Audio source separation is often used as preprocessing of various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with the varieties of audio signals. Since sampling frequency, one of the audio signal varieties, is usually application specific, the preceding audio source separation model should be able to deal with audio signals of all sampling frequencies specified in the target applications. However, conventional models based on deep neural networks (DNNs) are trained only at the sampling frequency specified by the training data, and there are no guarantees that they work with unseen sampling frequencies. In this paper, we propose a convolution layer capable of handling arbitrary sampling frequencies by a single DNN. Through music source separation experiments, we show that the introduction of the proposed layer enables a conventional audio source separation model to consistently work with even unseen sampling frequencies.

[250]  arXiv:2105.04080 [pdf, other]
Title: Exponentially convergent multiscale methods for high frequency heterogeneous Helmholtz equations
Authors: Y. Chen, T.Y. Hou, Y. Wang
Comments: 32 pages, 9 figures
Subjects: Numerical Analysis (math.NA)

In this paper, we present a multiscale framework for solving the Helmholtz equation in heterogeneous media without scale separation and in the high frequency regime where the wavenumber $k$ can be large. The main innovation is that our methods achieve a nearly exponential rate of convergence with respect to the computational degrees of freedom, using a coarse grid of mesh size $O(1/k)$ without suffering from the well-known pollution effect. The key idea is a coarse-fine scale decomposition of the solution space that adapts to the media property and wavenumber; this decomposition is inspired by the multiscale finite element method. We show that the coarse part is of low complexity in the sense that it can be approximated with a nearly exponential rate of convergence via local basis functions, while the fine part is local such that it can be computed efficiently using the local information of the right hand side. The combination of the two parts yields the overall nearly exponential rate of convergence. We demonstrate the effectiveness of our methods theoretically and numerically; an exponential rate of convergence is consistently observed and confirmed. In addition, we observe the robustness of our methods regarding the high contrast in the media numerically.

[251]  arXiv:2105.04084 [pdf, ps, other]
Title: A Coupled Random Projection Approach to Large-Scale Canonical Polyadic Decomposition
Subjects: Machine Learning (cs.LG)

We propose a novel algorithm for the computation of canonical polyadic decomposition (CPD) of large-scale tensors. The proposed algorithm generalizes the random projection (RAP) technique, which is often used to compute large-scale decompositions, from one single projection to multiple but coupled random projections (CoRAP). The proposed CoRAP technique yields a set of tensors that together admits a coupled CPD (C-CPD) and a C-CPD algorithm is then used to jointly decompose these tensors. The results of C-CPD are finally fused to obtain factor matrices of the original large-scale data tensor. As more data samples are jointly exploited via C-CPD, the proposed CoRAP based CPD is more accurate than RAP based CPD. Experiments are provided to illustrate the performance of the proposed approach.

[252]  arXiv:2105.04086 [pdf, other]
Title: Deep Reinforcement Learning-based Methods for Resource Scheduling in Cloud Computing: A Review and Future Directions
Comments: 18 pages, 9 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

As the quantity and complexity of information processed by software systems increase, large-scale software systems have an increasing requirement for high-performance distributed computing systems. With the acceleration of the Internet in Web 2.0, Cloud computing, as a paradigm to provide dynamic, uncertain and elastic services, has shown its superiority in meeting computing needs dynamically. Without an appropriate scheduling approach, extensive Cloud computing may cause high energy consumption and high cost; high energy consumption in turn causes massive carbon dioxide emissions. Moreover, inappropriate scheduling will reduce the service life of physical devices as well as increase the response time to users' requests. Hence, efficient scheduling of resources or optimal allocation of requests, which is usually an NP-hard problem, is one of the prominent issues in emerging trends of Cloud computing. Focusing on improving quality of service (QoS), reducing cost and abating contamination, researchers have conducted extensive work on resource scheduling problems of Cloud computing over the years. Nevertheless, the growing complexity of Cloud computing, a super-massive distributed system, is limiting the application of scheduling approaches. Machine learning, a utility method to tackle problems in complex scenes, has been used in recent years as an innovative way to resolve the resource scheduling of Cloud computing. Deep reinforcement learning (DRL), a combination of deep learning (DL) and reinforcement learning (RL), is one branch of machine learning and has considerable prospects in resource scheduling of Cloud computing. This paper surveys the methods of resource scheduling with a focus on DRL-based scheduling approaches in Cloud computing, reviews the application of DRL, and discusses challenges and future directions of DRL in scheduling of Cloud computing.

[253]  arXiv:2105.04088 [pdf, other]
Title: PEARL: Parallelized Expert-Assisted Reinforcement Learning for Scene Rearrangement Planning
Comments: 7 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI)

Scene Rearrangement Planning (SRP) is an interior task proposed recently. The previous work defines the action space of this task with handcrafted coarse-grained actions that are inflexible to be used for transforming scene arrangement and intractable to be deployed in practice. Additionally, this new task lacks realistic indoor scene rearrangement data to feed popular data-hungry learning approaches and meet the needs of quantitative evaluation. To address these problems, we propose a fine-grained action definition for SRP and introduce a large-scale scene rearrangement dataset. We also propose a novel learning paradigm to efficiently train an agent through self-playing, without any prior knowledge. The agent trained via our paradigm achieves superior performance on the introduced dataset compared to the baseline agents. We provide a detailed analysis of the design of our approach in our experiments.

[254]  arXiv:2105.04089 [pdf]
Title: Effective Methods of QR-Decompositions of Square Complex Matrices by Fast Discrete Signal-Induced Heap Transforms
Comments: 19 pages, 4 figures, 1 table
Subjects: Numerical Analysis (math.NA)

The purpose of this work is to present an effective tool for computing different QR-decompositions of a complex nonsingular square matrix. The concept of the discrete signal-induced heap transform (DsiHT, Grigoryan 2006) is used. This transform is fast, has a unique algorithm for any length of the input vector/signal and can be used with different complex basic 2x2 transforms. The DsiHT zeroes all but one component of the input signal while moving or heaping the energy of the signal into the remaining component, such as the first. We describe three different types of QR-decompositions that use the basic transforms with the T, G, and M-type complex matrices we introduce, and also without matrices, but using analytical formulas. We also present the mixed QR-decomposition, in which DsiHTs of different types are used at different stages of the algorithm. The number of such decompositions is greater than 3^(N-1) for an NxN complex matrix. Examples of the QR-decomposition are described in detail for the 4x4 and 6x6 complex matrices and compared with the known method of Householder transforms. The precision of the QR-decompositions of NxN matrices for N = 6, 13, 17, 19, 21, 40, 64, 100, 128, 201, 256, and 400 is also compared. The MATLAB-based scripts of the codes for QR-decompositions by the described DsiHTs are given.
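The idea of heaping the energy of a signal into its first component can be sketched with a chain of elementary 2x2 plane rotations. The sketch below is restricted to real vectors and uses ordinary Givens-style rotations as the basic transform; it is not the authors' T, G, or M-type complex matrices, and the function name `heap_energy` is hypothetical.

```python
import math

def heap_energy(signal):
    """Zero every component but the first by successive 2x2 rotations.

    Returns the transformed vector and the list of (index, cos, sin)
    rotation parameters generated by the input signal itself.
    """
    y = [float(v) for v in signal]
    rotations = []
    for k in range(1, len(y)):
        r = math.hypot(y[0], y[k])
        if r == 0.0:
            c, s = 1.0, 0.0  # nothing to rotate
        else:
            c, s = y[0] / r, y[k] / r
        rotations.append((k, c, s))
        # Rotate the pair (y[0], y[k]) so that y[k] becomes zero.
        y[0], y[k] = c * y[0] + s * y[k], -s * y[0] + c * y[k]
    return y, rotations

heaped, params = heap_energy([3.0, 4.0, 12.0])
```

Each rotation is orthogonal, so the Euclidean norm of the signal is preserved and ends up entirely in the first component. Applying the same stored rotations to the remaining columns of a matrix is the usual way such a transform is turned into one step of a QR-decomposition.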

[255]  arXiv:2105.04090 [pdf, other]
Title: MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just One Transformer VAE
Comments: Preprint. 26 pages, 7 figures, and 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Transformers and variational autoencoders (VAE) have been extensively employed for symbolic (e.g., MIDI) domain music generation. While the former boast an impressive capability in modeling long sequences, the latter allow users to willingly exert control over different parts (e.g., bars) of the music to be generated. In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths. The task is split into two steps. First, we equip Transformer decoders with the ability to accept segment-level, time-varying conditions during sequence generation. Subsequently, we combine the developed and tested in-attention decoder with a Transformer encoder, and train the resulting MuseMorphose model with the VAE objective to achieve style transfer of long musical pieces, in which users can specify musical attributes including rhythmic intensity and polyphony (i.e., harmonic fullness) they desire, down to the bar level. Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based prior art on numerous widely-used metrics for style transfer tasks.

[256]  arXiv:2105.04091 [pdf, ps, other]
Title: Diversity Analysis of Millimeter-Wave OFDM Massive MIMO Systems
Comments: 12 pages, 4 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We analyze the diversity gain for a distributed antenna subarray employing orthogonal frequency-division multiplexing (OFDM) in millimeter-wave (mm-Wave) massive multiple-input multiple-output (MIMO) systems. We show that the diversity gain depends on the number of transmitted data streams, the number of remote antenna units (RAUs), and the number of propagation paths between RAUs. Furthermore, we show that by using bit-interleaved coded multiple beamforming (BICMB), one can achieve the maximum diversity gain in a distributed antenna subarray system. The assumption in both scenarios is that the numbers of antennas at the transmitter and the receiver are large enough and that channel state information (CSI) is known at the transmitter and the receiver.

[257]  arXiv:2105.04093 [pdf, ps, other]
Title: Elastic Weight Consolidation (EWC): Nuts and Bolts
Authors: Abhishek Aich
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

In this report, we present theoretical support for the continual learning method \textbf{Elastic Weight Consolidation}, introduced in the paper titled `Overcoming catastrophic forgetting in neural networks'. As that paper is one of the most cited papers on regularized methods for continual learning, this report disentangles the underlying concept of the proposed objective function. We assume that the reader is aware of the basic terminology of continual learning.

[258]  arXiv:2105.04097 [pdf]
Title: Examining convolutional feature extraction using Maximum Entropy (ME) and Signal-to-Noise Ratio (SNR) for image classification
Comments: Conference paper, 6 pages, 1 table
Journal-ref: Proceedings of the 46th Annual Conference of the IEEE Industrial Electronics Society (IECON2020). IEEE Computer Society Press, pp.471-476
Subjects: Neural and Evolutionary Computing (cs.NE)

Convolutional Neural Networks (CNNs) specialize in feature extraction rather than function mapping. In doing so they form complex internal hierarchical feature representations, the complexity of which gradually increases with a corresponding increment in neural network depth. In this paper, we examine the feature extraction capabilities of CNNs using Maximum Entropy (ME) and Signal-to-Noise Ratio (SNR) to validate the idea that CNN models should be tailored for a given task and complexity of the input data. SNR and ME measures are used as they can accurately determine, for the input dataset, the relative amount of signal information to random noise and the maximum amount of information, respectively. We use two well known benchmarking datasets, MNIST and CIFAR-10, to examine the information extraction and abstraction capabilities of CNNs. Through our experiments, we examine convolutional feature extraction and abstraction capabilities in CNNs and show that the classification accuracy or performance of CNNs is greatly dependent on the amount, complexity and quality of the signal information present in the input data. Furthermore, we show the effect of information overflow and underflow on CNN classification accuracies. Our hypothesis is that the feature extraction and abstraction capabilities of convolutional layers are limited and therefore, CNN models should be tailored to the input data by using appropriately sized CNNs based on the SNR and ME measures of the input dataset.

[259]  arXiv:2105.04098 [pdf, other]
Title: SRLF: A Stance-aware Reinforcement Learning Framework for Content-based Rumor Detection on Social Media
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The rapid development of social media changes the lifestyle of people and simultaneously provides an ideal place for publishing and disseminating rumors, which severely exacerbates social panic and triggers a crisis of social trust. Early content-based methods focused on finding clues from the text and user profiles for rumor detection. Recent studies combine the stances of users' comments with news content to capture the difference between true and false rumors. Although the user's stance is effective for rumor detection, the manual labeling process is time-consuming and labor-intensive, which limits the application of utilizing it to facilitate rumor detection.
In this paper, we first finetune a pre-trained BERT model on a small labeled dataset and leverage this model to annotate weak stance labels for users' comment data to overcome the problem mentioned above. Then, we propose a novel Stance-aware Reinforcement Learning Framework (SRLF) to select high-quality labeled stance data for model training and rumor detection. Both the stance selection and rumor detection tasks are optimized simultaneously to promote both tasks mutually. We conduct experiments on two commonly used real-world datasets. The experimental results demonstrate that our framework outperforms the state-of-the-art models significantly, which confirms the effectiveness of the proposed framework.

[260]  arXiv:2105.04100 [pdf, other]
Title: Z-GCNETs: Time Zigzags at Graph Convolutional Networks for Time Series Forecasting
Comments: Accepted at the International Conference on Machine Learning (ICML) 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

There recently has been a surge of interest in developing a new class of deep learning (DL) architectures that integrate an explicit time dimension as a fundamental building block of learning and representation mechanisms. In turn, many recent results show that topological descriptors of the observed data, encoding information on the shape of the dataset in a topological space at different scales, that is, persistent homology of the data, may contain important complementary information, improving both performance and robustness of DL. As convergence of these two emerging ideas, we propose to enhance DL architectures with the most salient time-conditioned topological information of the data and introduce the concept of zigzag persistence into time-aware graph convolutional networks (GCNs). Zigzag persistence provides a systematic and mathematically rigorous framework to track the most important topological features of the observed data that tend to manifest themselves over time. To integrate the extracted time-conditioned topological descriptors into DL, we develop a new topological summary, zigzag persistence image, and derive its theoretical stability guarantees. We validate the new GCNs with a time-aware zigzag topological layer (Z-GCNETs), in application to traffic forecasting and Ethereum blockchain price prediction. Our results indicate that Z-GCNET outperforms 13 state-of-the-art methods on 4 time series datasets.

[261]  arXiv:2105.04102 [pdf, other]
Title: Deep feature selection-and-fusion for RGB-D semantic segmentation
Comments: ICME 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene depth information can complement visual information for more accurate semantic segmentation. However, how to effectively integrate multi-modality information into representative features is still an open problem. Most of the existing work uses DCNNs to implicitly fuse multi-modality information. But as the network deepens, some critical distinguishing features may be lost, which reduces the segmentation performance. This work proposes a unified and efficient feature selection-and-fusion network (FSFNet), which contains a symmetric cross-modality residual fusion module used for explicit fusion of multi-modality information. Besides, the network includes a detailed feature propagation module, which is used to maintain low-level detailed information during the forward process of the network. Compared with the state-of-the-art methods, experimental evaluations demonstrate that the proposed model achieves competitive performance on two public datasets.

[262]  arXiv:2105.04103 [pdf]
Title: BIM Hyperreality: Data Synthesis Using BIM and Hyperrealistic Rendering for Deep Learning
Comments: Accepted to the 40th Annual Conference of the Association for Computer Aided Design in Architecture (ACADIA 2020)
Subjects: Machine Learning (cs.LG)

Deep learning is expected to offer new opportunities and a new paradigm for the field of architecture. One such opportunity is teaching neural networks to visually understand architectural elements from the built environment. However, the availability of large training datasets is one of the biggest limitations of neural networks. Also, the vast majority of training data for visual recognition tasks is annotated by humans. In order to resolve this bottleneck, we present a concept of a hybrid system using both building information modeling (BIM) and hyperrealistic (photorealistic) rendering to synthesize datasets for training a neural network for building object recognition in photos. For generating our training dataset BIMrAI, we used an existing BIM model and a corresponding photo-realistically rendered model of the same building. We created methods for using renderings to train a deep learning model, trained a generative adversarial network (GAN) model using these methods, and tested the output model on real-world photos. For the specific case study presented in this paper, our results show that a neural network trained with synthetic data, i.e., photorealistic renderings and BIM-based semantic labels, can be used to identify building objects from photos without using photos in the training data. Future work can enhance the presented methods using available BIM models and renderings for more generalized mapping and description of photographed built environments.

[263]  arXiv:2105.04104 [pdf, other]
Title: AppealNet: An Efficient and Highly-Accurate Edge/Cloud Collaborative Architecture for DNN Inference
Comments: Accepted by DAC2021
Subjects: Machine Learning (cs.LG)

This paper presents AppealNet, a novel edge/cloud collaborative architecture that runs deep learning (DL) tasks more efficiently than state-of-the-art solutions. For a given input, AppealNet accurately predicts on-the-fly whether it can be successfully processed by the DL model deployed on the resource-constrained edge device, and if not, appeals to the more powerful DL model deployed at the cloud. This is achieved by employing a two-head neural network architecture that explicitly takes inference difficulty into consideration and optimizes the tradeoff between accuracy and computation/communication cost of the edge/cloud collaborative architecture. Experimental results on several image classification datasets show up to more than 40% energy savings compared to existing techniques without sacrificing accuracy.
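
The edge-or-cloud decision described above can be illustrated with a minimal routing sketch. This is not AppealNet's implementation: the function `route_inference`, the toy models, and the fixed `threshold` are hypothetical, and the second head of the two-head network is assumed to emit a score in [0, 1] estimating whether the edge model will handle the input correctly.

```python
from typing import Callable, Tuple


def route_inference(
    x: object,
    edge_model: Callable[[object], Tuple[str, float]],
    cloud_model: Callable[[object], str],
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Return (prediction, location), where location is "edge" or "cloud".

    edge_model is assumed to return (label, score); the score stands in for
    the predictor head that judges whether the edge result can be trusted.
    """
    label, score = edge_model(x)
    if score >= threshold:
        # The input looks easy enough: keep the cheap edge prediction.
        return label, "edge"
    # Otherwise "appeal" to the larger model, paying the communication cost.
    return cloud_model(x), "cloud"


# Hypothetical stand-ins for the two deployed models.
def toy_edge(x: object) -> Tuple[str, float]:
    return ("cat", 0.9) if x == "easy" else ("dog", 0.2)


def toy_cloud(x: object) -> str:
    return "fox"
```

Raising `threshold` sends more inputs to the cloud, which trades energy and communication for accuracy; the abstract describes optimizing this tradeoff during training rather than fixing a threshold by hand.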

[264]  arXiv:2105.04105 [pdf, ps, other]
Title: On the Hardness of Opinion Dynamics Optimization with $L_1$-Budget on Varying Susceptibility to Persuasion
Subjects: Social and Information Networks (cs.SI); Computer Science and Game Theory (cs.GT)

Recently, Abebe et al. (KDD 2018) and Chan et al. (WWW 2019) have considered an opinion dynamics optimization problem that is based on a popular model for social opinion dynamics, in which each agent has some fixed innate opinion, and a resistance that measures the importance it places on its innate opinion; moreover, the agents influence one another's opinions through an iterative process. Under certain conditions, this iterative process converges to some equilibrium opinion vector. Previous works gave an efficient local search algorithm to solve the unbudgeted variant of the problem, for which the goal is to modify the resistance of any number of agents (within some given range) such that the sum of the equilibrium opinions is minimized. On the other hand, it was proved that the $L_0$-budgeted variant is NP-hard, where the $L_0$-budget is a restriction given upfront on the number of agents whose resistance may be modified.
Inspired by practical situations in which the effort to modify an agent's resistance increases with the magnitude of the change, we propose the $L_1$-budgeted variant, in which the $L_1$-budget is a restriction on the sum of the magnitudes of the changes over all agents' resistance parameters. In this work, we show that the $L_1$-budgeted variant is NP-hard via a reduction from vertex cover. However, contrary to the $L_0$-budgeted variant, a very technical argument is needed to show that the optimal solution can be achieved by focusing the given $L_1$-budget on as small a number of agents as possible, as opposed to spreading the budget over a large number of agents.
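
To make the quantities above concrete, the following sketch iterates an opinion update until it settles and computes the two budget notions. The update rule used here, z_i = a_i * s_i + (1 - a_i) * sum_j P_ij * z_j with innate opinion s_i, resistance a_i and row-stochastic interaction weights P, is an assumption about the model in the cited works, and all function names are hypothetical.

```python
from typing import List, Sequence


def equilibrium(
    innate: Sequence[float],
    resistance: Sequence[float],
    weights: Sequence[Sequence[float]],
    iters: int = 10_000,
    tol: float = 1e-12,
) -> List[float]:
    """Iterate the assumed opinion update until successive opinions agree."""
    n = len(innate)
    z = list(innate)
    for _ in range(iters):
        new_z = [
            resistance[i] * innate[i]
            + (1.0 - resistance[i]) * sum(weights[i][j] * z[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_z, z)) < tol:
            return new_z
        z = new_z
    return z


def l0_cost(old: Sequence[float], new: Sequence[float]) -> int:
    """L_0 budget: number of agents whose resistance was modified."""
    return sum(1 for a, b in zip(old, new) if a != b)


def l1_cost(old: Sequence[float], new: Sequence[float]) -> float:
    """L_1 budget: total magnitude of the resistance changes."""
    return sum(abs(a - b) for a, b in zip(old, new))
```

For two agents that listen only to each other, innate opinions (1, 0) and resistance 0.5 each, the iteration settles at (2/3, 1/3), so the objective (the sum of equilibrium opinions) equals 1.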

[265]  arXiv:2105.04107 [pdf, other]
Title: MmWave MIMO Communication with Semi-Passive RIS: A Low-Complexity Channel Estimation Scheme
Comments: 6 pages, 3 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Reconfigurable intelligent surfaces (RISs) have recently received widespread attention in the field of wireless communication. An RIS can be controlled to reflect incident waves from the transmitter towards the receiver, a feature that is believed to fundamentally contribute to beyond 5G wireless technology. The typical RIS consists of entirely passive elements, which requires the high-dimensional channel estimation to be done elsewhere. Therefore, in this paper, we present a semi-passive large-scale RIS architecture equipped with only a small fraction of simplified receiver units with only 1-bit quantization. Based on this architecture, we first propose an alternating direction method of multipliers (ADMM)-based approach to recover the training signals at the passive RIS elements. We then obtain the global channel by combining a channel sparsification step with the generalized approximate message passing (GAMP) algorithm. Our proposed scheme exploits both the sparsity and low-rankness properties of the channel in the joint spatial-frequency domain of a wideband mmWave multiple-input-multiple-output (MIMO) communication system. Simulation results show that the proposed algorithm can significantly reduce the pilot signaling needed for accurate channel estimation and outperform previous methods, even with fewer receiver units.

[266]  arXiv:2105.04110 [pdf, ps, other]
Title: A Framework for Reasoning About LF Specifications
Authors: Mary Southern
Subjects: Logic in Computer Science (cs.LO)

This thesis develops a framework for formalizing reasoning about specifications of systems written in LF. This formalization centers around the development of a reasoning logic that can express the sorts of properties which arise in reasoning about such specifications. In this logic, type inhabitation judgements in LF serve as atomic formulas, and quantification is permitted over both contexts and terms in these judgements. The logic permits arbitrary relations over derivations of LF judgements to be expressed using a collection of logical connectives, in contrast to other systems for reasoning about LF specifications. Defining a semantics for these formulas raises issues which we must address, such as how to interpret both term and context quantification as well as the relation between atomic formulas and the LF judgements they are meant to encode.
This thesis also develops a proof system which captures informal reasoning steps as sound inference rules for the logic. To achieve this we develop a collection of proof rules including mechanisms for both case analysis and inductive reasoning over the derivations of judgements in LF. The proof system also supports applying LF meta-theorems through proof rules that enforce requirements of the LF meta-theorem that cannot be expressed in the logic.
We also implement a proof assistant called Adelfa that provides a means for mechanizing this approach to reasoning about specifications written in LF. A characteristic of this proof assistant is that it uses the proof rules that complement the logic to describe a collection of tactics that are used to develop proofs in goal-driven fashion. The Adelfa system is used to develop a collection of examples which demonstrate the effectiveness of the framework and showcase how informal reasoning about specifications written in LF can be formalized using the logic and associated proof system.

[267]  arXiv:2105.04112 [pdf, other]
Title: ROBI: A Multi-View Dataset for Reflective Objects in Robotic Bin-Picking
Subjects: Robotics (cs.RO)

In robotic bin-picking applications, the perception of texture-less, highly reflective parts is a valuable but challenging task. The high glossiness can introduce fake edges in RGB images and inaccurate depth measurements, especially in heavily cluttered bin scenarios. In this paper, we present the ROBI (Reflective Objects in BIns) dataset, a public dataset for 6D object pose estimation and multi-view depth fusion in robotic bin-picking scenarios. The ROBI dataset includes a total of 63 bin-picking scenes captured with two active stereo cameras: a high-cost Ensenso sensor and a low-cost RealSense sensor. For each scene, the monochrome/RGB images and depth maps are captured from sampled view spheres around the scene, and are annotated with accurate 6D poses of visible objects and an associated visibility score. For evaluating the performance of depth fusion, we captured the ground truth depth maps with the high-cost Ensenso camera, with objects coated in anti-reflective scanning spray. To show the utility of the dataset, we evaluated representative algorithms for 6D object pose estimation and multi-view depth fusion on the full dataset. Evaluation results demonstrate the difficulty of highly reflective objects, especially in difficult cases due to the degradation of depth data quality, severe occlusions and cluttered scenes. The ROBI dataset is available online at https://www.trailab.utias.utoronto.ca/robi.

[268]  arXiv:2105.04113 [pdf, other]
Title: Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face Learning
Comments: 12 pages, 8 figures. arXiv admin note: text overlap with arXiv:2007.08398
Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the recent development of deep convolutional neural networks and large-scale datasets, deep face recognition has made remarkable progress and been widely used in various applications. However, unlike the existing public face datasets, in many real-world scenarios of face recognition, the depth of the training dataset is shallow, which means only two face images are available for each ID. With the non-uniform increase of samples, this issue becomes a more general case, namely long-tail face learning, which suffers from data imbalance and a dearth of intra-class diversity simultaneously. These adverse conditions damage the training and result in a decline in model performance. Based on Semi-Siamese Training (SST), we introduce an advanced solution, named Multi-Agent Semi-Siamese Training (MASST), to address these problems. MASST includes a probe network and multiple gallery agents: the former aims to encode the probe features, and the latter constitute a stack of networks that encode the prototypes (gallery features). For each training iteration, the gallery network, which is sequentially rotated from the stack, and the probe network form a pair of semi-siamese networks. We give theoretical and empirical analysis showing that, given the long-tail (or shallow) data and training loss, MASST smooths the loss landscape and satisfies Lipschitz continuity with the help of multiple agents and the updating gallery queue. The proposed method has no extra dependencies and can thus be easily integrated with existing loss functions and network architectures. It is worth noting that, although multiple gallery agents are employed for training, only the probe network is needed for inference, so the inference cost does not increase. Extensive experiments and comparisons demonstrate the advantages of MASST for long-tail and shallow face learning.

[269]  arXiv:2105.04117 [pdf, other]
Title: Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia
Comments: SIGIR'21
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)

Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors' manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia "templates". Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of "non-neutral point of view" or "contradictory articles", and serve as a strong signal for detecting reliability issues in a revision. We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative with respect to each template. Each positive/negative example in the dataset comes with the full article text and 20 features from the revision's metadata. We provide an overview of the possible downstream tasks enabled by such data, and show that Wiki-Reliability can be used to train large-scale models for content reliability prediction. We release all data and code for public use.

[270]  arXiv:2105.04118 [pdf, other]
Title: FAID Diversity via Neural Networks
Comments: 7 pages, 3 figures, 3 tables. A shorter version is submitted to the International Symposium on Topics in Coding, 2021
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

Decoder diversity is a powerful error correction framework in which a collection of decoders collaboratively correct a set of error patterns otherwise uncorrectable by any individual decoder. In this paper, we propose a new approach to design the decoder diversity of finite alphabet iterative decoders (FAIDs) for Low-Density Parity Check (LDPC) codes over the binary symmetric channel (BSC), for the purpose of lowering the error floor while guaranteeing the waterfall performance. The proposed decoder diversity is achieved by training a recurrent quantized neural network (RQNN) to learn/design FAIDs. We demonstrated for the first time that a machine-learned decoder can surpass in performance a man-made decoder of the same complexity. As RQNNs can model a broad class of FAIDs, they are capable of learning an arbitrary FAID. To provide sufficient knowledge of the error floor to the RQNN, the training sets are constructed by sampling from the set of most problematic error patterns - trapping sets. In contrast to the existing methods that use the cross-entropy function as the loss function, we introduce a frame-error-rate (FER) based loss function to train the RQNN with the objective of correcting specific error patterns rather than reducing the bit error rate (BER). The examples and simulation results show that the RQNN-aided decoder diversity increases the error correction capability of LDPC codes and lowers the error floor.

[271]  arXiv:2105.04120 [pdf, ps, other]
Title: Fast constraint satisfaction problem and learning-based algorithm for solving Minesweeper
Subjects: Artificial Intelligence (cs.AI)

Minesweeper is a popular spatial-based decision-making game that works with incomplete information. As an exemplary NP-complete problem, it is a major area of research employing various artificial intelligence paradigms. The present work models this game as a Constraint Satisfaction Problem (CSP) and a Markov Decision Process (MDP). We propose a new method, named dependents from the independent set using deterministic solution search (DSScsp), for the faster enumeration of all solutions of a CSP-based Minesweeper game, and improve the results by introducing heuristics. Using MDP, we implement machine learning methods on these heuristics. We train the classification model on sparse data with results from the CSP formulation. We also propose a new rewarding method for applying a modified deep Q-learning for better accuracy and versatile learning in the Minesweeper game. The overall results have been analyzed for different kinds of Minesweeper games and their accuracies have been recorded. Results from these experiments show that the proposed MDP-based classification model and deep Q-learning method are overall the best in terms of accuracy for games with the given mine densities.
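
As a point of reference for the CSP view of the game, the sketch below enumerates every consistent mine assignment by plain brute force. It is not the DSScsp method named in the abstract; the cell names, the `(scope, count)` constraint encoding, and both functions are hypothetical. The probability estimate treats every consistent assignment as equally likely, which ignores the total mine count of a real board.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Constraint = Tuple[Sequence[str], int]  # (neighbouring unknown cells, mines among them)


def enumerate_solutions(
    cells: Sequence[str], constraints: Sequence[Constraint]
) -> List[Dict[str, int]]:
    """Brute-force every 0/1 assignment and keep the consistent ones."""
    solutions = []
    for bits in product((0, 1), repeat=len(cells)):
        assignment = dict(zip(cells, bits))
        if all(
            sum(assignment[c] for c in scope) == count
            for scope, count in constraints
        ):
            solutions.append(assignment)
    return solutions


def mine_probabilities(
    cells: Sequence[str], solutions: Sequence[Dict[str, int]]
) -> Dict[str, float]:
    """Fraction of consistent assignments in which each cell holds a mine."""
    if not solutions:
        raise ValueError("constraints are inconsistent: no solutions")
    total = len(solutions)
    return {c: sum(s[c] for s in solutions) / total for c in cells}
```

With cells `a`, `b`, `c` and two revealed numbers stating that exactly one mine lies in `{a, b}` and exactly one in `{b, c}`, the only consistent assignments are "mines at `a` and `c`" and "mine at `b`". Brute force costs 2^n checks for n unknown cells, which is why faster enumeration matters.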

[272]  arXiv:2105.04123 [pdf, other]
Title: Neural Program Repair with Execution-based Backpropagation
Subjects: Software Engineering (cs.SE)

Neural machine translation (NMT) architectures have achieved promising results for automatic program repair. Yet, they have the limitation of generating low-quality patches (e.g., patches that do not compile). This is because existing works only optimize a purely syntactic loss function based on characters and tokens, without incorporating program-specific information during neural net weight optimization. In this paper, we propose a novel program repair model called RewardRepair. The core novelty of RewardRepair is to improve NMT-based program repair with a loss function based on program compilation and test execution information, rewarding the network for producing patches that compile and that do not overfit. We conduct several experiments to evaluate RewardRepair, showing that it is feasible and effective to use compilation and test execution results to optimize the underlying neural repair model.

[273]  arXiv:2105.04124 [pdf, other]
Title: MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Text-to-Speech (TTS) synthesis plays an important role in human-computer interaction. Currently, most TTS technologies focus on the naturalness of speech, namely, making the speech sound human. However, the key tasks of expressing emotion and speaker identity are ignored, which limits the application scenarios of TTS synthesis technology. To make the synthesized speech more realistic and expand the application scenarios, we propose a multi-task anthropomorphic speech synthesis framework (MASS), which can synthesize speech from text with a specified emotion and speaker identity. The MASS framework consists of a base TTS module and two novel voice conversion modules: the emotional voice conversion module and the speaker voice conversion module. We propose a deep emotion voice conversion model (DEVC) and a deep speaker voice conversion model (DSVC), both based on convolution residual networks, which address the problem of feature loss during voice conversion. Model training does not depend on parallel datasets, and the models are capable of many-to-many voice conversion. In the emotional voice conversion and speaker voice conversion experiments, as well as the multi-task speech synthesis experiments, the results show that DEVC and DSVC convert speech effectively. The quantitative and qualitative evaluation results of the multi-task speech synthesis experiments show that MASS can effectively synthesize speech with specified text, emotion and speaker identity.

[274]  arXiv:2105.04126 [pdf, other]
Title: ExpMRC: Explainability Evaluation for Machine Reading Comprehension
Comments: 10 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Achieving human-level performance on some Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, it is necessary to provide both the answer prediction and its explanation to further improve the MRC system's reliability, especially for real-life applications. In this paper, we propose a new benchmark called ExpMRC for evaluating the explainability of MRC systems. ExpMRC contains four subsets, including SQuAD, CMRC 2018, RACE$^+$, and C$^3$, with additional annotations of the answer's evidence. The MRC systems are required to give not only the correct answer but also its explanation. We use state-of-the-art pre-trained language models to build baseline systems and adopt various unsupervised approaches to extract evidence without a human-annotated training set. The experimental results show that these models are still far from human performance, suggesting that ExpMRC is challenging. Resources will be available through https://github.com/ymcui/expmrc

[275]  arXiv:2105.04128 [pdf]
Title: Examining and Mitigating Kernel Saturation in Convolutional Neural Networks using Negative Images
Comments: Conference paper, 6 pages, 3 figures, 1 table
Journal-ref: Proceedings of the 46th Annual Conference of the IEEE Industrial Electronics Society (IECON2020). IEEE Computer Society Press, pp.465-470
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Neural saturation in Deep Neural Networks (DNNs) has been studied extensively, but remains relatively unexplored in Convolutional Neural Networks (CNNs). Understanding and alleviating the effects of convolutional kernel saturation is critical for enhancing the classification accuracy of CNN models. In this paper, we analyze the effect of convolutional kernel saturation in CNNs and propose a simple data augmentation technique to mitigate saturation and increase classification accuracy, by supplementing the training dataset with negative images. We hypothesize that greater semantic feature information can be extracted using negative images since they have the same structural information as standard images but differ in their data representations. Varied data representations decrease the probability of kernel saturation and thus increase the effectiveness of kernel weight updates. The two datasets selected to evaluate our hypothesis were CIFAR-10 and STL-10, as they have similar image classes but differ in image resolution, thus allowing a better understanding of the saturation phenomenon. The MNIST dataset was used to highlight the ineffectiveness of the technique for linearly separable data. The ResNet CNN architecture was chosen since the skip connections in the network ensure that the features contributing the most to classification accuracy are retained. Our results show that CNNs are indeed susceptible to convolutional kernel saturation and that supplementing the training dataset with negative images can offer a statistically significant increase in classification accuracy when compared against models trained on the original datasets. Our results present accuracy increases of 6.98% and 3.16% on the STL-10 and CIFAR-10 datasets respectively.
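
The augmentation described above can be sketched in a few lines. This sketch assumes that a "negative image" means intensity inversion (each pixel p becomes max_value - p) and that images are grayscale nested lists of integers in [0, max_value]; the function names are hypothetical, and the paper's actual pipeline may differ.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel intensities


def negative(image: Sequence[Sequence[int]], max_value: int = 255) -> Image:
    """Invert every pixel: structure is preserved, representation changes."""
    return [[max_value - p for p in row] for row in image]


def augment_with_negatives(
    images: Sequence[Sequence[Sequence[int]]],
    labels: Sequence[int],
    max_value: int = 255,
) -> Tuple[List[Image], List[int]]:
    """Return the originals followed by their negatives, with labels repeated."""
    originals = [[list(row) for row in img] for img in images]
    negatives = [negative(img, max_value) for img in images]
    return originals + negatives, list(labels) + list(labels)
```

The augmented set is twice the size of the original, and each negative keeps the class label of the image it was derived from.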

[276]  arXiv:2105.04129 [pdf, other]
Title: Parameter-free Gradient Temporal Difference Learning
Comments: 30 pages, 10 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning lies at the intersection of several challenges. Many applications of interest involve extremely large state spaces, requiring function approximation to enable tractable computation. In addition, the learner has only a single stream of experience with which to evaluate a large number of possible courses of action, necessitating algorithms which can learn off-policy. However, the combination of off-policy learning with function approximation leads to divergence of temporal difference methods. Recent work into gradient-based temporal difference methods has promised a path to stability, but at the cost of expensive hyperparameter tuning. In parallel, progress in online learning has provided parameter-free methods that achieve minimax optimal guarantees up to logarithmic terms, but their application in reinforcement learning has yet to be explored. In this work, we combine these two lines of attack, deriving parameter-free, gradient-based temporal difference algorithms. Our algorithms run in linear time and achieve high-probability convergence guarantees matching those of GTD2 up to $\log$ factors. Our experiments demonstrate that our methods maintain high prediction performance relative to fully-tuned baselines, with no tuning whatsoever.
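
For context, the sketch below shows one update of the classic GTD2 algorithm with linear function approximation, which the abstract uses as its reference point. It is not the parameter-free method proposed in the paper: the step sizes `alpha` and `beta` are exactly the hand-tuned hyperparameters the paper aims to remove. The importance-sampling ratio needed for off-policy data is omitted for brevity, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gtd2_step(
    theta: Sequence[float],
    w: Sequence[float],
    phi: Sequence[float],
    phi_next: Sequence[float],
    reward: float,
    gamma: float,
    alpha: float,
    beta: float,
) -> Tuple[List[float], List[float]]:
    """One GTD2 update of the value weights theta and auxiliary weights w."""
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)
    # Both updates use the old w, as in the standard two-timescale form.
    new_theta = [
        t + alpha * (p - gamma * pn) * phi_w
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    new_w = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w
```

Each step costs time linear in the number of features, which matches the linear-time claim in the abstract; choosing `alpha` and `beta` well is the tuning burden that parameter-free methods are meant to eliminate.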

[277]  arXiv:2105.04132 [pdf, other]
Title: An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery
Comments: 35 pages. Accepted by ISPRS Journal of Photogrammetry and Remote Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semantic segmentation is an essential part of deep learning. In recent years, with the development of remote sensing big data, semantic segmentation has been increasingly used in remote sensing. Deep convolutional neural networks (DCNNs) face the challenge of feature fusion: very-high-resolution remote sensing image multisource data fusion can increase the network's learnable information, which is conducive to correctly classifying target objects by DCNNs; simultaneously, the fusion of high-level abstract features and low-level spatial features can improve the classification accuracy at the border between target objects. In this paper, we propose a multipath encoder structure to extract features of multipath inputs, a multipath attention-fused block module to fuse multipath features, and a refinement attention-fused block module to fuse high-level abstract features and low-level spatial features. Furthermore, we propose a novel convolutional neural network architecture, named attention-fused network (AFNet). Based on our AFNet, we achieve state-of-the-art performance with an overall accuracy of 91.7% and a mean F1 score of 90.96% on the ISPRS Vaihingen 2D dataset and an overall accuracy of 92.1% and a mean F1 score of 93.44% on the ISPRS Potsdam 2D dataset.

[278]  arXiv:2105.04133 [pdf, other]
Title: Coupling Intent and Action for Pedestrian Crossing Behavior Prediction
Comments: 7 pages, 4 figures, 3 tables. Accepted to IJCAI2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate prediction of pedestrian crossing behaviors by autonomous vehicles can significantly improve traffic safety. Existing approaches often model pedestrian behaviors using trajectories or poses but do not offer a deeper semantic interpretation of a person's actions or how actions influence a pedestrian's intention to cross in the future. In this work, we follow the neuroscience and psychological literature to define pedestrian crossing behavior as a combination of an unobserved inner will (a probabilistic representation of binary intent of crossing vs. not crossing) and a set of multi-class actions (e.g., walking, standing, etc.). Intent generates actions, and the future actions in turn reflect the intent. We present a novel multi-task network that predicts future pedestrian actions and uses the predicted future action as a prior to detect the present intent and action of the pedestrian. We also designed an attention relation network to incorporate external environmental contexts and thus further improve intent and action detection performance. We evaluated our approach on two naturalistic driving datasets, PIE and JAAD, and extensive experiments show significantly improved and more explainable results for both intent detection and action prediction over state-of-the-art approaches. Our code is available at: https://github.com/umautobots/pedestrian_intent_action_detection.

[279]  arXiv:2105.04138 [pdf, ps, other]
Title: Near Interference-Free Space-Time User Scheduling for MmWave Cellular Network
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

The highly directional beams applied in millimeter wave (mmWave) cellular networks make it possible to achieve near interference-free (NIF) transmission under judiciously designed space-time user scheduling, where the power of intra-/inter-cell interference between any two users is below a predefined threshold. In this paper, we investigate two aspects of the NIF space-time user scheduling in a multi-cell mmWave network with multi-RF-chain base stations. Firstly, given that each user has a requirement on the number of space-time resource elements, we study the NIF user scheduling problem to minimize the unfulfilled user requirements, so that the space-time resources can be utilized most efficiently and meanwhile all strong interferences are avoided. A near-optimal scheduling algorithm is proposed with performance close to the lower bound of unfulfilled requirements. Furthermore, we study the joint NIF user scheduling and power allocation problem to minimize the power consumption under the constraint of rate requirements. Based on our proposed NIF scheduling, an energy-efficient joint scheduling and power allocation scheme is designed with limited channel state information, which outperforms the existing independent set based schemes, and has near-optimal performance as well.

[280]  arXiv:2105.04143 [pdf, other]
Title: Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image. Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extractor with a deep topic model to guide the learning of a language model. To capture the correlations between the image and text at multiple levels of abstraction and learn the semantic topics from images, we design a variational inference network to build the mapping from image features to textual captions. To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model, including Long Short-Term Memory (LSTM) and Transformer, and jointly optimized. Experiments on a public dataset demonstrate that the proposed models, which are competitive with many state-of-the-art approaches in terms of standard evaluation metrics, can be used both to distill interpretable multi-layer topics and to generate diverse and coherent captions.

[281]  arXiv:2105.04144 [pdf, other]
Title: Transitioning from Real to Synthetic data: Quantifying the bias in model
Comments: Accepted at Synthetic Data Generation Workshop at ICLR 2021
Subjects: Machine Learning (cs.LG)

With the advent of generative modeling techniques, synthetic data and its use have spread across various domains, from unstructured data such as images and text to structured datasets modeling healthcare outcomes, risk decisioning in the financial domain, and many more. It overcomes various challenges such as limited training data, class imbalance, and restricted access to datasets owing to privacy issues. To ensure that a trained model used for automated decisioning makes fair decisions, there exists prior work to quantify and mitigate those issues. This study aims to establish a trade-off between bias and fairness in models trained using synthetic data. Variants of synthetic data generation techniques, including differentially private generation schemes, were studied to understand bias amplification. Through experiments on a tabular dataset, we demonstrate that there exist varying levels of bias impact on models trained using synthetic data. Techniques generating less correlated features perform well, as evidenced by fairness metrics, with 94%, 82%, and 88% relative drops in DPD (demographic parity difference), EoD (equality of odds), and EoP (equality of opportunity), respectively, and a 24% relative improvement in DRP (demographic parity ratio) with respect to the real dataset. We believe the outcome of our research study will help data science practitioners understand the bias in the use of synthetic data.
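
As a rough illustration of two of the fairness metrics named in this abstract, the sketch below computes the demographic parity difference and the demographic parity ratio from binary predictions and a group label. The function names and the toy data are hypothetical and are not taken from the paper; the definitions used (largest minus smallest per-group selection rate, and smallest divided by largest) are the common textbook ones and may differ in detail from the authors' setup.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions per group (hypothetical helper)."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += 1 if pred == 1 else 0
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest minus smallest selection rate; 0 means equal rates."""
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)


def demographic_parity_ratio(y_pred, groups):
    """Smallest divided by largest selection rate; 1 means equal rates."""
    rates = selection_rates(y_pred, groups).values()
    highest = max(rates)
    # If no group receives any positive prediction, rates are trivially equal.
    return 1.0 if highest == 0 else min(rates) / highest


# Toy data (made up for illustration only).
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
dpd = demographic_parity_difference(y_pred, groups)
dpr = demographic_parity_ratio(y_pred, groups)
```

A relative drop in the difference between a model trained on real data and one trained on synthetic data would then be `(dpd_real - dpd_synth) / dpd_real`, under the assumption that this is how the reported percentages are formed.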

[282]  arXiv:2105.04146 [pdf, other]
Title: Polynomial-Delay Enumeration of Large Maximal Matchings
Subjects: Data Structures and Algorithms (cs.DS)

Enumerating matchings is a classical problem in the field of enumeration algorithms. There are polynomial-delay enumeration algorithms for several settings, such as enumerating perfect matchings, maximal matchings, and (weighted) matchings in specific orders. In this paper, we present polynomial-delay enumeration algorithms for maximal matchings with cardinality at least a given threshold $t$. Our algorithm enumerates all such matchings in $O(nm)$ delay with exponential space, where $n$ and $m$ are the number of vertices and edges of an input graph, respectively. We also present a polynomial-delay and polynomial-space enumeration algorithm for this problem. As a variant of this algorithm, we give an algorithm that enumerates all maximal matchings in non-decreasing order of their cardinality and runs in $O(nm)$ delay.
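
To make the problem statement concrete, here is a brute-force sketch that lists every maximal matching with at least $t$ edges in a small graph. It is only a specification-level illustration: it runs in exponential time and is not the polynomial-delay algorithm of the paper. All function names are hypothetical, and the graph is assumed to be simple (no self-loops or repeated edges).

```python
from itertools import combinations


def is_matching(edges):
    """True if no two edges in the collection share an endpoint."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_maximal(matching, all_edges):
    """True if every graph edge touches a matched vertex, so nothing can be added."""
    covered = {x for edge in matching for x in edge}
    return all(u in covered or v in covered for u, v in all_edges)


def large_maximal_matchings(all_edges, t):
    """Yield maximal matchings with at least t edges, smallest cardinality first."""
    for k in range(max(t, 0), len(all_edges) + 1):
        for subset in combinations(all_edges, k):
            if is_matching(subset) and is_maximal(subset, all_edges):
                yield subset


# Path graph 0 - 1 - 2 - 3.
path_edges = [(0, 1), (1, 2), (2, 3)]
at_least_two = list(large_maximal_matchings(path_edges, 2))
at_least_one = list(large_maximal_matchings(path_edges, 1))
```

On the path graph, the single middle edge is a maximal matching of size one, so raising the threshold to two filters it out and leaves only the matching made of the two outer edges.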

[283]  arXiv:2105.04150 [pdf, other]
Title: PeriPy -- A High Performance OpenCL Peridynamics Package
Comments: peripy.readthedocs.org
Subjects: Software Engineering (cs.SE)

This paper presents a lightweight, open-source and high-performance Python package for solving peridynamics problems in solid mechanics. The development of this solver is motivated by the need for fast analysis tools to achieve the large number of simulations required for `outer-loop' applications, including sensitivity analysis, uncertainty quantification and optimisation. Our Python software toolbox utilises the heterogeneous nature of OpenCL so that it can be executed on any platform with CPU or GPU cores. We illustrate the package use through a range of industrially motivated examples, which should enable other researchers to build on and extend the solver for use in their own applications. Step improvements in execution speed and functionality over existing techniques are presented. A comparison between this solver and an existing OpenCL implementation in the literature is presented, tested on benchmarks with hundreds of thousands to tens of millions of nodes. We demonstrate the scalability of the solver on the GeForce RTX 2080 Ti GPU from NVIDIA, and the memory-bound limitations are analysed. In all test cases, the implementation is between 1.4 and 10.0 times faster than a similar existing GPU implementation in the literature. In particular, this improvement has been achieved by utilising local memory on the GPU.

[284]  arXiv:2105.04151 [pdf, other]
Title: Skew-Oblivious Data Routing for Data-Intensive Applications on FPGAs with HLS
Subjects: Hardware Architecture (cs.AR); Performance (cs.PF)

FPGAs have become emerging computing infrastructures for accelerating applications in datacenters. Meanwhile, high-level synthesis (HLS) tools have been proposed to ease the programming of FPGAs. Even with HLS, irregular data-intensive applications require explicit optimizations, among which multiple processing elements (PEs), each owning a private BRAM-based buffer, are usually adopted to process multiple data per cycle. Data routing, which dynamically dispatches multiple data to designated PEs, avoids data replication in buffers compared to statically assigning data to PEs, hence saving BRAM usage. However, the workload imbalance among PEs vastly diminishes performance when processing skewed datasets. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them to share the workload of the overloaded PEs at run-time. In addition, we integrate the proposed architecture into a framework called Ditto to minimize the development efforts for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, pagerank, heavy hitter detection and hyperloglog. The results demonstrate that the generated implementations are robust to skewed datasets and outperform the state-of-the-art designs in both throughput and BRAM usage efficiency.

[285]  arXiv:2105.04153 [pdf, ps, other]
Title: Slashing Communication Traffic in Federated Learning by Transmitting Clustered Model Updates
Comments: To appear in IEEE Journal on Selected Areas in Communications
Subjects: Machine Learning (cs.LG)

Federated Learning (FL) is an emerging decentralized learning framework through which multiple clients can collaboratively train a learning model. However, a major obstacle that impedes the wide deployment of FL lies in massive communication traffic. To train high dimensional machine learning models (such as CNN models), heavy communication traffic can be incurred by exchanging model updates via the Internet between clients and the parameter server (PS), implying that the network resource can be easily exhausted. Compressing model updates is an effective way to reduce the traffic amount. However, a flexible unbiased compression algorithm applicable to both uplink and downlink compression in FL is still absent from existing works. In this work, we devise the Model Update Compression by Soft Clustering (MUCSC) algorithm to compress model updates transmitted between clients and the PS. In MUCSC, it is only necessary to transmit cluster centroids and the cluster ID of each model update. Moreover, we prove that: 1) the compressed model updates are unbiased estimates of their original values, so that the convergence rate when transmitting compressed model updates is unchanged; 2) MUCSC can guarantee that the influence of the compression error on the model accuracy is minimized. Then, we further propose the boosted MUCSC (B-MUCSC) algorithm, a biased compression algorithm that can achieve an extremely high compression rate by grouping insignificant model updates into a super cluster. B-MUCSC is suitable for scenarios with very scarce network resources. Ultimately, we conduct extensive experiments with the CIFAR-10 and FEMNIST datasets to demonstrate that our algorithms can not only substantially reduce the volume of communication traffic in FL, but also improve the training efficiency in practical networks.
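
The idea of sending only centroids plus a cluster ID per update, while keeping the reconstruction unbiased, can be illustrated with a generic stochastic-rounding scheme. The sketch below is an assumption-laden stand-in and not the MUCSC algorithm itself: each value is randomly assigned to one of its two neighbouring centroids with probabilities chosen so that the expected reconstruction equals the original value. The function names, the fixed centroid list, and the clamping of out-of-range values (which is biased) are all hypothetical choices made for this example.

```python
import bisect
import random


def neighbours(x, centroids):
    """Return (lo_index, hi_index, p_hi) for a strictly increasing centroid list."""
    if x <= centroids[0]:
        return 0, 0, 0.0  # clamp below the range (biased for x < centroids[0])
    if x >= centroids[-1]:
        last = len(centroids) - 1
        return last, last, 0.0  # clamp above the range
    hi = bisect.bisect_right(centroids, x)
    lo = hi - 1
    p_hi = (x - centroids[lo]) / (centroids[hi] - centroids[lo])
    return lo, hi, p_hi


def compress(updates, centroids, rng):
    """Replace each update by a randomly chosen neighbouring centroid ID."""
    ids = []
    for x in updates:
        lo, hi, p_hi = neighbours(x, centroids)
        ids.append(hi if rng.random() < p_hi else lo)
    return ids


def decompress(ids, centroids):
    """Map cluster IDs back to centroid values."""
    return [centroids[i] for i in ids]


def expected_reconstruction(x, centroids):
    """Exact expectation of the decompressed value for a single update."""
    lo, hi, p_hi = neighbours(x, centroids)
    return (1.0 - p_hi) * centroids[lo] + p_hi * centroids[hi]


centroids = [-1.0, 0.0, 1.0]
ids = compress([0.3, -0.4, 2.0], centroids, random.Random(0))
restored = decompress(ids, centroids)
```

For any value inside the centroid range, `expected_reconstruction` returns the value itself, which is the unbiasedness property in miniature; how the centroids are chosen and how the error is minimized are left to the paper.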

[286]  arXiv:2105.04154 [pdf, other]
Title: Unsupervised Human Pose Estimation through Transforming Shape Templates
Comments: CVPR 2021 (poster)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Human pose estimation is a major computer vision problem with applications ranging from augmented reality and video capture to surveillance and movement tracking. In the medical context, the latter may be an important biomarker for neurological impairments in infants. Whilst many methods exist, their application has been limited by the need for well annotated large datasets and the inability to generalize to humans of different shapes and body compositions, e.g. children and infants. In this paper we present a novel method for learning pose estimators for human adults and infants in an unsupervised fashion. We approach this as a learnable template matching problem facilitated by deep feature extractors. Human-interpretable landmarks are estimated by transforming a template consisting of predefined body parts that are characterized by 2D Gaussian distributions. Enforcing a connectivity prior guides our model to meaningful human shape representations. We demonstrate the effectiveness of our approach on two different datasets including adults and infants.

[287]  arXiv:2105.04156 [pdf, other]
Title: ReLU Deep Neural Networks from the Hierarchical Basis Perspective
Comments: 27 pages
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

We study ReLU deep neural networks (DNNs) by investigating their connections with the hierarchical basis method in finite element methods. First, we show that the approximation schemes of ReLU DNNs for $x^2$ and $xy$ are composition versions of the hierarchical basis approximation for these two functions. Based on this fact, we obtain a geometric interpretation and systematic proof for the approximation result of ReLU DNNs for polynomials, which plays an important role in a series of recent exponential approximation results of ReLU DNNs. Through our investigation of connections between ReLU DNNs and the hierarchical basis approximation for $x^2$ and $xy$, we show that ReLU DNNs with this special structure can be applied only to approximate quadratic functions. Furthermore, we obtain a concise representation to explicitly reproduce any linear finite element function on a two-dimensional uniform mesh by using ReLU DNNs with only two hidden layers.
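
A small numerical sketch may help picture the $x^2$ approximation mentioned above. It uses the well-known sawtooth construction, in which a ReLU "hat" function is composed with itself and the scaled compositions are subtracted from $x$; the result is the piecewise-linear interpolant of $x^2$ on a uniform grid with $2^m$ intervals. Whether this matches the paper's construction in every detail is an assumption, and the function names are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def hat(x):
    """Hat function on [0, 1] built from three ReLU units: peak 1 at x = 0.5."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def square_approx(x, m):
    """Approximate x**2 on [0, 1] using m compositions of the hat function."""
    g = x
    total = x
    for s in range(1, m + 1):
        g = hat(g)            # s-fold composition: a sawtooth with 2**(s-1) teeth
        total -= g / 4.0**s   # hierarchical-basis style correction at level s
    return total


def max_error(m, samples=1025):
    """Largest deviation from x**2 over a uniform sample grid on [0, 1]."""
    points = [i / (samples - 1) for i in range(samples)]
    return max(abs(square_approx(x, m) - x * x) for x in points)


errors = {m: max_error(m) for m in range(1, 6)}
```

Each extra composition halves the mesh width, so the maximum interpolation error shrinks by a factor of four per level and is bounded by $2^{-2m-2}$.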

[288]  arXiv:2105.04157 [pdf, ps, other]
Title: A Sharp Analysis of Covariate Adjusted Precision Matrix Estimation via Alternating Gradient Descent with Hard Thresholding
Subjects: Information Theory (cs.IT)

In this paper, we present a sharp analysis for an alternating gradient descent algorithm which is used to solve the covariate adjusted precision matrix estimation problem in the high dimensional setting. Without the resampling assumption, we demonstrate that this algorithm not only enjoys a linear rate of convergence, but also attains the optimal statistical rate (i.e., minimax rate). Moreover, our analysis also characterizes the time-data tradeoffs in the covariate adjusted precision matrix estimation problem. Numerical experiments are provided to verify our theoretical results.

[289]  arXiv:2105.04158 [pdf, ps, other]
Title: CREPO: An Open Repository to Benchmark Credal Network Algorithms
Comments: Isipta 2021 (Version with Supplementary Material)
Subjects: Artificial Intelligence (cs.AI)

Credal networks are a popular class of imprecise probabilistic graphical models obtained as a Bayesian network generalization based on so-called credal sets of probability mass functions. A Java library called CREMA has been recently released to model, process and query credal networks. Despite the NP-hardness of the (exact) task, a number of algorithms are available to approximate credal network inferences. In this paper we present CREPO, an open repository of synthetic credal networks, provided together with the exact results of inference tasks on these models. A Python tool is also delivered to load these data and interact with CREMA, thus making it extremely easy to evaluate and compare existing and novel inference algorithms. To demonstrate such a benchmarking scheme, we propose an approximate heuristic to be used inside variable elimination schemes to keep a bound on the maximum number of vertices generated during the combination step. A CREPO-based validation against approximate procedures based on linearization and exact techniques performed in CREMA is finally discussed.

[290]  arXiv:2105.04164 [pdf, other]
Title: Communication coordination in network controllability
Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)

Better understanding our ability to control an interconnected system of entities has been one of the central challenges in network science. The theories of node and edge controllability have been the main methodologies suggested to find the minimal set of nodes enabling control over the whole system's dynamics. While the focus is traditionally mostly on physical systems, there has been an increasing interest in control questions involving socioeconomic systems. However, surprisingly little attention has been given to the methods' underlying assumptions on control propagation, or communication assumptions, a crucial aspect in social contexts. In this paper, we show that node controllability contains a single message assumption, allowing no heterogeneity in communication to neighbouring nodes in a network. Edge controllability is shown to relax this communication assumption but aims to control the dynamics of the edge states and not the node states, thus answering a fundamentally different question. This makes comparisons of the results from the two methods nonsensical. To increase the applicability of controllability methodology to socioeconomic contexts, we provide guiding principles to choose the appropriate methodology and suggest new avenues for future theoretical work to encode more realistic communication assumptions.

[291]  arXiv:2105.04165 [pdf, other]
Title: Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
Comments: ACL 2021, 13 pages, 5 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Formal Languages and Automata Theory (cs.FL)

Geometry problem solving has attracted much attention in the NLP community recently. The task is challenging as it requires abstract problem understanding and symbolic reasoning with axiomatic knowledge. However, current datasets are either small in scale or not publicly available. Thus, we construct a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language. We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS). Inter-GPS first parses the problem text and diagram into formal language automatically via rule-based text parsing and neural object detecting, respectively. Unlike implicit learning in existing methods, Inter-GPS incorporates theorem knowledge as conditional rules and performs symbolic reasoning step by step. A theorem predictor is also designed to infer the theorem application sequence fed to the symbolic solver for the more efficient and reasonable searching path. Extensive experiments on the Geometry3K and GEOS datasets demonstrate Inter-GPS achieves significant improvements over existing methods.

[292]  arXiv:2105.04166 [pdf, other]
Title: Few-Shot Conversational Dense Retrieval
Comments: Accepted by SIGIR 2021
Subjects: Information Retrieval (cs.IR)

Dense retrieval (DR) has the potential to resolve the query understanding challenge in conversational search by matching in the learned embedding space. However, this adaptation is challenging due to DR models' extra needs for supervision signals and the long-tail nature of conversational search. In this paper, we present a Conversational Dense Retrieval system, ConvDR, that learns contextualized embeddings for multi-turn conversational queries and retrieves documents solely using embedding dot products. In addition, we grant ConvDR few-shot ability using a teacher-student framework, where we employ an ad hoc dense retriever as the teacher, inherit its document encodings, and learn a student query encoder to mimic the teacher embeddings on oracle reformulated queries. Our experiments on TREC CAsT and OR-QuAC demonstrate ConvDR's effectiveness in both few-shot and fully-supervised settings. It outperforms previous systems that operate in the sparse word space, matches the retrieval accuracy of oracle query reformulations, and is also more efficient thanks to its simplicity. Our analyses reveal that the advantages of ConvDR come from its ability to capture informative context while ignoring the unrelated context in previous conversation rounds. This makes ConvDR more effective as conversations evolve while previous systems may get confused by the increased noise from previous turns. Our code is publicly available at https://github.com/thunlp/ConvDR.

[293]  arXiv:2105.04169 [pdf, other]
Title: PillarSegNet: Pillar-based Semantic Grid Map Estimation using Sparse LiDAR Data
Comments: Accepted to present in the 2021 IEEE Intelligent Vehicles Symposium (IV21)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Semantic understanding of the surrounding environment is essential for automated vehicles. The recent publication of the SemanticKITTI dataset stimulates the research on semantic segmentation of LiDAR point clouds in urban scenarios. While most existing approaches predict sparse pointwise semantic classes for the sparse input LiDAR scan, we propose PillarSegNet to be able to output a dense semantic grid map. In contrast to a previously proposed grid map method, PillarSegNet uses PointNet to learn features directly from the 3D point cloud and then conducts 2D semantic segmentation in the top view. To train and evaluate our approach, we use both sparse and dense ground truth, where the dense ground truth is obtained from multiple superimposed scans. Experimental results on the SemanticKITTI dataset show that PillarSegNet achieves a performance gain of about 10% mIoU over the state-of-the-art grid map method.

[294]  arXiv:2105.04170 [pdf, other]
Title: AutoDebias: Learning to Debias for Recommendation
Comments: Accepted by SIGIR 2021
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)

Recommender systems rely on user behavior data like ratings and clicks to build personalization models. However, the collected data is observational rather than experimental, causing various biases in the data which significantly affect the learned model. Most existing work on recommendation debiasing, such as the inverse propensity scoring and imputation approaches, focuses on one or two specific biases, lacking the universal capacity to account for mixed or even unknown biases in the data.
To address this research gap, we first analyze the origin of biases from the perspective of risk discrepancy, which represents the difference between the expected empirical risk and the true risk. Remarkably, we derive a general learning framework that summarizes most existing debiasing strategies, each obtained by specifying some parameters of the general framework. This provides a valuable opportunity to develop a universal solution for debiasing, e.g., by learning the debiasing parameters from data. However, the training data lacks an important signal of how the data is biased and what the unbiased data looks like. To move this idea forward, we propose AutoDebias, which leverages another (small) set of uniform data to optimize the debiasing parameters by solving the bi-level optimization problem with meta-learning. Through theoretical analyses, we derive the generalization bound for AutoDebias and prove its ability to acquire the appropriate debiasing strategy. Extensive experiments on two real datasets and a simulated dataset demonstrate the effectiveness of AutoDebias. The code is available at https://github.com/DongHande/AutoDebias.

[295]  arXiv:2105.04176 [pdf, other]
Title: HyperLTL Satisfiability is $Σ_1^1$-complete, HyperCTL* Satisfiability is $Σ_1^2$-complete
Subjects: Logic in Computer Science (cs.LO)

Temporal logics for the specification of information-flow properties are able to express relations between multiple executions of a system. The two most important such logics are HyperLTL and HyperCTL*, which generalise LTL and CTL* by trace quantification. It is known that this expressiveness comes at a price, i.e. satisfiability is undecidable for both logics.
In this paper we settle the exact complexity of these problems, showing that both are in fact highly undecidable: we prove that HyperLTL satisfiability is $\Sigma_1^1$-complete and HyperCTL* satisfiability is $\Sigma_1^2$-complete. These are significant increases over the previously known lower bounds and the first upper bounds. To prove $\Sigma_1^2$-membership for HyperCTL*, we prove that every satisfiable HyperCTL* sentence has a model that is equinumerous to the continuum, the first upper bound of this kind. We prove this bound to be tight. Finally, we show that the membership problem for every level of the HyperLTL quantifier alternation hierarchy is $\Pi_1^1$-complete.

[296]  arXiv:2105.04180 [pdf, other]
Title: Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning
Comments: Accepted at ICML 2021
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)

Minimum Excess Risk (MER) in Bayesian learning is defined as the difference between the minimum expected loss achievable when learning from data and the minimum expected loss that could be achieved if the underlying parameter $W$ was observed. In this paper, we build upon and extend the recent results of (Xu & Raginsky, 2020) to analyze the MER in Bayesian learning and derive information-theoretic bounds on it. We formulate the problem as a (constrained) rate-distortion optimization and show how the solution can be bounded above and below by two other rate-distortion functions that are easier to study. The lower bound represents the minimum possible excess risk achievable by \emph{any} process using $R$ bits of information from the parameter $W$. For the upper bound, the optimization is further constrained to use $R$ bits from the training set, a setting which relates MER to information-theoretic bounds on the generalization gap in frequentist learning. We derive information-theoretic bounds on the difference between these upper and lower bounds and show that they can provide order-wise tight rates for MER. This analysis gives more insight into the information-theoretic nature of Bayesian learning as well as providing novel bounds.

[297]  arXiv:2105.04181 [pdf, other]
Title: KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation
Comments: 7 pages, 4 figures, accepted to IJCAI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Knowledge distillation (KD) has recently emerged as an efficacious scheme for learning compact deep neural networks (DNNs). Despite the promising results achieved, the rationale that interprets the behavior of KD has remained largely understudied. In this paper, we introduce a novel task-oriented attention model, termed KDExplainer, to shed light on the working mechanism underlying vanilla KD. At the heart of KDExplainer is a Hierarchical Mixture of Experts (HME), in which a multi-class classification is reformulated as a multi-task binary one. Through distilling knowledge from a free-form pre-trained DNN to KDExplainer, we observe that KD implicitly modulates the knowledge conflicts between different subtasks, and in reality has much more to offer than label smoothing. Based on such findings, we further introduce a portable tool, dubbed the virtual attention module (VAM), that can be seamlessly integrated with various DNNs to enhance their performance under KD. Experimental results demonstrate that with a negligible additional cost, student models equipped with VAM consistently outperform their non-VAM counterparts across different benchmarks. Furthermore, when combined with other KD methods, VAM remains competent in promoting results, even though it is only motivated by vanilla KD.

[298]  arXiv:2105.04183 [pdf, other]
Title: UGRec: Modeling Directed and Undirected Relations for Recommendation
Comments: Accepted as a long paper in SIGIR 2021
Subjects: Information Retrieval (cs.IR)

Recommender systems that merely leverage user-item interactions for user preference prediction (such as the collaborative filtering-based ones) often face dramatic performance degradation when the interactions of users or items are insufficient. In recent years, various types of side information have been explored to alleviate this problem. Among them, the knowledge graph (KG) has attracted extensive research interest as it can encode users/items and their associated attributes in the graph structure to preserve the relation information. In contrast, less attention has been paid to the item-item co-occurrence information (i.e., co-view), which contains rich item-item similarity information. It provides information from a perspective different from the user/item-attribute graph and is also valuable for the CF recommendation models. In this work, we make an effort to study the potential of integrating both types of side information (i.e., KG and item-item co-occurrence data) for recommendation. To achieve the goal, we propose a unified graph-based recommendation model (UGRec), which integrates the traditional directed relations in KG and the undirected item-item co-occurrence relations simultaneously. In particular, for a directed relation, we transform the head and tail entities into the corresponding relation space to model their relation; and for an undirected co-occurrence relation, we project head and tail entities into a unique hyperplane in the entity space to minimize their distance. In addition, a head-tail relation-aware attentive mechanism is designed for fine-grained relation modeling. Extensive experiments have been conducted on several publicly accessible datasets to evaluate the proposed model. Results show that our model outperforms several previous state-of-the-art methods and demonstrate the effectiveness of our UGRec model.

[299]  arXiv:2105.04184 [pdf, other]
Title: Generative Adversarial Networks (GANs) in Networking: A Comprehensive Survey & Evaluation
Comments: Accepted for publication at Journal of Computer Networks
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

Despite the recency of their conception, Generative Adversarial Networks (GANs) constitute an extensively researched machine learning sub-field for the creation of synthetic data through deep generative modeling. GANs have consequently been applied in a number of domains, most notably computer vision, in which they are typically used to generate or transform synthetic images. Given their relative ease of use, it is therefore natural that researchers in the field of networking (which has seen extensive application of deep learning methods) should take an interest in GAN-based approaches. The need for a comprehensive survey of such activity is therefore urgent. In this paper, we demonstrate how this branch of machine learning can benefit multiple aspects of computer and communication networks, including mobile networks, network analysis, internet of things, physical layer, and cybersecurity. In doing so, we shall provide a novel evaluation framework for comparing the performance of different models in non-image applications, applying this to a number of reference network datasets.

[300]  arXiv:2105.04187 [pdf, other]
Title: A Rigorous Information-Theoretic Definition of Redundancy and Relevancy in Feature Selection Based on (Partial) Information Decomposition
Comments: 36 pages, 9 figures
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

Selecting a minimal feature set that is maximally informative about a target variable is a central task in machine learning and statistics. Information theory provides a powerful framework for formulating feature selection algorithms, yet a rigorous, information-theoretic definition of feature relevancy, which accounts for feature interactions such as redundant and synergistic contributions, is still missing. We argue that this lack is inherent to classical information theory, which does not provide measures to decompose the information a set of variables provides about a target into unique, redundant, and synergistic contributions. Such a decomposition has been introduced only recently by the partial information decomposition (PID) framework. Using PID, we clarify why feature selection is a conceptually difficult problem when approached using information theory and provide a novel definition of feature relevancy and redundancy in PID terms. From this definition, we show that the conditional mutual information (CMI) maximizes relevancy while minimizing redundancy and propose an iterative, CMI-based algorithm for practical feature selection. We demonstrate the power of our CMI-based algorithm in comparison to the unconditional mutual information on benchmark examples and provide corresponding PID estimates to highlight how PID allows quantifying the information contribution of features and their interactions in feature-selection problems.

[301]  arXiv:2105.04194 [pdf, other]
Title: The Modulo Radon Transform: Theory, Algorithms and Applications
Comments: 32 pages, submitted for possible publication
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Recently, experiments have been reported where researchers were able to perform high dynamic range (HDR) tomography in a heuristic fashion, by fusing multiple tomographic projections. This approach to HDR tomography has been inspired by HDR photography and inherits the same disadvantages. Taking a computational imaging approach to the HDR tomography problem, we here suggest a new model based on the Modulo Radon Transform (MRT), which we rigorously introduce and analyze. By harnessing a joint design between hardware and algorithms, we present a single-shot HDR tomography approach, which to our knowledge, is the only approach that is backed by mathematical guarantees.
On the hardware front, instead of recording the Radon Transform projections that may potentially saturate, we propose to measure modulo values of the same. This ensures that the HDR measurements are folded into a lower dynamic range. On the algorithmic front, our recovery algorithms reconstruct the HDR images from folded measurements. Beyond mathematical aspects such as injectivity and inversion of the MRT for different scenarios including band-limited and approximately compactly supported images, we also provide a first proof-of-concept demonstration. To do so, we implement MRT by experimentally folding tomographic measurements available as an open source data set using our custom designed modulo hardware. Our reconstruction clearly shows the advantages of our approach for experimental data. In this way, our MRT based solution paves a path for HDR acquisition in a number of related imaging problems.
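
The folding step described above can be pictured with a one-dimensional toy example. The sketch below folds samples into the range [-lam, lam) with a centered modulo and then undoes the folding by re-folding first-order differences and summing them. This simple unwrapping is an assumption made for illustration, not the recovery algorithm of the paper: it only works when consecutive true samples differ by less than `lam` and the first sample is not folded. The names `fold`, `unfold`, and `lam` are hypothetical.

```python
def fold(x, lam):
    """Centered modulo: map x into the interval [-lam, lam)."""
    return ((x + lam) % (2.0 * lam)) - lam


def unfold(folded, lam):
    """Recover samples from folded ones via re-folded first differences.

    Assumes |x[i+1] - x[i]| < lam for the true samples and that the first
    sample already lies in [-lam, lam), so it is left unchanged by folding.
    """
    recovered = [folded[0]]
    for previous, current in zip(folded, folded[1:]):
        # Folding the difference of folded samples removes the 2*lam jumps.
        step = fold(current - previous, lam)
        recovered.append(recovered[-1] + step)
    return recovered


lam = 1.0
signal = [0.0, 0.4, 0.9, 1.5, 2.2, 1.8, 1.1, 0.3]   # exceeds the range [-1, 1)
measured = [fold(x, lam) for x in signal]            # what a modulo sensor would record
restored = unfold(measured, lam)
```

The measured values never leave [-1, 1) even though the signal reaches 2.2, which is the sense in which a high dynamic range signal is folded into a lower dynamic range.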

[302]  arXiv:2105.04201 [pdf, other]
Title: REPT: Bridging Language Models and Machine Reading Comprehension via Retrieval-Based Pre-training
Comments: Findings of ACL 2021
Subjects: Computation and Language (cs.CL)

Pre-trained Language Models (PLMs) have achieved great success on Machine Reading Comprehension (MRC) over the past few years. Although the general language representation learned from large-scale corpora does benefit MRC, the poor support in evidence extraction which requires reasoning across multiple sentences hinders PLMs from further advancing MRC. To bridge the gap between general PLMs and MRC, we present REPT, a REtrieval-based Pre-Training approach. In particular, we introduce two self-supervised tasks to strengthen evidence extraction during pre-training, which is further inherited by downstream MRC tasks through the consistent retrieval operation and model architecture. To evaluate our proposed method, we conduct extensive experiments on five MRC datasets that require collecting evidence from and reasoning across multiple sentences. Experimental results demonstrate the effectiveness of our pre-training approach. Moreover, further analysis shows that our approach is able to enhance the capacity of evidence extraction without explicit supervision.

[303]  arXiv:2105.04206 [pdf, other]
Title: You Only Learn One Representation: Unified Network for Multiple Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV)

People ``understand'' the world through vision, hearing, touch, and past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously will be encoded and stored in the brain. Using this abundant experience as a huge database, human beings can effectively process data, even if it was unseen beforehand. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just like the human brain can learn knowledge from normal learning as well as subconscious learning. The unified network can generate a unified representation to simultaneously serve various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a convolutional neural network. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. We further analyze the implicit representation learnt from the proposed unified network, and it shows great capability in capturing the physical meaning of different tasks. The source code of this work is at: https://github.com/WongKinYiu/yolor.

[304]  arXiv:2105.04208 [pdf, other]
Title: Action Shuffling for Weakly Supervised Temporal Localization
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weakly supervised action localization is a challenging task with extensive applications, which aims to identify actions and the corresponding temporal intervals with only video-level annotations available. This paper analyzes the order-sensitive and location-insensitive properties of actions, and embodies them into a self-augmented learning framework to improve the weakly supervised action localization performance. To be specific, we propose a novel two-branch network architecture with intra/inter-action shuffling, referred to as ActShufNet. The intra-action shuffling branch lays out a self-supervised order prediction task to augment the video representation with inner-video relevance, whereas the inter-action shuffling branch imposes a reorganizing strategy on the existing action contents to augment the training set without resorting to any external resources. Furthermore, the global-local adversarial training is presented to enhance the model's robustness to irrelevant noises. Extensive experiments are conducted on three benchmark datasets, and the results clearly demonstrate the efficacy of the proposed method.

[305]  arXiv:2105.04210 [pdf, other]
Title: Robust Graph Learning Under Wasserstein Uncertainty
Comments: 21 pages, 9 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Graphs are playing a crucial role in different fields since they are powerful tools to unveil intrinsic relationships among signals. In many scenarios, an accurate graph structure representing signals is not available at all and that motivates people to learn a reliable graph structure directly from observed signals. However, in real life, it is inevitable that there exists uncertainty in the observed signals due to noisy measurements or limited observability, which causes a reduction in reliability of the learned graph. To this end, we propose a graph learning framework using Wasserstein distributionally robust optimization (WDRO) which handles uncertainty in data by defining an uncertainty set on distributions of the observed data. Specifically, two models are developed, one of which assumes all distributions in the uncertainty set are Gaussian distributions and the other one has no prior distributional assumption. Instead of using the interior point method directly, we propose two algorithms to solve the corresponding models and show that our algorithms are more time-saving. In addition, we also reformulate both models into Semi-Definite Programming (SDP), and illustrate that they are intractable in the scenario of large-scale graphs. Experiments on both synthetic and real world data are carried out to validate the proposed framework, which show that our scheme can learn a reliable graph in the context of uncertainty.

[306]  arXiv:2105.04212 [pdf, other]
Title: Efficient Error-Correcting-Code Mechanism for High-Throughput Memristive Processing-in-Memory
Comments: Accepted to 58th Design Automation Conference (DAC) 2021
Subjects: Hardware Architecture (cs.AR)

Inefficient data transfer between computation and memory inspired emerging processing-in-memory (PIM) technologies. Many PIM solutions enable storage and processing using memristors in a crossbar-array structure, with techniques such as memristor-aided logic (MAGIC) used for computation. This approach provides highly-paralleled logic computation with minimal data movement. However, memristors are vulnerable to soft errors and standard error-correcting-code (ECC) techniques are difficult to implement without moving data outside the memory. We propose a novel technique for efficient ECC implementation along diagonals to support reliable computation inside the memory without explicitly reading the data. Our evaluation demonstrates an improvement of over eight orders of magnitude in reliability (mean time to failure) for an increase of about 26% in computation latency.

[307]  arXiv:2105.04213 [pdf, other]
Title: Temporal-Spatial Feature Pyramid for Video Saliency Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a 3D fully convolutional encoder-decoder architecture for video saliency detection, which combines scale, space and time information for video saliency modeling. The encoder extracts multi-scale temporal-spatial features from the input continuous video frames, and then constructs a temporal-spatial feature pyramid through temporal-spatial convolution and top-down feature integration. The decoder performs hierarchical decoding of temporal-spatial features from different scales, and finally produces a saliency map from the integration of multiple video frames. Our model is simple yet effective, and can run in real time. We perform abundant experiments, and the results indicate that the well-designed structure can improve the precision of video saliency detection significantly. Experimental results on three purely visual video saliency benchmarks and six audio-video saliency benchmarks demonstrate that our method achieves state-of-the-art performance.

[308]  arXiv:2105.04216 [pdf, other]
Title: Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data
Comments: 7 pages, 8 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Event cameras are activity-driven bio-inspired vision sensors, thereby resulting in advantages such as sparsity, high temporal resolution, low latency, and power consumption. Given the different sensing modality of event camera and high quality of conventional vision paradigm, event processing is predominantly solved by transforming the sparse and asynchronous events into 2D grid and subsequently applying standard vision pipelines. Despite the promising results displayed by supervised learning approaches in 2D grid generation, these approaches treat the task in supervised manner. Labeled task specific ground truth event data is challenging to acquire. To overcome this limitation, we propose Event-LSTM, an unsupervised Auto-Encoder architecture made up of LSTM layers as a promising alternative to learn 2D grid representation from event sequence. Compared to competing supervised approaches, ours is a task-agnostic approach ideally suited for the event domain, where task specific labeled data is scarce. We also tailor the proposed solution to exploit asynchronous nature of event stream, which gives it desirable characteristics such as speed invariant and energy-efficient 2D grid generation. Besides, we also push state-of-the-art event de-noising forward by introducing memory into the de-noising process. Evaluations on activity recognition and gesture recognition demonstrate that our approach yields improvement over state-of-the-art approaches, while providing the flexibility to learn from unlabelled data.

[309]  arXiv:2105.04218 [pdf, other]
Title: Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks
Comments: 8 pages, 5 figures
Subjects: Machine Learning (cs.LG)

Elasticities in depth, width, kernel size and resolution have been explored in compressing deep neural networks (DNNs). Recognizing that the kernels in a convolutional neural network (CNN) are 4-way tensors, we further exploit a new elasticity dimension along the input-output channels. Specifically, a novel nuclear-norm rank minimization factorization (NRMF) approach is proposed to dynamically and globally search for the reduced tensor ranks during training. Correlation between tensor ranks across multiple layers is revealed, and a graceful tradeoff between model size and accuracy is obtained. Experiments then show the superiority of NRMF over the previous non-elastic variational Bayesian matrix factorization (VBMF) scheme.

[310]  arXiv:2105.04221 [pdf]
Title: Similarities between Arabic Dialects: Investigating Geographical Proximity
Subjects: Computation and Language (cs.CL)

The automatic classification of Arabic dialects is an ongoing research challenge, which has been explored in recent work that defines dialects based on increasingly limited geographic areas like cities and provinces. This paper focuses on a related yet relatively unexplored topic: the effects of the geographical proximity of cities located in Arab countries on their dialectical similarity. Our work is twofold, reliant on: 1) comparing the textual similarities between dialects using cosine similarity and 2) measuring the geographical distance between locations. We study MADAR and NADI, two established datasets with Arabic dialects from many cities and provinces. Our results indicate that cities located in different countries may in fact have more dialectical similarity than cities within the same country, depending on their geographical proximity. The correlation between dialectical similarity and city proximity suggests that cities that are closer together are more likely to share dialectical attributes, regardless of country borders. This nuance provides the potential for important advancements in Arabic dialect research because it indicates that a more granular approach to dialect classification is essential to understanding how to frame the problem of Arabic dialects identification.
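
The two measurements named above, textual cosine similarity and geographical distance, can be sketched as follows. This is a minimal sketch under assumptions: the abstract does not specify the text representation or the distance formula, so the token-count vectors, the haversine great-circle formula, and the mean Earth radius of 6371 km are illustrative choices, and both function names are hypothetical.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between token-count vectors (assumed representation)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(h))
```

Given one similarity and one distance per city pair, the relationship between the two could then be examined with any standard correlation measure.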

[311]  arXiv:2105.04222 [pdf, other]
Title: Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking
Comments: NAACL 2021
Subjects: Computation and Language (cs.CL)

Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot description enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes dialogue context and slots with a pre-trained self-attentive encoder, and generates slot values in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information across slots to facilitate cross-domain knowledge transfer. Experimental results on the MultiWOZ dataset show that our proposed method significantly improves existing state-of-the-art results in the zero-shot cross-domain setting.

[312]  arXiv:2105.04228 [pdf, ps, other]
Title: Exact asymptotic characterisation of running time for approximate gradient descent on random graphs
Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR)

In this work we study the time complexity for the search of local minima in random graphs whose vertices have i.i.d. cost values. We show that, for Erd\"os-R\'enyi graphs with connection probability given by $\lambda/n^\alpha$ (with $\lambda > 0$ and $0 < \alpha < 1$), a family of local algorithms that approximate a gradient descent find local minima faster than the full gradient descent. Furthermore, we find a probabilistic representation for the running time of these algorithms leading to asymptotic estimates of the mean running times.

[313]  arXiv:2105.04232 [pdf, other]
Title: De-homogenization using Convolutional Neural Networks
Comments: 28 pages, 16 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

This paper presents a deep learning-based de-homogenization method for structural compliance minimization. By using a convolutional neural network to parameterize the mapping from a set of lamination parameters on a coarse mesh to a one-scale design on a fine mesh, we avoid solving the least-squares problems associated with traditional de-homogenization approaches and save time correspondingly. To train the neural network, a two-step custom loss function has been developed which ensures a periodic output field that follows the local lamination orientations. A key feature of the proposed method is that the training is carried out without any use of or reference to the underlying structural optimization problem, which renders the proposed method robust and insensitive w.r.t. domain size, boundary conditions, and loading. A post-processing procedure utilizing a distance transform on the output field skeleton is used to project the desired lamination widths onto the output field while ensuring a predefined minimum length-scale and volume fraction. To demonstrate that the deep learning approach has excellent generalization properties, numerical examples are shown for several different load and boundary conditions. For an appropriate choice of parameters, the de-homogenized designs perform within $7-25\%$ of the homogenization-based solution at a fraction of the computational cost. With several options for further improvements, the scheme may provide the basis for future interactive high-resolution topology optimization.

[314]  arXiv:2105.04236 [pdf, other]
Title: SIRNN: A Math Library for Secure RNN Inference
Comments: IEEE Security and Privacy 2021
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Mathematical Software (cs.MS)

Complex machine learning (ML) inference algorithms like recurrent neural networks (RNNs) use standard functions from math libraries like exponentiation, sigmoid, tanh, and reciprocal of square root. Although prior work on secure 2-party inference provides specialized protocols for convolutional neural networks (CNNs), existing secure implementations of these math operators rely on generic 2-party computation (2PC) protocols that suffer from high communication. We provide new specialized 2PC protocols for math functions that crucially rely on lookup-tables and mixed-bitwidths to address this performance overhead; our protocols for math functions communicate up to 423x less data than prior work. Some of the mixed bitwidth operations used by our math implementations are (zero and signed) extensions, different forms of truncations, multiplication of operands of mixed-bitwidths, and digit decomposition (a generalization of bit decomposition to larger digits). For each of these primitive operations, we construct specialized 2PC protocols that are more communication efficient than generic 2PC, and can be of independent interest. Furthermore, our math implementations are numerically precise, which ensures that the secure implementations preserve model accuracy of cleartext. We build on top of our novel protocols to build SIRNN, a library for end-to-end secure 2-party DNN inference, that provides the first secure implementations of an RNN operating on time series sensor data, an RNN operating on speech data, and a state-of-the-art ML architecture that combines CNNs and RNNs for identifying all heads present in images. Our evaluation shows that SIRNN achieves up to three orders of magnitude of performance improvement when compared to inference of these models using an existing state-of-the-art 2PC framework.

[315]  arXiv:2105.04240 [pdf, other]
Title: A rigorous introduction for linear models
Authors: Jun Lu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This note is meant to provide an introduction to linear models and the theories behind them. Our goal is to give a rigorous introduction to readers with prior exposure to ordinary least squares. In machine learning, the output is usually a nonlinear function of the input. Deep learning even aims to find a nonlinear dependence with many layers which require a large amount of computation. However, most of these algorithms build upon simple linear models. We then describe linear models from different views and find the properties and theories behind the models. The linear model is the main technique in regression problems and the primary tool for it is the least squares approximation which minimizes a sum of squared errors. This is a natural choice when we're interested in finding the regression function which minimizes the corresponding expected squared error. We first describe ordinary least squares from three different points of view upon which we disturb the model with random noise and Gaussian noise. By Gaussian noise, the model gives rise to the likelihood so that we introduce a maximum likelihood estimator. We also develop some distribution theory for it via this Gaussian disturbance. The distribution theory of least squares will help us answer various questions and introduce related applications. We then prove least squares is the best unbiased linear model in the sense of mean squared error and most importantly, it actually approaches the theoretical limit. We end up with linear models with the Bayesian approach and beyond.
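
As a concrete instance of the least squares approximation mentioned above, the following sketch fits a line with a single input variable by minimizing the sum of squared errors in closed form. The function name `ols_fit` and the toy data are assumptions for illustration and are not taken from the note.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the ratio of the centered cross-products to the centered squares of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Noise-free toy data generated from y = 1 + 2x.
intercept, slope = ols_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The function assumes the inputs are not all equal, since `sxx` would otherwise be zero.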

[316]  arXiv:2105.04241 [pdf, other]
Title: ReadTwice: Reading Very Large Documents with Memories
Comments: To appear in the proceedings of NAACL 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Knowledge-intensive tasks such as question answering often require assimilating information from different sections of large inputs such as books or article collections. We propose ReadTwice, a simple and effective technique that combines several strengths of prior approaches to model long-range dependencies with Transformers. The main idea is to read text in small segments, in parallel, summarizing each segment into a memory table to be used in a second read of the text. We show that the method outperforms models of comparable size on several question answering (QA) datasets and sets a new state of the art on the challenging NarrativeQA task, with questions about entire books. Source code and pre-trained checkpoints for ReadTwice can be found at https://goo.gle/research-readtwice.

[317]  arXiv:2105.04244 [pdf, other]
Title: Overcoming the Distance Estimation Bottleneck in Camera Trap Distance Sampling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The biodiversity crisis is still accelerating. Estimating animal abundance is of critical importance to assess, for example, the consequences of land-use change and invasive species on species composition, or the effectiveness of conservation interventions. Camera trap distance sampling (CTDS) is a recently developed monitoring method providing reliable estimates of wildlife population density and abundance. However, in current applications of CTDS, the required camera-to-animal distance measurements are derived by laborious, manual and subjective estimation methods. To overcome this distance estimation bottleneck in CTDS, this study proposes a completely automatized workflow utilizing state-of-the-art methods of image processing and pattern recognition.

[318]  arXiv:2105.04246 [pdf, other]
Title: In-Hindsight Quantization Range Estimation for Quantized Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Quantization techniques applied to the inference of deep neural networks have enabled fast and efficient execution on resource-constrained devices. The success of quantization during inference has motivated the academic community to explore fully quantized training, i.e. quantizing back-propagation as well. However, effective gradient quantization is still an open problem. Gradients are unbounded and their distribution changes significantly during training, which leads to the need for dynamic quantization. As we show, dynamic quantization can lead to significant memory overhead and additional data traffic slowing down training. We propose a simple alternative to dynamic quantization, in-hindsight range estimation, that uses the quantization ranges estimated on previous iterations to quantize the present. Our approach enables fast static quantization of gradients and activations while requiring only minimal hardware support from the neural network accelerator to keep track of output statistics in an online fashion. It is intended as a drop-in replacement for estimating quantization ranges and can be used in conjunction with other advances in quantized training. We compare our method to existing methods for range estimation from the quantized training literature and demonstrate its effectiveness with a range of architectures, including MobileNetV2, on image classification benchmarks (Tiny ImageNet & ImageNet).
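
The idea of quantizing the current tensor with a range estimated on previous iterations can be sketched as follows. This is a rough illustration under assumptions: the abstract does not give the update rule, so the exponential moving average of the observed minimum and maximum, the class name `InHindsightQuantizer`, and all parameter names are hypothetical.

```python
class InHindsightQuantizer:
    """Uniform quantizer whose range comes from earlier iterations (assumed design)."""

    def __init__(self, init_min, init_max, bits=8, momentum=0.9):
        self.qmin, self.qmax = init_min, init_max
        self.levels = 2 ** bits - 1
        self.momentum = momentum

    def quantize(self, values):
        # Quantize with the range fixed before this tensor was seen (static range).
        lo, hi = self.qmin, self.qmax
        scale = (hi - lo) / self.levels
        out = []
        for v in values:
            clipped = min(max(v, lo), hi)
            out.append(lo + round((clipped - lo) / scale) * scale)
        # Afterwards, fold this tensor's statistics into the range for the next call.
        # The moving-average rule is an assumption, not taken from the paper.
        m = self.momentum
        self.qmin = m * lo + (1 - m) * min(values)
        self.qmax = m * hi + (1 - m) * max(values)
        return out


quantizer = InHindsightQuantizer(-1.0, 1.0, bits=8, momentum=0.5)
first = quantizer.quantize([-4.0, 0.0, 4.0])   # clipped to the old range [-1, 1]
second = quantizer.quantize([-4.0, 0.0, 4.0])  # uses the range updated after the first call
```

Because the range is known before the tensor is produced, the quantization itself needs no extra pass over the data. Only the running minimum and maximum have to be tracked.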

[319]  arXiv:2105.04247 [pdf, other]
Title: Expressivity of Parameterized and Data-driven Representations in Quality Diversity Search
Comments: For code for reproducing experiments, see this https URL
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We consider multi-solution optimization and generative models for the generation of diverse artifacts and the discovery of novel solutions. In cases where the domain's factors of variation are unknown or too complex to encode manually, generative models can provide a learned latent space to approximate these factors. When used as a search space, however, the range and diversity of possible outputs are limited to the expressivity and generative capabilities of the learned model. We compare the output diversity of a quality diversity evolutionary search performed in two different search spaces: 1) a predefined parameterized space and 2) the latent space of a variational autoencoder model. We find that the search on an explicit parametric encoding creates more diverse artifact sets than searching the latent space. A learned model is better at interpolating between known data points than at extrapolating or expanding towards unseen examples. We recommend using a generative model's latent space primarily to measure similarity between artifacts rather than for search and generation. Whenever a parametric encoding is obtainable, it should be preferred over a learned representation as it produces a higher diversity of solutions.

[320]  arXiv:2105.04249 [pdf, other]
Title: Accounting for Model Uncertainty in Algorithmic Discrimination
Comments: 12 pages, Accepted at AIES 2021
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

Traditional approaches to ensure group fairness in algorithmic decision making aim to equalize ``total'' error rates for different subgroups in the population. In contrast, we argue that the fairness approaches should instead focus only on equalizing errors arising due to model uncertainty (a.k.a. epistemic uncertainty), caused by lack of knowledge about the best model or by lack of data. In other words, our proposal calls for ignoring the errors that occur due to uncertainty inherent in the data, i.e., aleatoric uncertainty. We draw a connection between predictive multiplicity and model uncertainty and argue that the techniques from predictive multiplicity could be used to identify errors made due to model uncertainty. We propose scalable convex proxies to come up with classifiers that exhibit predictive multiplicity and empirically show that our methods are comparable in performance and up to four orders of magnitude faster than the current state-of-the-art. We further propose methods to achieve our goal of equalizing group error rates arising due to model uncertainty in algorithmic decision making and demonstrate the effectiveness of these methods using synthetic and real-world datasets.

[321]  arXiv:2105.04250 [pdf, ps, other]
Title: Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches: Extended Version
Subjects: Artificial Intelligence (cs.AI)

Width-based planning methods exploit the use of conjunctive goals for decomposing problems into subproblems of low width. However, algorithms like SIW fail when the goal is not serializable. In this work, we address this limitation of SIW by using a simple but powerful language for expressing problem decompositions introduced recently by Bonet and Geffner, called policy sketches. A policy sketch R consists of a set of Boolean and numerical features and a set of sketch rules that express how the values of these features are supposed to change. Like general policies, policy sketches are domain general, but unlike policies, the changes captured by sketch rules do not need to be achieved in a single step. We show that many planning domains that cannot be solved by SIW are provably solvable in low polynomial time with the SIW_R algorithm, the version of SIW that employs user-provided policy sketches. Policy sketches are thus shown to be a powerful language for expressing domain-specific knowledge in a simple and compact way and a convenient alternative to languages such as HTNs or temporal logics. Furthermore, policy sketches make it easy to express general problem decompositions and prove key properties like their complexity and width.

[322]  arXiv:2105.04252 [pdf, other]
Title: An Analysis of Phenotypic Diversity in Multi-Solution Optimization
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

More and more, optimization methods are used to find diverse solution sets. We compare solution diversity in multi-objective optimization, multimodal optimization, and quality diversity in a simple domain. We show that multiobjective optimization does not always produce much diversity, multimodal optimization produces higher fitness solutions, and quality diversity is not sensitive to genetic neutrality and creates the most diverse set of solutions. An autoencoder is used to discover phenotypic features automatically, producing an even more diverse solution set with quality diversity. Finally, we make recommendations about when to use which approach.

[323]  arXiv:2105.04253 [pdf, other]
Title: Tiling of Constellations
Subjects: Information Theory (cs.IT)

Motivated by applications in reliable and secure communication, we address the problem of tiling (or partitioning) a finite constellation in $\mathbb{Z}_{2^L}^n$ by subsets, in the case that the constellation does not possess an abelian group structure. The property that we do require is that the constellation is generated by a linear code through an injective mapping. The intrinsic relation between the code and the constellation provides a sufficient condition for a tiling to exist. We also present a necessary condition. Inspired by a result in group theory, we discuss results on tiling for the particular case when the finer constellation is an abelian group as well.

[324]  arXiv:2105.04256 [pdf, other]
Title: Designing Air Flow with Surrogate-assisted Phenotypic Niching
Subjects: Neural and Evolutionary Computing (cs.NE); Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)

In complex, expensive optimization domains we often narrowly focus on finding high performing solutions, instead of expanding our understanding of the domain itself. But what if we could quickly understand the complex behaviors that can emerge in said domains instead? We introduce surrogate-assisted phenotypic niching, a quality diversity algorithm which allows the discovery of a large, diverse set of behaviors by using computationally expensive phenotypic features. In this work we discover the types of air flow in a 2D fluid dynamics optimization problem. A fast GPU-based fluid dynamics solver is used in conjunction with surrogate models to accurately predict fluid characteristics from the shapes that produce the air flow. We show that these features can be modeled in a data-driven way while sampling to improve performance, rather than explicitly sampling to improve feature models. Our method can reduce the need to run an infeasibly large set of simulations while still being able to design a large diversity of air flows and the shapes that cause them. Discovering diversity of behaviors helps engineers to better understand expensive domains and their solutions.

[325]  arXiv:2105.04260 [pdf, other]
Title: EPICTWIN: An Electric Power Digital Twin for Cyber Security Testing, Research and Education
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

Cyber-Physical Systems (CPS) rely on advanced communication and control technologies to efficiently manage devices and the flow of information in the system. However, a wide variety of potential security challenges has emerged due to the evolution of critical infrastructures (CI) from siloed sub-systems into connected and integrated networks. This is also the case for CI such as a smart grid. Smart grid security studies are carried out on physical test-beds to provide users with a platform to train and test cyber attacks in a safe and controlled environment. However, such test-beds are limited w.r.t. modifying the physical configuration and are difficult to scale.
To overcome these shortcomings, we built a digital power twin for a physical test-bed that is used for cyber security studies on smart grids. On the developed twin, the users can deploy real world attacks and countermeasures, to test and study their effectiveness. The difference from the physical test-bed is that its users may easily modify their power system components and configurations. Further, reproducing the twin for using and advancing the research is significantly cheaper. The developed twin has advanced features compared to any equivalent system in the literature. To illustrate a typical use case, we present a case study where a cyber attack is launched and discuss its implications.

[326]  arXiv:2105.04261 [pdf, other]
Title: Neuroscience-inspired perception-action in robotics: applying active inference for state estimation, control and self-perception
Comments: Accepted at ICLR 2021 Brain2AI workshop
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Unlike robots, humans learn, adapt and perceive their bodies by interacting with the world. Discovering how the brain represents the body and generates actions is of major importance for robotics and artificial intelligence. Here we discuss how neuroscience findings open up opportunities to improve current estimation and control algorithms in robotics. In particular, we discuss how active inference, a mathematical formulation of how the brain resists a natural tendency to disorder, provides a unified recipe to potentially solve some of the major challenges in robotics, such as adaptation, robustness, flexibility, generalization and safe interaction. This paper summarizes some experiments and lessons learned from developing such a computational model on real embodied platforms, i.e., humanoid and industrial robots. Finally, we showcase the limitations and challenges that we are still facing to give robots human-like perception.

[327]  arXiv:2105.04264 [pdf, other]
Title: Threat Landscape for Smart Grid Systems
Journal-ref: 15th International Conference on Availability, Reliability and Security (ARES 2020)
Subjects: Cryptography and Security (cs.CR)

Smart Grids are energy delivery networks, constituting an evolution of power grids, in which a bidirectional flow between power providers and consumers is established. These flows support the transfer of electricity and information, in order to support automation actions in the context of the energy delivery network. So far, many smart grid implementations and implementation proposals have emerged, with varying degrees of feature delivery and sophistication. While smart grids offer many advantages, their distributed nature and information flow streams between energy producers and consumers enable the launching of a number of attacks against the smart grid infrastructure, where the related consequences may range from economic loss to complete failure of the smart grid. In this paper, we survey the threat landscape of smart grids, identifying threats that are specific to this infrastructure, providing an assessment of the severity of the consequences of each attack type, discerning features that can be utilized to detect attacks and listing methods that can be used to mitigate them.

[328]  arXiv:2105.04266 [pdf, other]
Title: A Probabilistic Approach to Personalize Type-based Facet Ranking for POI Suggestion
Comments: Accepted at ICWE 2021
Subjects: Information Retrieval (cs.IR)

Faceted Search Systems (FSS) have become one of the main search interfaces used in vertical search systems, offering users meaningful facets to refine their search query and narrow down the results quickly to find the intended search target. This work focuses on the problem of ranking type-based facets. In a structured information space, type-based facets (t-facets) indicate the category to which each object belongs. When they belong to a large multi-level taxonomy, it is desirable to rank them separately before ranking other facet groups. This helps the searcher in filtering the results according to their type first. This also makes it easier to rank the rest of the facets once the type of the intended search target is selected. Existing research employs the same ranking methods for different facet groups. In this research, we propose a two-step approach to personalize t-facet ranking. The first step assigns a relevance score to each individual leaf-node t-facet. The score is generated using probabilistic models and it reflects t-facet relevance to the query and the user profile. In the second step, this score is used to re-order and select the sub-tree to present to the user. We investigate the usefulness of the proposed method for a Point Of Interest (POI) suggestion task. Our evaluation aims at capturing the user effort required to fulfil their search needs by using the ranked facets. The proposed approach achieved better results than other existing personalized baselines.

[329]  arXiv:2105.04271 [pdf, other]
Title: DocOIE: A Document-level Context-Aware Dataset for OpenIE
Comments: To appear in Findings of ACL 2021
Subjects: Computation and Language (cs.CL)

Open Information Extraction (OpenIE) aims to extract structured relational tuples (subject, relation, object) from sentences and plays a critical role in many downstream NLP applications. Existing solutions perform extraction at sentence level, without referring to any additional contextual information. In reality, however, a sentence typically exists as part of a document rather than standalone; we often need to access relevant contextual information around the sentence before we can accurately interpret it. As there is no document-level context-aware OpenIE dataset available, we manually annotate 800 sentences from 80 documents in two domains (Healthcare and Transportation) to form a DocOIE dataset for evaluation. In addition, we propose DocIE, a novel document-level context-aware OpenIE model. Our experimental results based on DocIE demonstrate that incorporating document-level context is helpful in improving OpenIE performance. Both the DocOIE dataset and the DocIE model are publicly released.

[330]  arXiv:2105.04272 [pdf, other]
Title: Advanced Metering Infrastructures: Security Risks and Mitigation
Journal-ref: 15th International Conference on Availability, Reliability and Security (ARES 2020)
Subjects: Cryptography and Security (cs.CR)

Energy providers are moving to the smart meter era, encouraging consumers to install these devices in their homes free of charge, automating the submission of consumption readings and making consumers' lives easier. However, the increased deployment of such smart devices brings many security and privacy risks. In order to overcome such risks, Intrusion Detection Systems are presented as pertinent tools that can provide network-level protection for smart devices deployed in home environments. In this context, this paper explores the problems of Advanced Metering Infrastructures (AMI) and proposes a novel Machine Learning (ML) Intrusion Prevention System (IPS) to get optimal decisions based on a variety of factors and graphical security models able to tackle zero-day attacks.

[331]  arXiv:2105.04273 [pdf, other]
Title: Loss-Aversively Fair Classification
Comments: 8 pages, Accepted at AIES 2019
Journal-ref: In AAAI/ACM Conference on AI, Ethics, and Society (AIES 2019), January 27-28 2019 Honolulu, HI, USA
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

The use of algorithmic (learning-based) decision making in scenarios that affect human lives has motivated a number of recent studies to investigate such decision making systems for potential unfairness, such as discrimination against subjects based on their sensitive features like gender or race. However, when judging the fairness of a newly designed decision making system, these studies have overlooked an important influence on people's perceptions of fairness, which is how the new algorithm changes the status quo, i.e., decisions of the existing decision making system. Motivated by extensive literature in behavioral economics and behavioral psychology (prospect theory), we propose a notion of fair updates that we refer to as loss-averse updates. Loss-averse updates constrain the updates to yield improved (more beneficial) outcomes to subjects compared to the status quo. We propose tractable proxy measures that would allow this notion to be incorporated in the training of a variety of linear and non-linear classifiers. We show how our proxy measures can be combined with existing measures for training nondiscriminatory classifiers. Our evaluation using synthetic and real-world datasets demonstrates that the proposed proxy measures are effective for their desired tasks.
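
As an illustration of the loss-averse constraint described in this abstract, the following minimal sketch checks whether an updated classifier gives any subject a less beneficial outcome than the status quo classifier. It assumes binary decisions in which 1 is the beneficial outcome; the function name `loss_averse_violations` and the toy data are hypothetical and are not taken from the paper, whose proxy measures for training are not reproduced here.

```python
def loss_averse_violations(old_decisions, new_decisions):
    """Return indices of subjects whose outcome got worse under the update.

    Assumption: decisions are 0/1 and 1 is the beneficial outcome.
    """
    if len(old_decisions) != len(new_decisions):
        raise ValueError("decision lists must have the same length")
    return [i
            for i, (old, new) in enumerate(zip(old_decisions, new_decisions))
            if new < old]


# Hypothetical toy data: status quo decisions vs. updated decisions.
status_quo = [1, 0, 0, 1, 0]
updated = [1, 1, 0, 0, 1]

violations = loss_averse_violations(status_quo, updated)
is_loss_averse = not violations  # True only if no subject is worse off
```

In this toy data the subject at index 3 loses a previously beneficial decision, so the update would not count as loss-averse under the assumed definition, even though more subjects gain than lose.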

[332]  arXiv:2105.04274 [pdf, other]
Title: Compound Channel Capacities under Energy Constraints and Application
Comments: 6 pages, 1 figure, accepted at ISIT - 2021 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)

Compound channel models offer a simple and straightforward way of analyzing the stability of decoder design under model variations. With this work we provide a coding theorem for a large class of practically relevant compound channel models. We give explicit formulas for the cases of the Gaussian classical-quantum compound channels with unknown noise, unknown phase and unknown attenuation. We show analytically how the classical compound channel capacity formula motivates nontrivial choices of the displacement parameter of the Kennedy receiver. Our work demonstrates the value of the compound channel model as a method for the design of receivers in quantum communication.

[333]  arXiv:2105.04278 [pdf, other]
Title: A Rate-Distortion Framework for Characterizing Semantic Information
Comments: To appear at ISIT 2021, with an appendix added to include general solution for jointly Gaussian models
Subjects: Information Theory (cs.IT); Applications (stat.AP)

A rate-distortion problem motivated by the consideration of semantic information is formulated and solved. The starting point is to model an information source as a pair consisting of an intrinsic state which is not observable, corresponding to the semantic aspect of the source, and an extrinsic observation which is subject to lossy source coding. The proposed rate-distortion problem seeks a description of the information source, via encoding the extrinsic observation, under two distortion constraints, one for the intrinsic state and the other for the extrinsic observation. The corresponding state-observation rate-distortion function is obtained, and a few case studies of Gaussian intrinsic state estimation and binary intrinsic state classification are presented.

[334]  arXiv:2105.04281 [pdf, other]
Title: Visual Grounding with Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a transformer-based approach for visual grounding. Unlike previous proposal-and-rank frameworks that rely heavily on pretrained object detectors or proposal-free frameworks that upgrade an off-the-shelf one-stage detector by fusing textual embeddings, our approach is built on top of a transformer encoder-decoder and is independent of any pretrained detectors or word embedding models. Termed VGTR -- Visual Grounding with TRansformers, our approach is designed to learn semantic-discriminative visual features under the guidance of the textual description without harming their location ability. This information flow gives VGTR a strong capability for capturing context-level semantics of both vision and language modalities, enabling us to aggregate accurate visual clues implied by the description to locate the object instance of interest. Experiments show that our method outperforms state-of-the-art proposal-free approaches by a considerable margin on five benchmarks while maintaining fast inference speed.

[335]  arXiv:2105.04284 [pdf, ps, other]
Title: A class of power maps with boomerang uniformity four
Comments: 9 pages
Subjects: Information Theory (cs.IT)

We give a class of power maps with boomerang uniformity four. Moreover, we compute the differential uniformity of this class of power maps and determine its complete differential spectrum. As a consequence, we show that for this class of power maps, the differential uniformity is strictly greater than its boomerang uniformity, contrary to popular belief.

[336]  arXiv:2105.04286 [pdf, other]
Title: Primitive Representation Learning for Scene Text Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene text recognition is a challenging task due to diverse variations of text instances in natural scene images. Conventional methods based on CNN-RNN-CTC or encoder-decoder with attention mechanism may not fully investigate stable and efficient feature representations for multi-oriented scene texts. In this paper, we propose a primitive representation learning method that aims to exploit intrinsic representations of scene text images. We model elements in feature maps as the nodes of an undirected graph. A pooling aggregator and a weighted aggregator are proposed to learn primitive representations, which are transformed into high-level visual text representations by graph convolutional networks. A Primitive REpresentation learning Network (PREN) is constructed to use the visual text representations for parallel decoding. Furthermore, by integrating visual text representations into an encoder-decoder model with the 2D attention mechanism, we propose a framework called PREN2D to alleviate the misalignment problem in attention-based methods. Experimental results on both English and Chinese scene text recognition tasks demonstrate that PREN keeps a balance between accuracy and efficiency, while PREN2D achieves state-of-the-art performance.

[337]  arXiv:2105.04289 [pdf, other]
Title: Do Concept Bottleneck Models Learn as Intended?
Comments: Accepted at ICLR 2021 Workshop on Responsible AI
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretability methods, we demonstrate that concepts do not correspond to anything semantically meaningful in input space, thus calling into question the usefulness of concept bottleneck models in their current form.

[338]  arXiv:2105.04293 [pdf, other]
Title: An interactive dashboard for searching and comparing soccer performance scores
Comments: 4 pages, 6 figures
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

The performance of soccer players is one of the most discussed aspects by many actors in the soccer industry: from supporters to journalists, from coaches to talent scouts. Unfortunately, the dashboards available online provide no effective way to compare the evolution of the performance of players or to find players behaving similarly on the field. This paper describes the design of a web dashboard that interacts via APIs with a performance evaluation algorithm and provides graphical tools that allow the user to perform many tasks, such as searching for or comparing players by age, role or trend of growth in their performance, finding similar players based on their behavior on the pitch, and changing the algorithm's parameters to obtain customized performance scores. We also describe an example of how a talent scout can interact with the dashboard to find young, promising talents.

[339]  arXiv:2105.04294 [pdf, other]
Title: Toward asynchronous EEG-based BCI: Detecting imagined words segments in continuous EEG signals
Comments: 10 pages, 14 figures
Journal-ref: Biomedical Signal Processing and Control. Volume 65 (2021), 102351
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP)

An asynchronous Brain-Computer Interface (BCI) based on imagined speech is a tool that allows the user to control an external device or to emit a message at the moment they choose by decoding EEG signals of imagined speech. In order to correctly implement these types of BCI, we must be able to detect, from a continuous signal, when the subject starts to imagine words. In this work, five methods of feature extraction based on wavelet decomposition, empirical mode decomposition, frequency energies, fractal dimension and chaos theory features are presented to solve the task of detecting imagined word segments in continuous EEG signals, as a preliminary study for a later implementation of an asynchronous BCI based on imagined speech. These methods are tested on three datasets using four different classifiers, and the highest F1 scores obtained are 0.73, 0.79, and 0.68 for each dataset, respectively. These results are promising for building a system that automates the segmentation of imagined word segments for later classification.

[340]  arXiv:2105.04295 [pdf, other]
Title: PyPlutchik: visualising and comparing emotion-annotated corpora
Comments: 18 pages, 13 figures. Submitted to IEEE for possible publication; copyright may change
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)

The increasing availability of textual corpora and data fetched from social networks is fuelling a huge production of works based on the model proposed by psychologist Robert Plutchik, often referred to simply as the "Plutchik Wheel". Related research ranges from descriptions of annotation tasks to emotion detection tools. Visualisation of such emotions is traditionally carried out using the most popular layouts, such as bar plots or tables, which are however sub-optimal. The classic representation of Plutchik's wheel follows the principles of proximity and opposition between pairs of emotions: spatial proximity in this model is also a semantic proximity, as adjacent emotions elicit a complex emotion (a primary dyad) when triggered together; spatial opposition is a semantic opposition as well, as positive emotions are opposite to negative emotions. The most common layouts fail to preserve both features, and they also make it hard to compare different corpora visually at a glance. We introduce PyPlutchik, a Python library specifically designed for the visualisation of Plutchik's emotions in texts or in corpora. PyPlutchik draws Plutchik's flower with each emotion petal sized according to how much that emotion is detected or annotated in the corpus, also representing three degrees of intensity for each of them. Notably, PyPlutchik also allows users to display primary, secondary, tertiary and opposite dyads in a compact, intuitive way. We substantiate our claim that PyPlutchik outperforms other classic visualisations when displaying Plutchik emotions and we showcase a few examples that display our library's most compelling features.

[341]  arXiv:2105.04297 [pdf, other]
Title: How could Neural Networks understand Programs?
Journal-ref: ICML 2021
Subjects: Programming Languages (cs.PL); Machine Learning (cs.LG); Software Engineering (cs.SE)

Semantic understanding of programs is a fundamental problem for programming language processing (PLP). Recent works that learn representations of code based on pre-training techniques in NLP have pushed the frontiers in this direction. However, the semantics of PL and NL have essential differences. If these differences are ignored, we believe it is difficult to build a model that better understands programs, whether by directly applying off-the-shelf NLP pre-training techniques to the source code or by heuristically adding features to the model. In fact, the semantics of a program can be rigorously defined by formal semantics in PL theory. For example, operational semantics describes the meaning of a valid program as updating the environment (i.e., the memory address-value function) through fundamental operations, such as memory I/O and conditional branching. Inspired by this, we propose a novel program semantics learning paradigm in which the model learns from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition, which is indispensable for program understanding. To validate our proposal, we present a hierarchical Transformer-based pre-training model called OSCAR to better facilitate the understanding of programs. OSCAR learns from intermediate representation (IR) and an encoded representation derived from static analysis, which are used for representing the fundamental operations and approximating the environment transitions respectively. OSCAR empirically shows an outstanding capability for program semantics understanding on many practical software engineering tasks.

[342]  arXiv:2105.04301 [pdf]
Title: ADASYN-Random Forest Based Intrusion Detection Model
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Intrusion detection has been a key topic in the field of cyber security, and today's common network threats are both varied and constantly changing. The serious class imbalance of intrusion detection datasets results in low classification performance on attack behaviors with few samples and makes it difficult to detect network attacks accurately and efficiently; this paper therefore proposes using the ADASYN oversampling method to balance the datasets. In addition, the random forest algorithm is used to train intrusion detection classifiers. A comparative intrusion detection experiment on the CICIDS 2017 dataset shows that ADASYN with Random Forest performs better. Based on the experimental results, the improvement in precision, recall and F1 values after ADASYN is then analyzed. Experiments show that the proposed method can be applied to intrusion detection with large data, and can effectively improve the classification accuracy of network attack behaviors. Compared with traditional machine learning models, it has better performance, generalization ability and robustness.
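
To make the oversampling step in this abstract concrete, here is a minimal, simplified sketch of ADASYN-style sampling in plain Python. It is an illustration under stated assumptions, not the paper's pipeline: points are tuples of floats, distances are Euclidean, interpolation partners are drawn from each point's nearest minority neighbours, and the function name `adasyn_sketch` and the toy data are hypothetical. The random forest training stage is not shown.

```python
import math
import random


def adasyn_sketch(minority, majority, n_new, k=3, seed=0):
    """Generate roughly n_new synthetic minority points (simplified ADASYN).

    Assumption: per-point counts are rounded, so the total number of
    generated points may differ slightly from n_new.
    """
    rng = random.Random(seed)
    n_min = len(minority)
    points = list(minority) + list(majority)

    # Step 1: the share of majority points among the k nearest neighbours
    # of each minority point measures how hard that point is to learn.
    hardness = []
    for i in range(n_min):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        hardness.append(sum(1 for j in order[:k] if j >= n_min) / k)

    total = sum(hardness)
    if total == 0:  # no minority point has majority neighbours
        return []

    # Step 2: harder points receive proportionally more synthetic samples,
    # each placed on the segment between the point and a minority neighbour.
    synthetic = []
    for i, h in enumerate(hardness):
        count = round(h / total * n_new)
        neighbours = sorted((j for j in range(n_min) if j != i),
                            key=lambda j: math.dist(minority[i], minority[j]))[:k]
        if not neighbours:
            continue
        for _ in range(count):
            j = rng.choice(neighbours)
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(tuple(a + lam * (b - a)
                                   for a, b in zip(minority[i], minority[j])))
    return synthetic


# Hypothetical toy data: one minority point sits inside the majority cluster.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]
majority = [(4.0, 5.0), (5.0, 4.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (3.5, 4.5)]
synthetic = adasyn_sketch(minority, majority, n_new=8)
```

The design point is the adaptive weighting: minority points surrounded by majority points, such as `(4.0, 4.0)` here, receive most of the synthetic samples, whereas points deep inside the minority cluster receive none.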

[343]  arXiv:2105.04302 [pdf, other]
Title: Video Anomaly Detection By The Duality Of Normality-Granted Optical Flow
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video anomaly detection is a challenging task because of diverse abnormal events. For this task, methods based on reconstruction and prediction are widely used in recent works. They are built on the assumption that, after learning on normal data, anomalies cannot be reconstructed or predicted as well as normal patterns, that is, anomalies yield larger errors. In this paper, we propose to discriminate anomalies from normal events by the duality of normality-granted optical flow, which is conducive to predicting normal frames but adverse to predicting abnormal frames. The normality-granted optical flow is predicted from a single frame, to keep the motion knowledge focused on normal patterns. Meanwhile, we extend the appearance-motion correspondence scheme from frame reconstruction to prediction, which not only helps to learn the knowledge about object appearances and correlated motion, but also respects the fact that motion is the transformation between appearances. We also introduce a margin loss to enhance the learning of frame prediction. Experiments on standard benchmark datasets demonstrate the impressive performance of our approach.

[344]  arXiv:2105.04308 [pdf, ps, other]
Title: Parallel Sandpiles or Spurious Bidirectional Icepiles?
Subjects: Formal Languages and Automata Theory (cs.FL)

In a recent paper, E. Formenti and K. Perrot (FP) introduce a global rule assumed to describe the discrete time dynamics associated with a sandpile model under the parallel application of a suitable local rule acting on d-dimensional lattices of cells equipped with uniform neighborhood. In this paper we submit this approach to a critical analysis, in the simplest case of a one-dimensional lattice; the analysis is divided into two parts. In the first part we prove that the FP global rule does not describe the dynamics of standard sandpiles, but rather describes the quite different situation of height differences between consecutive piles. This is a semantically incorrect interpretation. In the second part we investigate the consequences of the incorrect FP assumption, proving that their global rule describes a bidirectional spurious dynamics of icepiles (rather than sandpiles), in the sense that this dynamics is the consequence of the application of three local rules: a bidirectional vertical rule, a bidirectional horizontal rule (typical of icepiles), and a granule jump from the bottom to the top (the spurious rule of the dynamics).

[345]  arXiv:2105.04309 [pdf, other]
Title: Multi-modal Conditional Bounding Box Regression for Music Score Following
Comments: Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This paper addresses the problem of sheet-image-based on-line audio-to-score alignment also known as score following. Drawing inspiration from object detection, a conditional neural network architecture is proposed that directly predicts x,y coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance. Experiments are conducted on a synthetic polyphonic piano benchmark dataset and the new method is compared to several existing approaches from the literature for sheet-image-based score following as well as an Optical Music Recognition baseline. The proposed approach achieves new state-of-the-art results and furthermore significantly improves the alignment performance on a set of real-world piano recordings by applying Impulse Responses as a data augmentation technique.

[346]  arXiv:2105.04311 [pdf]
Title: Overcoming Complexity Catastrophe: An Algorithm for Beneficial Far-Reaching Adaptation under High Complexity
Comments: 10 pages, 5 Figures
Subjects: Neural and Evolutionary Computing (cs.NE); Adaptation and Self-Organizing Systems (nlin.AO)

In his seminal work with NK algorithms, Kauffman noted that fitness outcomes from algorithms navigating an NK landscape show a sharp decline at high complexity arising from pervasive interdependence among problem dimensions. This phenomenon - where complexity effects dominate (Darwinian) adaptation efforts - is called complexity catastrophe. We present an algorithm - incremental change taking turns (ICTT) - that finds distant configurations having fitness superior to that reported in extant research, under high complexity. Thus, complexity catastrophe is not inevitable: a series of incremental changes can lead to excellent outcomes.

[347]  arXiv:2105.04313 [pdf, other]
Title: DocReader: Bounding-Box Free Training of a Document Information Extraction Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Information extraction from documents is a ubiquitous first step in many business applications. During this step, the entries of various fields must first be read from the images of scanned documents before being further processed and inserted into the corresponding databases. While many different methods have been developed over the past years in order to automate the above extraction step, they all share the requirement of bounding-box or text segment annotations of their training documents. In this work we present DocReader, an end-to-end neural-network-based information extraction solution which can be trained using solely the images and the target values that need to be read. The DocReader can thus leverage existing historical extraction data, completely eliminating the need for any additional annotations beyond what is naturally available in existing human-operated service centres. We demonstrate that the DocReader can reach and surpass other methods which require bounding-boxes for training, as well as provide a clear path for continual learning during its deployment in production.

[348]  arXiv:2105.04319 [pdf, other]
Title: A Bregman Learning Framework for Sparse Neural Networks
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

We propose a learning framework based on stochastic Bregman iterations to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated version using momentum, and AdaBreg, which is a Bregmanized generalization of the Adam algorithm. In contrast to established methods for sparse training the proposed family of algorithms constitutes a regrowth strategy for neural networks that is solely optimization-based without additional heuristics. Our Bregman learning framework starts the training with very few initial parameters, successively adding only significant ones to obtain a sparse and expressive network. The proposed approach is extremely easy and efficient, yet supported by the rich mathematical theory of inverse scale space methods. We derive a statistically profound sparse parameter initialization strategy and provide a rigorous stochastic convergence analysis of the loss decay and additional convergence proofs in the convex regime. Using only 3.4% of the parameters of ResNet-18 we achieve 90.2% test accuracy on CIFAR-10, compared to 93.6% using the dense network. Our algorithm also unveils an autoencoder architecture for a denoising task. The proposed framework also has a huge potential for integrating sparse backpropagation and resource-friendly training.
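
The inverse scale space behaviour described in this abstract can be illustrated with a small, deterministic sketch of a linearized Bregman iteration. The assumptions are loud ones: the regularizer is taken to be an l1 penalty with weight `lam`, full gradients replace stochastic ones, the loss is a toy quadratic, and the names `shrink`, `linbreg`, `tau`, `lam` and `delta` are illustrative rather than the authors' code or API.

```python
def shrink(x, lam):
    """Soft-thresholding: the proximal map of lam * |x|."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def linbreg(grad, dim, steps, tau=0.1, lam=1.0, delta=1.0):
    """Run deterministic linearized Bregman iterations from a zero start."""
    v = [0.0] * dim      # accumulated (sub)gradient variable
    theta = [0.0] * dim  # parameters start fully sparse
    for _ in range(steps):
        g = grad(theta)
        # Gradient step on v, then thresholding to obtain the parameters.
        v = [vi - tau * gi for vi, gi in zip(v, g)]
        theta = [delta * shrink(vi, lam) for vi in v]
    return theta


# Hypothetical toy loss: 0.5 * sum((theta_i - target_i) ** 2).
target = [3.0, 0.0, -2.0, 0.05]
theta = linbreg(lambda th: [t - y for t, y in zip(th, target)],
                dim=len(target), steps=100)
```

In this sketch a parameter becomes non-zero only once its accumulated gradient exceeds the threshold `lam`, so the two large targets are recovered while the coordinate with target 0.05 is still exactly zero after 100 steps. This mirrors the idea of starting sparse and adding only significant parameters.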

[349]  arXiv:2105.04322 [pdf, other]
Title: RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation
Comments: 11 pages, 5 figures, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing online multiple object tracking (MOT) algorithms often consist of two subtasks, detection and re-identification (ReID). In order to enhance the inference speed and reduce the complexity, current methods commonly integrate these two subtasks into a unified framework. Nevertheless, detection and ReID demand diverse features. This issue results in an optimization contradiction during the training procedure. To alleviate this contradiction, we devise a module named Global Context Disentangling (GCD) that decouples the learned representation into detection-specific and ReID-specific embeddings. As such, this module provides an implicit way to balance the different requirements of these two subtasks. Moreover, we observe that preceding MOT methods typically leverage local information to associate the detected targets and neglect to consider the global semantic relation. To resolve this restriction, we develop a module, referred to as Guided Transformer Encoder (GTE), by combining the powerful reasoning ability of the Transformer encoder and deformable attention. Unlike previous works, GTE avoids analyzing all the pixels and only captures the relation between query nodes and a few self-adaptively selected key samples. Therefore, it is computationally efficient. Extensive experiments have been conducted on the MOT16, MOT17 and MOT20 benchmarks to demonstrate the superiority of the proposed MOT framework, namely RelationTrack. The experimental results indicate that RelationTrack has surpassed preceding methods significantly and established a new state-of-the-art performance, e.g., IDF1 of 70.5% and MOTA of 67.2% on MOT20.

[350]  arXiv:2105.04324 [pdf, other]
Title: Passivity-based control of mechanical systems with linear damping identification
Comments: Submission for 7th IFAC Workshop on Lagrangian and Hamiltonian Methods for Nonlinear Control
Subjects: Systems and Control (eess.SY)

We propose a control approach for a class of nonlinear mechanical systems to stabilize the system under study while ensuring that the oscillations of the transient response are reduced. The approach is twofold: (i) we apply our technique for linear viscous damping identification of the system to improve the accuracy of the selected control technique, and (ii) we implement a passivity-based controller to stabilize and reduce the oscillations by selecting the control parameters properly in accordance with the identified damping. Moreover, we provide an analysis for a particular passivity-based control approach that has been shown to be successful in reducing such oscillations. Also, we validate the methodology by implementing it experimentally on a planar manipulator.

[351]  arXiv:2105.04328 [pdf]
Title: An Autonomous Drone for Search and Rescue in Forests using Airborne Optical Sectioning
Comments: 21 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Drones will play an essential role in human-machine teaming in future search and rescue (SAR) missions. We present a first prototype that finds people fully autonomously in densely occluded forests. In the course of 17 field experiments conducted over various forest types and under different flying conditions, our drone found 38 out of 42 hidden persons; average precision was 86% for predefined flight paths, while adaptive path planning (where potential findings are double-checked) increased confidence by 15%. Image processing, classification, and dynamic flight-path adaptation are computed onboard in real-time and while flying. Our finding that deep-learning-based person classification is unaffected by sparse and error-prone sampling within one-dimensional synthetic apertures allows flights to be shortened and reduces recording requirements to one-tenth of the number of images needed for sampling using two-dimensional synthetic apertures. The goal of our adaptive path planning is to find people as reliably and quickly as possible, which is essential in time-critical applications, such as SAR. Our drone enables SAR operations in remote areas without stable network coverage, as it transmits to the rescue team only classification results that indicate detections and can thus operate with intermittent minimal-bandwidth connections (e.g., by satellite). Once received, these results can be visually enhanced for interpretation on remote mobile devices.

[352]  arXiv:2105.04332 [pdf, other]
Title: Bayesian Optimistic Optimisation with Exponentially Decaying Regret
Comments: To appear at ICML 2021 (21 pages)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Bayesian optimisation (BO) is a well-known efficient algorithm for finding the global optimum of expensive, black-box functions. The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations. This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation which are based on partitioning the search space. We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal{O}(N^{-\sqrt{N}})$ under the assumption that the objective function is sampled from a Gaussian process with a Mat\'ern kernel with smoothness parameter $\nu > 4 +\frac{D}{2}$, where $D$ is the number of dimensions. We perform experiments on optimisation of various synthetic functions and machine learning hyperparameter tuning tasks and show that our algorithm outperforms baselines.
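
To compare the rates quoted in this abstract, it can help to write the claimed bound in exponential form. The lines below are only an algebraic restatement of the expressions given above, not a derivation taken from the paper:

```latex
N^{-\sqrt{N}} = e^{-\sqrt{N}\,\log N},
\qquad
\frac{N^{-\sqrt{N}}}{e^{-\sqrt{N}}} = e^{-\sqrt{N}\,(\log N - 1)} \longrightarrow 0
\quad \text{as } N \to \infty .
```

So, as bare expressions, $N^{-\sqrt{N}}$ decays faster than $e^{-\sqrt{N}}$ once $N > e$; how the constants hidden in the $\mathcal{O}$ notation compare is not addressed by this rewriting.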

[353]  arXiv:2105.04335 [pdf, other]
Title: Geometrical Characterization of Sensor Placement for Cone-Invariant and Multi-Agent Systems against Undetectable Zero-Dynamics Attacks
Comments: 8 figures
Subjects: Systems and Control (eess.SY)

Undetectable attacks are an important class of malicious attacks threatening the security of cyber-physical systems: they can modify a system's state but leave the system output measurements unaffected, and hence cannot be detected from the output. This paper studies undetectable attacks on cone-invariant systems and multi-agent systems. We first provide a general characterization of zero-dynamics attacks, which characterizes fully undetectable attacks targeting the non-minimum phase zeros of a system. This geometrical characterization makes it possible to develop a defense strategy seeking to place a minimal number of sensors to detect and counter the zero-dynamics attacks on the system's actuators. The detection and defense scheme amounts to computing a set containing potentially vulnerable actuator locations and nodes, and a defense union for feasible placement of sensors based on the geometrical properties of the cones under consideration.

[354]  arXiv:2105.04339 [pdf, other]
Title: DefSent: Sentence Embeddings using Definition Sentences
Subjects: Computation and Language (cs.CL)

Sentence embedding methods using natural language inference (NLI) datasets have been successfully applied to various tasks. However, these methods are only available for a limited number of languages because they rely heavily on large NLI datasets. In this paper, we propose DefSent, a sentence embedding method that uses definition sentences from a word dictionary. Since dictionaries are available for many languages, DefSent is more broadly applicable than methods using NLI datasets without constructing additional datasets. We demonstrate that DefSent performs comparably on unsupervised semantic textual similarity (STS) tasks and slightly better on SentEval tasks than the methods using large NLI datasets.

[355]  arXiv:2105.04340 [pdf]
Title: Interaction Theory of Hazard-Target System
Comments: 28 pages, 9 figures, 3 tables
Subjects: Systems and Control (eess.SY)

Major accidents (e.g., the Space Shuttle Challenger disaster in the USA, the Bhopal disaster in India, the Fukushima nuclear accident in Japan, and the Tianjin Port fire and explosion accident in China) have occurred all over the world. Safety scientists are always trying to understand why these accidents happened and how to prevent them. Accident models and theories form the basis for many safety research fields and practices, such as the investigation of accidents, the design of safer systems, and decision making in safety-related fields. Although many accident causation models exist, there is no universally accepted model with useful elements for understanding accident causation. Based on STAMP and RMF, we propose a new theory named the Interaction Theory of Hazard-Target System (ITHTS) that incorporates human, organisational and technological characteristics in the same framework. Accident analysis methods provide the information necessary to analyze an accident in a specific setting. In order to solve the issues that current accident analysis methods still face, we propose a new systemic accident analysis method based on ITHTS and STPA. We choose the Tianjin Port fire and explosion accident in China as a case study to demonstrate the viability of the Interaction Theory of Hazard-Target System and the applicability of the new accident analysis method. It is concluded that ITHTS can explain the phenomena in safety practice and that the new accident analysis method can be applied to the explanation and analysis of major accidents.

[356]  arXiv:2105.04342 [pdf, other]
Title: Exploring open-ended gameplay features with Micro RollerCoaster Tycoon
Comments: 8 pages, 10 figures, submitted to Foundations of Digital Games Conference 2021
Subjects: Artificial Intelligence (cs.AI)

This paper introduces MicroRCT, a novel open source simulator inspired by the theme park sandbox game RollerCoaster Tycoon. The goal in MicroRCT is to place rides and shops in an amusement park to maximize profit earned from park guests. Thus, the challenges for game AI include both selecting high-earning attractions and placing them in locations that are convenient to guests. In this paper, the MAP-Elites algorithm is used to generate a diversity of park layouts, exploring two theoretical questions about evolutionary algorithms and game design: 1) Is there a benefit to starting from a minimal starting point for evolution and complexifying incrementally? and 2) What are the effects of resource limitations on creativity and optimization? Results indicate that building from scratch with no costs results in the widest diversity of high-performing designs.

[357]  arXiv:2105.04349 [pdf, other]
Title: Generative Adversarial Registration for Improved Conditional Deformable Templates
Comments: 24 pages, 15 figures. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Deformable templates are essential to large-scale medical image registration, segmentation, and population analysis. Current conventional and deep network-based methods for template construction use only regularized registration objectives and often yield templates with blurry and/or anatomically implausible appearance, confounding downstream biomedical interpretation. We reformulate deformable registration and conditional template estimation as an adversarial game wherein we encourage realism in the moved templates with a generative adversarial registration framework conditioned on flexible image covariates. The resulting templates exhibit significant gain in specificity to attributes such as age and disease, better fit underlying group-wise spatiotemporal trends, and achieve improved sharpness and centrality. These improvements enable more accurate population modeling with diverse covariates for standardized downstream analyses and easier anatomical delineation for structures of interest.

[358]  arXiv:2105.04351 [pdf, ps, other]
Title: Attacks on a Privacy-Preserving Publish-Subscribe System and a Ride-Hailing Service
Authors: Srinivas Vivek
Subjects: Cryptography and Security (cs.CR)

A privacy-preserving Context-Aware Publish-Subscribe System (CA-PSS) enables an intermediary (broker) to match the content from a publisher and the subscription by a subscriber based on the current context while preserving confidentiality of the subscriptions and notifications. Similarly, a privacy-preserving Ride-Hailing Service (RHS) enables an intermediary (service provider) to match a ride request with a taxi driver in a privacy-friendly manner. In this work, we attack a privacy-preserving CA-PSS proposed by Nabeel et al. (2013), where we show that any entity in the system including the broker can learn the confidential subscriptions of the subscribers. We also attack a privacy-preserving RHS called lpRide proposed by Yu et al. (2019), where we show that any rider/driver can efficiently recover the secret keys of all other riders and drivers. Also, we show that any rider/driver will be able to learn the location of any rider. The attacks are based on our cryptanalysis of the modified Paillier cryptosystem proposed by Nabeel et al. that forms a building block for both the above protocols.

[359]  arXiv:2105.04354 [pdf, other]
Title: AFINet: Attentive Feature Integration Networks for Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Convolutional Neural Networks (CNNs) have achieved tremendous success in a number of learning tasks including image classification. Recent advanced models in CNNs, such as ResNets, mainly focus on the skip connection to avoid gradient vanishing. DenseNet designs suggest creating additional bypasses to transfer features as an alternative strategy in network design. In this paper, we design Attentive Feature Integration (AFI) modules, which are widely applicable to most recent network architectures, leading to new architectures named AFI-Nets. AFI-Nets explicitly model the correlations among different levels of features and selectively transfer features with little overhead. AFI-ResNet-152 obtains a 1.24% relative improvement on the ImageNet dataset while decreasing the FLOPs by about 10% and the number of parameters by about 9.2% compared to ResNet-152.

[360]  arXiv:2105.04357 [pdf, ps, other]
Title: Agreement in the presence of disagreeing rational players: The Huntsman Protocol
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)

In this paper, a novel Byzantine consensus protocol among $n$ players is proposed for the partially synchronous model. In particular, by assuming that standard cryptography is unbreakable, and that $n>\max\bigl(\frac{3}{2}k+3t,2(k+t)\bigr)$, this protocol is an equilibrium where no coalition of $k$ rational players can coordinate to increase their expected utility regardless of the arbitrary behavior of up to $t$ Byzantine players. We show that a baiting strategy is necessary and sufficient to solve this so-called rational agreement problem. First, we show that it is impossible to solve this rational agreement problem without implementing a baiting strategy, a strategy that rewards rational players for betraying their coalition by exposing undeniable proofs of fraud. Second, we propose the Huntsman protocol that solves the rational agreement problem by building on recent advances in the context of accountable Byzantine agreement in partial synchrony. This protocol finds applications in distributed ledgers where players are incentivized to steal assets by leading other players to a disagreement on two distinct decisions where they "double spend".
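
As a small worked example of the resilience condition $n>\max\bigl(\frac{3}{2}k+3t,2(k+t)\bigr)$ stated above, the following Python sketch computes the smallest number of players that satisfies it. The function names `satisfies_bound` and `min_players` are hypothetical helpers introduced here for illustration; they only evaluate the inequality and are not part of the protocol.

```python
from fractions import Fraction


def satisfies_bound(n: int, k: int, t: int) -> bool:
    """Check n > max(3k/2 + 3t, 2(k + t)) using exact arithmetic."""
    return n > max(Fraction(3 * k, 2) + 3 * t, Fraction(2 * (k + t)))


def min_players(k: int, t: int) -> int:
    """Smallest integer n that strictly exceeds the bound (hypothetical helper)."""
    bound = max(Fraction(3 * k, 2) + 3 * t, Fraction(2 * (k + t)))
    # floor(bound) + 1 is the smallest integer strictly greater than bound
    return bound.numerator // bound.denominator + 1


if __name__ == "__main__":
    for k, t in [(0, 1), (1, 1), (2, 1), (4, 0)]:
        print(f"k={k}, t={t}: n >= {min_players(k, t)}")
```

With no rational players ($k=0$) the condition reduces to $n>3t$, so one Byzantine player requires at least four players in total.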

[361]  arXiv:2105.04358 [pdf, other]
Title: Dynamical low-rank approximation for Burgers' equation with uncertainty
Subjects: Numerical Analysis (math.NA)

Quantifying uncertainties in hyperbolic equations is a source of several challenges. First, the solution forms shocks leading to oscillatory behaviour in the numerical approximation of the solution. Second, the number of unknowns required for an effective discretization of the solution grows exponentially with the dimension of the uncertainties, yielding high computational costs and large memory requirements. An efficient representation of the solution via adequate basis functions makes it possible to tackle these difficulties. The generalized polynomial chaos (gPC) polynomials allow such an efficient representation when the distribution of the uncertainties is known. These distributions are usually only available for input uncertainties such as initial conditions; therefore, the efficiency of this ansatz can be lost during runtime. In this paper, we make use of the dynamical low-rank approximation (DLRA) to obtain a memory-wise efficient solution approximation on a lower dimensional manifold. We investigate the use of the matrix projector-splitting integrator and the unconventional integrator for dynamical low-rank approximation, deriving separate time evolution equations for the spatial and uncertain basis functions, respectively. This guarantees an efficient approximation of the solution even if the underlying probability distributions change over time. Furthermore, filters to mitigate the appearance of spurious oscillations are implemented, and a strategy to enforce boundary conditions is introduced. The proposed methodology is analyzed for Burgers' equation equipped with uncertain initial values represented by a two-dimensional random vector. The numerical results show a reduction of the memory requirements, and that the important characteristics of the original system are well captured.

[362]  arXiv:2105.04371 [pdf, other]
Title: Poolingformer: Long Document Modeling with Pooling Attention
Comments: Accepted by ICML 2021
Subjects: Computation and Language (cs.CL)

In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.

[363]  arXiv:2105.04373 [pdf, other]
Title: Combinatorial Multi-armed Bandits for Resource Allocation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the sequential resource allocation problem where a decision maker repeatedly allocates budgets between resources. Motivating examples include allocating limited computing time or wireless spectrum bands to multiple users (i.e., resources). At each timestep, the decision maker should distribute its available budgets among different resources to maximize the expected reward, or equivalently to minimize the cumulative regret. In doing so, the decision maker should learn the value of the resources allocated for each user from feedback on each user's received reward. For example, users may send messages of different urgency over wireless spectrum bands; the reward generated by allocating spectrum to a user then depends on the message's urgency. We assume each user's reward follows a random process that is initially unknown. We design combinatorial multi-armed bandit algorithms to solve this problem with discrete or continuous budgets. We prove the proposed algorithms achieve logarithmic regrets under semi-bandit feedback.

[364]  arXiv:2105.04376 [pdf, other]
Title: Recommendations for Item Set Completion: On the Semantics of Item Co-Occurrence With Data Sparsity, Input Size, and Input Modalities
Comments: arXiv admin note: text overlap with arXiv:1907.12366
Subjects: Information Retrieval (cs.IR)

We address the problem of recommending relevant items to a user in order to "complete" a partial set of items already known. We consider the two scenarios of citation and subject label recommendation, which resemble different semantics of item co-occurrence: relatedness for co-citations and diversity for subject labels. We assess the influence of the completeness of an already known partial item set on the recommender performance. We also investigate data sparsity through a pruning parameter and the influence of using additional metadata. As recommender models, we focus on different autoencoders, which are particularly suited for reconstructing missing items in a set. We extend autoencoders to exploit a multi-modal input of text and structured data. Our experiments on six real-world datasets show that supplying the partial item set as input is helpful when item co-occurrence resembles relatedness, while metadata are effective when co-occurrence implies diversity. This outcome means that the semantics of item co-occurrence is an important factor. The simple item co-occurrence model is a strong baseline for citation recommendation. However, autoencoders have the advantage that they can exploit additional metadata besides the partial item set as input while achieving comparable performance. For the subject label recommendation task, the title is the most important attribute. Adding more input modalities sometimes even harms the result. In conclusion, it is crucial to consider the semantics of the item co-occurrence for the choice of an appropriate recommendation model and to decide carefully which metadata to exploit.

[365]  arXiv:2105.04378 [pdf, ps, other]
Title: The Typical Non-Linear Code over Large Alphabets
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

We consider the problem of describing the typical (possibly) non-linear code of minimum distance bounded from below over a large alphabet. We concentrate on block codes with the Hamming metric and on subspace codes with the injection metric. In sharp contrast with the behavior of linear block codes, we show that the typical non-linear code in the Hamming metric of cardinality $q^{n-d+1}$ is far from having minimum distance $d$, i.e., from being MDS. We also give more precise results about the asymptotic proportion of block codes with good distance properties within the set of codes having a certain cardinality. We then establish the analogous results for subspace codes with the injection metric, showing also an application to the theory of partial spreads in finite geometry.
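
To make the notion of an MDS code concrete, here is a minimal Python sketch on a toy alphabet. It compares two codes of the same cardinality $q^{n-d+1}$ with $q=3$, $n=3$, $d=2$: one reaches minimum Hamming distance $d$, the other does not. The two example codes and the helper names are assumptions chosen for illustration and do not come from the paper.

```python
from itertools import combinations, product


def hamming(u, v):
    """Number of coordinates in which two words differ."""
    return sum(a != b for a, b in zip(u, v))


def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming(u, v) for u, v in combinations(code, 2))


q, n, d = 3, 3, 2
target_size = q ** (n - d + 1)  # cardinality of an MDS code with distance d

# Toy example 1: append a parity symbol (a + b mod q) to every pair (a, b).
parity_code = [(a, b, (a + b) % q) for a, b in product(range(q), repeat=2)]

# Toy example 2: same cardinality, but the last symbol carries no information.
padded_code = [(a, b, 0) for a, b in product(range(q), repeat=2)]

if __name__ == "__main__":
    for name, code in [("parity", parity_code), ("padded", padded_code)]:
        print(name, len(code) == target_size, min_distance(code))
```

Both codes have nine codewords, but only the parity code has minimum distance 2; the padded code has distance 1 and is therefore not MDS at that size.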

[366]  arXiv:2105.04380 [pdf, other]
Title: Forsage: Anatomy of a Smart-Contract Pyramid Scheme
Comments: 17 pages, 13 figures
Subjects: Cryptography and Security (cs.CR)

Pyramid schemes are investment scams in which top-level participants in a hierarchical network recruit and profit from an expanding base of defrauded newer participants. Pyramid schemes have existed for over a century, but there have been no in-depth studies of their dynamics and communities because of the opacity of participants' transactions.
In this paper, we present an empirical study of Forsage, a pyramid scheme implemented as a smart contract and at its peak one of the largest consumers of resources in Ethereum. As a smart contract, Forsage makes its (byte)code and all of its transactions visible on the blockchain. We take advantage of this unprecedented transparency to gain insight into the mechanics, impact on participants, and evolution of Forsage.
We quantify the (multi-million-dollar) gains of top-level participants as well as the losses of the vast majority (around 88%) of users. We analyze Forsage code both manually and using a purpose-built transaction simulator to uncover the complex mechanics of the scheme. Through complementary study of promotional videos and social media, we show how Forsage promoters have leveraged the unique features of smart contracts to lure users with false claims of trustworthiness and profitability, and how Forsage activity is concentrated within a small number of national communities.

[367]  arXiv:2105.04381 [pdf, other]
Title: Did I delete my cookies? Cookies respawning with browser fingerprinting
Subjects: Cryptography and Security (cs.CR)

Stateful and stateless web tracking have gathered much attention in the last decade; however, they were always measured separately. To the best of our knowledge, our study is the first to detect and measure cookie respawning with browser and machine fingerprinting. We develop a detection methodology that allows us to detect cookies' dependency on browser and machine features. Our results show that 1,150 out of the top 30,000 Alexa websites deploy this tracking mechanism. We further uncover how domains collaborate to respawn cookies through fingerprinting. We find that this technique can be used to track users across websites even when third-party cookies are deprecated. Together with a legal scholar, we conclude that cookie respawning with browser fingerprinting lacks legal interpretation under the GDPR and the ePrivacy directive, but its use in practice may breach them, thus subjecting it to fines up to 20 million euro.

[368]  arXiv:2105.04382 [pdf]
Title: Numerical studies of CO$_2$ leakage remediation by MICP-based plugging technology
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Microbially induced calcite precipitation (MICP) is a technology for sealing leakage paths to ensure the safe storage of CO$_2$ in geological formations. In this work we introduce a numerical simulator of MICP for field-scale studies. This simulator is implemented in the open porous media (OPM) framework. We compare the numerical results to simulations using an upgraded implementation of the mathematical model in the MATLAB reservoir simulation toolbox (MRST). Finally, we consider a 3D system consisting of two aquifers separated by caprock with a leakage path across the width of the reservoir. We study a strategy where microbial solution is injected only at the beginning of the treatment and subsequently either growth solution or cementation solution is injected for biofilm development or calcite precipitation. By applying this strategy, the numerical results show that the MICP technology could be used to seal these leakage paths.

[369]  arXiv:2105.04383 [pdf, other]
Title: A framework for the automation of testing computer vision systems
Comments: 4 pages, Submission version, Accepted at the 2nd ACM/IEEE International Conference on Automation of Software Test AST 2021
Subjects: Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV)

Vision systems, i.e., systems that detect and track objects in images, have gained substantial importance over the past decades. They are used in quality assurance applications, e.g., for finding surface defects in products during manufacturing, in surveillance, and also in automated driving, all of which require reliable behavior. Interestingly, there is only little work on quality assurance and especially testing of vision systems in general. In this paper, we contribute to the area of testing vision software, and present a framework for the automated generation of tests for systems based on vision and image recognition. The framework makes use of existing libraries that modify original images and compute similarities between the original and modified images. We show how such a framework can be used for testing a particular industrial application that identifies defects on riblet surfaces, and present preliminary results from the image classification domain.

[370]  arXiv:2105.04385 [pdf, other]
Title: Identifying Overly Restrictive Matching Patterns in SMT-based Program Verifiers
Subjects: Programming Languages (cs.PL)

Universal quantifiers occur frequently in proof obligations produced by program verifiers, for instance, to axiomatize uninterpreted functions and to express properties of arrays. SMT-based verifiers typically reason about them via E-matching, an SMT algorithm that requires syntactic matching patterns to guide the quantifier instantiations. Devising good matching patterns is challenging. In particular, overly restrictive patterns may lead to spurious verification errors if the quantifiers needed for a proof are not instantiated; they may also conceal unsoundness caused by inconsistent axiomatizations. In this paper, we present the first technique that identifies and helps the users remedy the effects of overly restrictive matching patterns. We designed a novel algorithm to synthesize missing triggering terms required to complete a proof. Tool developers can use this information to refine their matching patterns and prevent similar verification errors, or to fix a detected unsoundness.

[371]  arXiv:2105.04387 [pdf, other]
Title: Recent Advances in Deep Learning-based Dialogue Systems
Comments: 75 pages, 19 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Dialogue systems are a popular Natural Language Processing (NLP) task, as they are promising in real-life applications. It is also a complicated task since many NLP tasks deserving study are involved. As a result, a multitude of novel works on this task have been carried out, and most of them are deep learning-based due to their outstanding performance. In this survey, we mainly focus on the deep learning-based dialogue systems. We comprehensively review state-of-the-art research outcomes in dialogue systems and analyze them from two angles: model type and system type. Specifically, from the angle of model type, we discuss the principles, characteristics, and applications of different models that are widely used in dialogue systems. This will help researchers become acquainted with these models and see how they are applied in state-of-the-art frameworks, which is rather helpful when designing a new dialogue system. From the angle of system type, we discuss task-oriented and open-domain dialogue systems as two streams of research, providing insight into the related hot topics. Furthermore, we comprehensively review the evaluation methods and datasets for dialogue systems to pave the way for future research. Finally, some possible research trends are identified based on the recent research outcomes. To the best of our knowledge, this survey is the most comprehensive and up-to-date one at present in the area of dialogue systems and dialogue-related tasks, extensively covering the popular frameworks, topics, and datasets.

[372]  arXiv:2105.04396 [pdf, other]
Title: Stability Constrained Mobile Manipulation Planning on Rough Terrain
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

This paper presents a framework that allows online dynamic-stability-constrained optimal trajectory planning of a mobile manipulator robot working on rough terrain. First, the kinematics model of a mobile manipulator robot, and the Zero Moment Point (ZMP) stability measure are presented as theoretical background. Then, a sampling-based quasi-static planning algorithm modified for stability guarantee and traction optimization in continuous dynamic motion is presented along with a mathematical proof. The robot's quasi-static path is then used as an initial guess to warm-start a nonlinear optimal control solver which may otherwise have difficulties finding a solution to the stability-constrained formulation efficiently. The performance and computational efficiency of the framework are demonstrated through an application to a simulated timber harvesting mobile manipulator machine working on varying terrain. The results demonstrate feasibility of online trajectory planning on varying terrain while satisfying the dynamic stability constraint.

[373]  arXiv:2105.04397 [pdf, other]
Title: Why Aren't Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions
Comments: ESEC/FSE 2019
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)

This paper explores the extent to which regular expressions (regexes) are portable across programming languages. Many languages offer similar regex syntaxes, and it would be natural to assume that regexes can be ported across language boundaries. But can regexes be copy/pasted across language boundaries while retaining their semantic and performance characteristics?
In our survey of 158 professional software developers, most indicated that they re-use regexes across language boundaries and about half reported that they believe regexes are a universal language. We experimentally evaluated the riskiness of this practice using a novel regex corpus -- 537,806 regexes from 193,524 projects written in JavaScript, Java, PHP, Python, Ruby, Go, Perl, and Rust. Using our polyglot regex corpus, we explored the hitherto-unstudied regex portability problems: logic errors due to semantic differences, and security vulnerabilities due to performance differences.
We report that developers' belief in a regex lingua franca is understandable but unfounded. Though most regexes compile across language boundaries, 15% exhibit semantic differences across languages and 10% exhibit performance differences across languages. We explained these differences using regex documentation, and further illuminate our findings by investigating regex engine implementations. Along the way we found bugs in the regex engines of JavaScript-V8, Python, Ruby, and Rust, and potential semantic and performance regex bugs in thousands of modules.
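
As a concrete illustration of the kind of semantic difference discussed above, the Python sketch below shows how the `$` anchor treats a trailing newline. This is a commonly cited example chosen here for illustration; it is not claimed to be one of the specific differences the paper reports. In Python, `$` also matches just before a newline at the very end of the string, whereas in JavaScript (without the `m` flag) the same pattern does not match such an input.

```python
import re

# A validation-style pattern that looks identical in many regex dialects.
pattern = r"^\d+$"

# Python: `$` matches at the end of the string OR just before a final newline,
# so an input with a trailing newline is accepted.
accepts_trailing_newline = re.search(pattern, "123\n") is not None

# `\Z` anchors only at the very end of the string in Python.
strict_pattern = r"^\d+\Z"
strict_accepts = re.search(strict_pattern, "123\n") is not None

# re.fullmatch requires the entire string to match and avoids the ambiguity.
fullmatch_accepts = re.fullmatch(r"\d+", "123\n") is not None

if __name__ == "__main__":
    print(accepts_trailing_newline, strict_accepts, fullmatch_accepts)
```

A pattern copied between languages can therefore accept different inputs even though it compiles in both, which is why input validation code is a natural place to check anchoring behaviour explicitly.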

[374]  arXiv:2105.04402 [pdf, other]
Title: AWCD: An Efficient Point Cloud Processing Approach via Wasserstein Curvature
Comments: 13 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce adaptive Wasserstein curvature denoising (AWCD), an original processing approach for point cloud data. By collecting curvature information from the Wasserstein distance, AWCD considers more precise structures of data and preserves stability and effectiveness even for data with high-density noise. This paper contains some theoretical analysis of the Wasserstein curvature and the complete algorithm of AWCD. In addition, we design digital experiments to show the denoising effect of AWCD. Based on comparison results, we present the advantages of AWCD over traditional algorithms.

[375]  arXiv:2105.04405 [pdf, other]
Title: A Critical Review of Information Bottleneck Theory and its Applications to Deep Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the past decade, deep neural networks have seen unparalleled improvements that continue to impact every aspect of today's society. With the development of high-performance GPUs and the availability of vast amounts of data, the learning capabilities of ML systems have skyrocketed, going from classifying digits in a picture to beating world champions in games with super-human performance. However, even as ML models continue to reach new frontiers, their practical success has been hindered by the lack of a deep theoretical understanding of their inner workings. Fortunately, a known information-theoretic method called the information bottleneck (IB) theory has emerged as a promising approach to better understand the learning dynamics of neural networks. In principle, IB theory models learning as a trade-off between the compression of the data and the retention of information. The goal of this survey is to provide a comprehensive review of IB theory covering its information-theoretic roots and the recently proposed applications to understand deep learning models.
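
For orientation, the trade-off is usually written as a Lagrangian over stochastic encoders. The formula below is the standard textbook statement of the IB objective, given here as background; the symbols $X$ (input), $Y$ (target), $T$ (compressed representation), and $\beta$ (trade-off weight) do not appear in the abstract.

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}}
  \;=\; I(X;T) \;-\; \beta \, I(T;Y), \qquad \beta > 0
```

Minimising $I(X;T)$ corresponds to compressing the data, while the $-\beta\, I(T;Y)$ term rewards retaining information about the target.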

[376]  arXiv:2105.04408 [pdf]
Title: The Challenges and Opportunities of Human-Centered AI for Trustworthy Robots and Autonomous Systems
Comments: 15 pages, 4 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

The trustworthiness of Robots and Autonomous Systems (RAS) has gained a prominent position on many research agendas towards fully autonomous systems. This research systematically explores, for the first time, the key facets of human-centered AI (HAI) for trustworthy RAS. In this article, five key properties of a trustworthy RAS are first identified. RAS must be (i) safe in any uncertain and dynamic surrounding environments; (ii) secure, thus protecting itself from any cyber-threats; (iii) healthy with fault tolerance; (iv) trusted and easy to use to allow effective human-machine interaction (HMI), and (v) compliant with the law and ethical expectations. Then, the challenges in implementing trustworthy autonomous systems are analytically reviewed with respect to the five key properties, and the roles of AI technologies in ensuring the trustworthiness of RAS are explored with respect to safety, security, health and HMI, while reflecting the requirements of ethics in the design of RAS. While applications of RAS have mainly focused on performance and productivity, the risks posed by advanced AI in RAS have not received sufficient scientific attention. Hence, a new acceptance model of RAS is provided as a framework for the requirements of human-centered AI and for implementing trustworthy RAS by design. This approach promotes human-level intelligence to augment human capacity while focusing on contributions to humanity.

[377]  arXiv:2105.04414 [pdf]
Title: Predicting Intensive Care Unit Length of Stay and Mortality Using Patient Vital Signs: Machine Learning Model Development and Validation
Comments: 23 Pages, 11 Figures, 13 Tables
Journal-ref: JMIR Med Inform 2021;9(5):e21347
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Patient monitoring is vital in all stages of care. We here report the development and validation of ICU length of stay and mortality prediction models. The models will be used in an intelligent ICU patient monitoring module of an Intelligent Remote Patient Monitoring (IRPM) framework that monitors the health status of patients, and generates timely alerts, maneuver guidance, or reports when adverse medical conditions are predicted. We utilized the publicly available Medical Information Mart for Intensive Care (MIMIC) database to extract ICU stay data for adult patients to build two prediction models: one for mortality prediction and another for ICU length of stay. For the mortality model, we applied six commonly used machine learning (ML) binary classification algorithms for predicting the discharge status (survived or not). For the length of stay model, we applied the same six ML algorithms for binary classification using the median patient population ICU stay of 2.64 days. For the regression-based classification, we used two ML algorithms for predicting the number of days. We built two variations of each prediction model: one using 12 baseline demographic and vital sign features, and the other based on our proposed quantiles approach, in which we use 21 extra features engineered from the baseline vital sign features, including their modified means, standard deviations, and quantile percentages. We could perform predictive modeling with minimal features while maintaining reasonable performance using the quantiles approach. The best accuracy achieved in the mortality model was approximately 89% using the random forest algorithm. The highest accuracy achieved in the length of stay model, based on the population median ICU stay (2.64 days), was approximately 65% using the random forest algorithm.

[378]  arXiv:2105.04419 [pdf, other]
Title: VDB-EDT: An Efficient Euclidean Distance Transform Algorithm Based on VDB Data Structure
Subjects: Robotics (cs.RO)

This paper presents a fundamental algorithm, called VDB-EDT, for Euclidean distance transform (EDT) based on the VDB data structure. The algorithm executes on grid maps and generates the corresponding distance field for recording distance information against obstacles, which forms the basis of numerous motion planning algorithms. The contributions of this work are threefold. First, we propose a novel algorithm that can facilitate distance transform procedures by optimizing the scheduling priorities of transform functions, which significantly improves the running speed of conventional EDT algorithms. Second, we introduce, for the first time, the memory-efficient VDB data structure, a customized B+ tree, to represent the distance field hierarchically. Benefiting from its special index and caching mechanism, VDB offers fast (average $O(1)$) random access, and thus is very suitable for the frequent neighbor-searching operations in EDT. Third, given the small scale of existing datasets, we release a large-scale dataset captured from subterranean environments to benchmark EDT algorithms. Extensive experiments on the released dataset and publicly available datasets show that VDB-EDT can reduce memory consumption by about 30%-85%, depending on the sparsity of the environment, while maintaining a running speed competitive with the fastest array-based implementation. The experiments also show that VDB-EDT can significantly outperform the state-of-the-art EDT algorithm in both runtime and memory efficiency, which strongly demonstrates the advantages of our proposed method. The released dataset and source code are available at https://github.com/zhudelong/VDB-EDT.
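
To make the notion of a distance field concrete, here is a minimal brute-force sketch of a Euclidean distance transform on a small 2D occupancy grid. It is not the VDB-EDT algorithm and uses no VDB structure; the function name `brute_force_edt` and the list-of-rows grid layout are assumptions made for illustration.

```python
import math

def brute_force_edt(grid):
    """Return the Euclidean distance from every cell to its nearest obstacle.

    `grid` is a list of rows; a truthy cell marks an obstacle.
    Every cell is math.inf when the grid contains no obstacle at all.
    """
    obstacles = [(r, c) for r, row in enumerate(grid)
                 for c, cell in enumerate(row) if cell]
    field = []
    for r, row in enumerate(grid):
        out_row = []
        for c in range(len(row)):
            # Exhaustive search over all obstacles: O(cells * obstacles).
            nearest = min((math.hypot(r - orow, c - ocol)
                           for orow, ocol in obstacles), default=math.inf)
            out_row.append(nearest)
        field.append(out_row)
    return field

grid = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
field = brute_force_edt(grid)
for row in field:
    print(["%.3f" % d for d in row])
```

The exhaustive search is only practical for tiny grids, which is why practical EDT algorithms rely on incremental propagation and compact storage.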

[379]  arXiv:2105.04421 [pdf, other]
Title: Trials and Tribulations of Developing Hybrid Quantum-Classical Microservices Systems
Authors: Javier Rojo (1), David Valencia (1), Javier Berrocal (1), Enrique Moguel (1), Jose Garcia-Alonso (1), Juan Manuel Murillo Rodriguez (1) ((1) University of Extremadura)
Comments: 11 pages, 7 figures, 2 tables
Subjects: Software Engineering (cs.SE)

Quantum computing holds great promise to solve problems that classical computers cannot reach, to the point where it has already aroused the interest of both the scientific and industrial communities. Thus, it is expected that hybrid systems will start to appear where quantum software interacts with classical systems. Such coexistence can be fostered by service computing. Unfortunately, the way in which quantum code can be offered as a service still misses out on many of the potential benefits of service computing. This paper takes the traveling salesman problem and tackles the challenge of implementing it in the form of a quantum microservice. This implementation is then used to detect which of the benefits of service computing are lost in the process. The conclusions help to measure the distance between the current state of technology and the state that would be desirable in order to have a real quantum service engineering.

[380]  arXiv:2105.04425 [pdf, other]
Title: MTNet: A Multi-Task Neural Network for On-Field Calibration of Low-Cost Air Monitoring Sensors
Comments: 9 pages, 6 figures
Subjects: Machine Learning (cs.LG)

Advances in sensor technology enable people to monitor air quality through widely distributed low-cost sensors. However, measurements from these sensors usually exhibit high biases and require a calibration step to reach acceptable performance in downstream analytical tasks. Most existing calibration methods calibrate one type of sensor at a time, which we call single-task calibration. Despite the popularity of this single-task schema, it may neglect interactions among calibration tasks of different sensors, which carry underlying information that can improve calibration performance. In this paper, we propose a multi-task calibration network (MTNet) to calibrate multiple sensors (e.g., carbon monoxide and nitrogen oxide sensors) simultaneously, modeling the interactions among tasks. MTNet consists of a single shared module and several task-specific modules. Specifically, in the shared module, we extend the multi-gate mixture-of-experts structure to harmonize the task conflicts and correlations among different tasks; in each task-specific module, we introduce a feature selection strategy to customize the input for the specific task. These improvements allow MTNet to learn interaction information shared across different tasks, as well as task-specific information for each calibration task. We evaluate MTNet on three real-world datasets and compare it with several established baselines. The experimental results demonstrate that MTNet achieves state-of-the-art performance.

[381]  arXiv:2105.04430 [pdf]
Title: An Enhanced Randomly Initialized Convolutional Neural Network for Columnar Cactus Recognition in Unmanned Aerial Vehicle Imagery
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recently, Convolutional Neural Networks (CNNs) have achieved strong performance in remote sensing image classification. Plant recognition using CNNs is one of the active deep learning research topics due to its added value in different related fields, especially environmental conservation and natural areas preservation. Automatic recognition of plants in protected areas helps in the surveillance of these zones and ensures the sustainability of their ecosystems. In this work, we propose an Enhanced Randomly Initialized Convolutional Neural Network (ERI-CNN) for the recognition of columnar cactus, which is an endemic plant that exists in the Tehuacán-Cuicatlán Valley in southeastern Mexico. We used a public dataset created by a group of researchers that consists of more than 20000 remote sensing images. The experimental results confirm the effectiveness of the proposed model compared to other models reported in the literature like InceptionV3 and the modified LeNet-5 CNN. Our ERI-CNN achieves 98% accuracy, 97% precision, 97% recall, a 97.5% F1-score, and a loss of 0.056.

[382]  arXiv:2105.04431 [pdf, other]
Title: Boosting Semi-Supervised Face Recognition with Noise Robustness
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although deep face recognition benefits significantly from large-scale training data, a current bottleneck is the labelling cost. A feasible solution to this problem is semi-supervised learning, exploiting a small portion of labelled data and large amounts of unlabelled data. The major challenge, however, is the accumulation of label errors through auto-labelling, which compromises the training. This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise caused by auto-labelling. Specifically, we introduce a multi-agent method, named GroupNet (GN), to endow our solution with the ability to identify the wrongly labelled samples and preserve the clean samples. We show that GN alone achieves the leading accuracy in traditional supervised face recognition even when the noisy labels account for over 50% of the training data. Further, we develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN. It starts with a small amount of labelled data and subsequently conducts high-confidence labelling on a large amount of unlabelled data to boost further training. The more data NRoLL labels, the higher the confidence in the labels in the dataset. To evaluate the competitiveness of our method, we run NRoLL under a harsh condition in which only one-fifth of the labelled MSCeleb is available and the rest is used as unlabelled data. On a wide range of benchmarks, our method compares favorably against the state-of-the-art methods.

[383]  arXiv:2105.04432 [pdf, other]
Title: Explicit Rate-Optimal Streaming Codes with Smaller Field Size
Subjects: Information Theory (cs.IT)

Streaming codes are a class of packet-level erasure codes that ensure packet recovery over a sliding window channel which allows either a burst erasure of size $b$ or $a$ random erasures within any window of size $(\tau+1)$ time units, under a strict decoding-delay constraint $\tau$. The field size over which streaming codes are constructed is an important factor determining the complexity of implementation. The best known explicit rate-optimal streaming code requires a field size of $q^2$ where $q \ge \tau+b-a$ is a prime power. In this work, we present an explicit rate-optimal streaming code, for all possible $\{a,b,\tau\}$ parameters, over a field of size $q^2$ for prime power $q \ge \tau$. This is the smallest-known field size of a general explicit rate-optimal construction that covers all $\{a,b,\tau\}$ parameter sets. We achieve this by modifying the non-explicit code construction due to Krishnan et al. to make it explicit, without change in field size.
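
The channel model described above can be made concrete with a short check of whether an erasure pattern is admissible. This is a sketch under one reading of the abstract: every window of $\tau+1$ consecutive slots must contain either at most $a$ erasures or erasures confined to at most $b$ consecutive slots. The function names and the 0/1 list encoding are assumptions for illustration, and the sketch says nothing about the code construction itself.

```python
def window_ok(window, a, b):
    """One window is fine if it has at most `a` erasures, or if all of its
    erasures fit inside a burst of at most `b` consecutive slots."""
    erased = [i for i, e in enumerate(window) if e]
    if len(erased) <= a:
        return True
    return erased[-1] - erased[0] + 1 <= b

def admissible(pattern, a, b, tau):
    """Check every sliding window of tau + 1 slots in a 0/1 erasure pattern."""
    width = tau + 1
    if len(pattern) <= width:
        return window_ok(pattern, a, b)
    return all(window_ok(pattern[s:s + width], a, b)
               for s in range(len(pattern) - width + 1))

# Example parameters (hypothetical): a = 2, b = 3, tau = 4.
burst = [1, 1, 1, 0, 0, 0, 0, 0]      # one burst of length 3
scattered = [1, 0, 1, 0, 1, 0, 0, 0]  # 3 erasures spread over 5 slots
print(admissible(burst, 2, 3, 4), admissible(scattered, 2, 3, 4))
```

The burst pattern passes because its three erasures are contiguous, while the scattered pattern fails because three erasures in one window exceed $a$ and do not fit in a burst of length $b$.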

[384]  arXiv:2105.04443 [pdf, other]
Title: Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Comments: Accepted by NAACL2021, 9 pages, 5 figures
Subjects: Computation and Language (cs.CL)

Grammatical Error Correction (GEC) aims to correct writing errors and help language learners improve their writing skills. However, existing GEC models tend to produce spurious corrections or fail to detect many errors. A quality estimation model is necessary to ensure learners get accurate GEC results and avoid being misled by poorly corrected sentences. Well-trained GEC models can generate several high-quality hypotheses through decoding, such as beam search, which provide valuable GEC evidence and can be used to evaluate GEC quality. However, existing models neglect the possible GEC evidence from different hypotheses. This paper presents the Neural Verification Network (VERNet) for GEC quality estimation with multiple hypotheses. VERNet establishes interactions among hypotheses with a reasoning graph and conducts two kinds of attention mechanisms to propagate GEC evidence to verify the quality of generated hypotheses. Our experiments on four GEC datasets show that VERNet achieves state-of-the-art grammatical error detection performance, achieves the best quality estimation results, and significantly improves GEC performance by reranking hypotheses. All data and source code are available at https://github.com/thunlp/VERNet.

[385]  arXiv:2105.04444 [pdf, other]
Title: Continual Learning via Bit-Level Information Preserving
Comments: CVPR2021
Subjects: Machine Learning (cs.LG)

Continual learning tackles the setting of learning different tasks sequentially. Despite the many previous solutions, most of them still suffer from significant forgetting or expensive memory cost. In this work, targeting these problems, we first study the continual learning process through the lens of information theory and observe that forgetting of a model stems from the loss of information gain on its parameters from the previous tasks when learning a new task. From this viewpoint, we then propose a novel continual learning approach called Bit-Level Information Preserving (BLIP) that preserves the information gain on model parameters through updating the parameters at the bit level, which can be conveniently implemented with parameter quantization. More specifically, BLIP first trains a neural network with weight quantization on the new incoming task and then estimates information gain on each parameter provided by the task data to determine the bits to be frozen to prevent forgetting. We conduct extensive experiments ranging from classification tasks to reinforcement learning tasks, and the results show that our method produces results better than or on par with the previous state of the art. Indeed, BLIP achieves close to zero forgetting while only requiring constant memory overhead throughout continual learning.
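
The idea of freezing bits of a quantized parameter can be sketched as follows. This is an illustrative assumption rather than BLIP's actual procedure: the sketch treats a weight as an unsigned integer of `n_bits` bits, assumes the frozen bits are the most significant ones, and clamps a proposed new value so that those bits stay unchanged. The function name and the clamping rule are hypothetical.

```python
def freeze_and_update(q, proposed, n_bits, frozen_bits):
    """Move the n_bits-wide unsigned weight `q` towards `proposed` while
    keeping its `frozen_bits` most significant bits unchanged."""
    if not 0 <= frozen_bits <= n_bits:
        raise ValueError("frozen_bits must lie in [0, n_bits]")
    free_bits = n_bits - frozen_bits
    low_mask = (1 << free_bits) - 1          # bits that may still change
    full_mask = (1 << n_bits) - 1
    base = q & ~low_mask & full_mask         # frozen high bits, low bits cleared
    lowest, highest = base, base | low_mask  # values sharing the frozen prefix
    return min(max(proposed, lowest), highest)

q = 0b10110110  # 182, an 8-bit quantized weight
print(freeze_and_update(q, 200, 8, 4))  # clamped to the top of the allowed range
print(freeze_and_update(q, 180, 8, 4))  # already inside the allowed range
```

With four of eight bits frozen, later updates can move the weight only within the 16 values that share its upper four bits, so the range of possible change shrinks as more bits are frozen.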

[386]  arXiv:2105.04447 [pdf, other]
Title: SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We propose a novel scene flow estimation approach to capture and infer 3D motions from point clouds. Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform. Such unstructured data poses difficulties in matching corresponding points between point clouds, leading to inaccurate flow estimation. We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that equips the sparse convolution with the transformer. Specifically, by leveraging the sparse convolution, SCTN transforms irregular point clouds into locally consistent flow features for estimating continuous and consistent motions within an object/local object part. We further propose to explicitly learn point relations using a point transformer module, unlike existing methods. We show that the learned relation-based contextual information is rich and helpful for matching corresponding points, benefiting scene flow estimation. In addition, a novel loss function is proposed to adaptively encourage flow consistency according to feature similarity. Extensive experiments demonstrate that our proposed approach achieves a new state of the art in scene flow estimation. Our approach achieves errors of 0.038 and 0.037 (EPE3D) on FlyingThings3D and KITTI Scene Flow respectively, outperforming previous methods by large margins.
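
For readers unfamiliar with the metric, EPE3D (3D end-point error) is commonly computed as the mean Euclidean distance between predicted and ground-truth flow vectors. The sketch below follows that common definition; the function name and the tuple-based data layout are assumptions, and the abstract does not state the exact evaluation protocol.

```python
import math

def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flows."""
    if len(pred_flow) != len(gt_flow) or not pred_flow:
        raise ValueError("flows must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_flow, gt_flow))
    return total / len(pred_flow)

# Two points: the first flow is predicted exactly, the second is off by 2 units.
pred = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
gt = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = epe3d(pred, gt)
print(error)
```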

[387]  arXiv:2105.04449 [pdf, other]
Title: G-Tran: Making Distributed Graph Transactions Fast
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

Graph transaction processing raises many unique challenges, such as random data access due to the irregularity of graph structures, and low throughput and high abort rates due to the relatively large read/write sets in graph transactions. To address these challenges, we present G-Tran -- an RDMA-enabled distributed in-memory graph database with serializable and snapshot isolation support. First, we propose a graph-native data store to achieve good data locality and fast data access for transactional updates and queries. Second, G-Tran adopts a fully decentralized architecture that leverages RDMA to process distributed transactions with the MPP model, which can achieve high performance by utilizing all computing resources. In addition, we propose a new MV-OCC implementation with two optimizations to address the issue of large read/write sets in graph transactions. Extensive experiments show that G-Tran achieves competitive performance compared with other popular graph databases on benchmark workloads.

[388]  arXiv:2105.04452 [pdf]
Title: Who Gets What, According to Whom? An Analysis of Fairness Perceptions in Service Allocation
Comments: Accepted at AIES'21
Subjects: Computers and Society (cs.CY)

Algorithmic fairness research has traditionally been linked to the disciplines of philosophy, ethics, and economics, where notions of fairness are prescriptive and seek objectivity. Increasingly, however, scholars are turning to the study of what different people perceive to be fair, and how these perceptions can or should help to shape the design of machine learning, particularly in the policy realm. The present work experimentally explores five novel research questions at the intersection of the "Who," "What," and "How" of fairness perceptions. Specifically, we present the results of a multi-factor conjoint analysis study that quantifies the effects of the specific context in which a question is asked, the framing of the given question, and who is answering it. Our results broadly suggest that the "Who" and "What," at least, matter in ways that 1) are not easily explained by any one theoretical perspective, and 2) have critical implications for how perceptions of fairness should be measured and/or integrated into algorithmic decision-making systems.

[389]  arXiv:2105.04453 [pdf, other]
Title: Neural Computation of Capacity Region of Memoryless Multiple Access Channels
Comments: 6 pages, 4 figures, accepted at ISIT2021
Subjects: Information Theory (cs.IT)

This paper provides a numerical framework for computing, from data, the achievable rate region of a memoryless multiple access channel (MAC) with a continuous alphabet. In particular, we use recent results on variational lower bounds on mutual information and KL-divergence to compute the boundaries of the rate region of the MAC using a set of functions parameterized by neural networks. Our method relies on a variational lower bound on KL-divergence and an upper bound on KL-divergence based on the f-divergence inequalities. Unlike previous work, which computes an estimate of mutual information that is neither a lower nor an upper bound, our method estimates a lower bound on mutual information. Our numerical results show that the proposed method provides tighter estimates compared to the MINE-based estimator at large SNRs while being computationally more efficient. Finally, we apply the proposed method to the optical intensity MAC and obtain a new achievable rate boundary tighter than prior works.
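
As background, one widely used variational lower bound on KL-divergence is the Donsker-Varadhan representation, which is also the bound underlying MINE. The abstract does not state which bound the authors use, so the formula below is an illustrative assumption; $T_\theta$ denotes a hypothetical critic function parameterized by a neural network with weights $\theta$.

```latex
D_{\mathrm{KL}}(P \,\|\, Q)
  \;\ge\; \sup_{\theta} \;
  \mathbb{E}_{P}\!\left[ T_\theta \right]
  \;-\; \log \mathbb{E}_{Q}\!\left[ e^{T_\theta} \right],
\qquad
I(X;Y) \;=\; D_{\mathrm{KL}}\!\left( P_{XY} \,\|\, P_X \otimes P_Y \right)
```

The bound is tight when the supremum ranges over all measurable functions; restricting the critic to a neural network family yields a lower bound that can be maximised by gradient ascent on samples.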

[390]  arXiv:2105.04454 [pdf, other]
Title: Physical Fault Injection and Side-Channel Attacks on Mobile Devices: A Comprehensive Survey
Subjects: Cryptography and Security (cs.CR)

The past decade has seen the rapid deployment of mobile devices with densely packaged system-on-chips (SoCs) with multi-core, high-frequency CPUs and complex pipelines. In parallel, sophisticated SoC-assisted security mechanisms, such as trusted execution environments (TEEs) and full-disk and file-based encryption, have also been deployed for protecting sensitive data. Both advancements have dramatically complicated the use of physical attacks, which has recently led to the development of specialised attack methods. In this survey, we consolidate recent developments in physical fault injection attacks (FIAs) and side-channel attacks (SCAs) on modern mobile devices. In total, we comprehensively survey over 50 fault injection and side-channel attack papers published between 2009 and 2021. We evaluate the prevailing attack methods, compare existing attacks using a common framework, identify several challenges and shortcomings, and suggest future directions of research.

[391]  arXiv:2105.04456 [pdf, ps, other]
Title: A shape optimisation with the isogeometric boundary element method and adjoint variable method for the three-dimensional Helmholtz equation
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)

This paper presents a shape optimisation system to design the shape of an acoustically-hard object in three-dimensional open space. The boundary element method (BEM) is suitable for analysing such an exterior field. However, the conventional BEM, which is based on piecewise polynomial shape and interpolation functions, can require many design variables because they are usually chosen as a part of the nodes of the underlying boundary element mesh. In addition, it is not easy for the conventional method to compute the gradient of the sound pressure on the surface of a given object, which is necessary to compute the shape derivative of interest. To overcome these issues, we employ the isogeometric boundary element method (IGBEM), which was developed in our previous work. Using the IGBEM, we can design the shape of surfaces through the control points of the NURBS surfaces of the target object. We integrate the IGBEM with nonlinear programming software through the adjoint variable method (AVM), where the resulting adjoint boundary value problem can also be solved by the IGBEM with a slight modification. Numerical verification and demonstration validate our shape optimisation framework.

[392]  arXiv:2105.04458 [pdf, other]
Title: Learning Robust Latent Representations for Controllable Speech Synthesis
Comments: Accepted in ACL2021 Findings
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

State-of-the-art Variational Auto-Encoders (VAEs) for learning disentangled latent representations give impressive results in discovering features like pitch, pause duration, and accent in speech data, leading to highly controllable text-to-speech (TTS) synthesis. However, these LSTM-based VAEs fail to learn latent clusters of speaker attributes when trained on either limited or noisy datasets. Further, different latent variables start encoding the same features, limiting the control and expressiveness during speech synthesis. To resolve these issues, we propose RTI-VAE (Reordered Transformer with Information reduction VAE) where we minimize the mutual information between different latent variables and devise a modified Transformer architecture with layer reordering to learn controllable latent representations in speech data. We show that RTI-VAE reduces the cluster overlap of speaker attributes by at least 30% over LSTM-VAE and by at least 7% over vanilla Transformer-VAE.

[393]  arXiv:2105.04459 [pdf, other]
Title: ICON: Learning Regular Maps Through Inverse Consistency
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Learning maps between data samples is fundamental. Applications range from representation learning, image translation and generative modeling, to the estimation of spatial deformations. Such maps relate feature vectors, or map between feature spaces. Well-behaved maps should be regular, which can be imposed explicitly or may emanate from the data itself. We explore what induces regularity for spatial transformations, e.g., when computing image registrations. Classical optimization-based models compute maps between pairs of samples and rely on an appropriate regularizer for well-posedness. Recent deep learning approaches have attempted to avoid using such regularizers altogether by relying on the sample population instead. We explore whether it is possible to obtain spatial regularity using an inverse consistency loss only and elucidate what explains map regularity in such a context. We find that deep networks combined with an inverse consistency loss and randomized off-grid interpolation yield well-behaved, approximately diffeomorphic, spatial transformations. Despite the simplicity of this approach, our experiments present compelling evidence, on both synthetic and real data, that regular maps can be obtained without carefully tuned explicit regularizers while achieving competitive registration performance.
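
The notion of an inverse consistency loss can be illustrated with a toy one-dimensional sketch. It assumes two maps, here called `phi_ab` and `phi_ba`, that are plain Python functions rather than deep networks, and it penalises how far their compositions deviate from the identity at randomly drawn off-grid points. All names and the exact form of the penalty are assumptions for illustration, not the paper's implementation.

```python
import random

def inverse_consistency_loss(phi_ab, phi_ba, n_samples=256, seed=0):
    """Mean squared deviation of both compositions from the identity map,
    evaluated at random (off-grid) sample locations in [0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.random()
        total += (phi_ab(phi_ba(x)) - x) ** 2
        total += (phi_ba(phi_ab(x)) - x) ** 2
    return total / n_samples

shift = 0.1
# A shift and its exact inverse: the loss is numerically zero.
exact = inverse_consistency_loss(lambda x: x + shift, lambda x: x - shift)
# A shift paired with the identity: each composition is off by `shift`.
mismatched = inverse_consistency_loss(lambda x: x + shift, lambda x: x)
print(exact, mismatched)
```

In the mismatched case each of the two compositions is off by `shift`, so the loss is approximately `2 * shift ** 2`.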

[394]  arXiv:2105.04462 [pdf, other]
Title: Friend or Foe: A Review and Synthesis of Computational Models of the Identity Labeling Problem
Comments: Accepted at Journal of Mathematical Sociology
Subjects: Computers and Society (cs.CY)

We introduce the identity labeling problem: given an individual in a social situation, can we predict what identity(ies) they will be labeled with by someone else? This problem remains a theoretical gap and methodological challenge, evidenced by the fact that models of social cognition often sidestep the issue by treating identities as already known. We build on insights from existing models to develop a new framework, entitled Latent Cognitive Social Spaces, that can incorporate multiple social cues including sentiment information, socio-demographic characteristics, and institutional associations to estimate the most culturally expected identity. We apply our model to data collected in two vignette experiments, finding that it predicts identity labeling choices of participants with a mean absolute error of 10.9%, a 100% improvement over previous models based on parallel constraint satisfaction and affect control theory.

[395]  arXiv:2105.04471 [pdf, other]
Title: Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Uncertainty awareness is crucial for developing reliable machine learning models. In this work, we propose the Natural Posterior Network (NatPN) for fast and high-quality uncertainty estimation for any task where the target distribution belongs to the exponential family. Thus, NatPN finds application in both classification and general regression settings. Unlike many previous approaches, NatPN does not require out-of-distribution (OOD) data at training time. Instead, it leverages Normalizing Flows to fit a single density on a learned low-dimensional and task-dependent latent space. For any input sample, NatPN uses the predicted likelihood to perform a Bayesian update over the target distribution. Theoretically, NatPN assigns high uncertainty far away from training data. Empirically, our extensive experiments on calibration and OOD detection show that NatPN delivers highly competitive performance for classification, regression and count prediction tasks.
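
As background on what a Bayesian update looks like for exponential family targets, the standard conjugate-prior update is shown below. The abstract does not spell out NatPN's exact update rule, so this is a generic textbook form given under that caveat; $\chi$ denotes the prior's mean sufficient-statistic parameter, $n$ its evidence (pseudo-count), and $N$ the amount of evidence carried by the new information with mean sufficient statistic $\hat{\chi}$. All of these symbols are assumptions that do not appear in the abstract.

```latex
\chi^{\text{post}} \;=\;
  \frac{n^{\text{prior}} \, \chi^{\text{prior}} + N \, \hat{\chi}}
       {n^{\text{prior}} + N},
\qquad
n^{\text{post}} \;=\; n^{\text{prior}} + N
```

When $N$ is small, the posterior stays close to the prior, which matches the intuition that little evidence should leave the prediction uncertain.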

[396]  arXiv:2105.04472 [pdf, other]
Title: Safety of the Intended Driving Behavior Using Rulebooks
Journal-ref: 2020 IEEE Intelligent Vehicles Symposium (IV)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Autonomous Vehicles (AVs) are complex systems that drive in uncertain environments and potentially navigate unforeseeable situations. Safety of these systems requires not only an absence of malfunctions but also high performance of functions in many different scenarios. The ISO/PAS 21448 [1] guidance recommends a process to ensure the Safety of the Intended Functionality (SOTIF) for road vehicles. This process starts with a functional specification that fully describes the intended functionality and further includes the verification and validation that the AV meets this specification. For the path planning function, defining the correct sequence of control actions for each vehicle in all potential driving situations is intractable. In this paper, the authors provide a link between the Rulebooks framework, presented by [2], and the SOTIF process. We establish that Rulebooks provide a functional description of the path planning task in an AV and discuss the potential usage of the method for verification and validation.

[397]  arXiv:2105.04475 [pdf, other]
Title: Self-Guided Curriculum Learning for Neural Machine Translation
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In the field of machine learning, a well-trained model is assumed to be able to recover the training labels, i.e. the synthetic labels predicted by the model should be as close to the ground-truth labels as possible. Inspired by this, we propose a self-guided curriculum strategy to encourage the learning of neural machine translation (NMT) models to follow the above recovery criterion, where we cast the recovery degree of each training example as its learning difficulty. Specifically, we adopt the sentence-level BLEU score as a proxy for the recovery degree. Unlike existing curricula that rely on linguistic prior knowledge or third-party language models, our chosen learning difficulty is better suited to measuring the degree of knowledge mastery of the NMT models. Experiments on translation benchmarks, including WMT14 English$\Rightarrow$German and WMT17 Chinese$\Rightarrow$English, demonstrate that our approach can consistently improve translation performance over a strong Transformer baseline.

[398]  arXiv:2105.04484 [pdf, other]
Title: Towards Robust One-shot Task Execution using Knowledge Graph Embeddings
Comments: 7 pages, 3 figures. Accepted for publication at IEEE ICRA 2021
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Requiring multiple demonstrations of a task plan presents a burden to end-users of robots. However, robustly executing task plans from a single end-user demonstration is an ongoing challenge in robotics. We address the problem of one-shot task execution, in which a robot must generalize a single demonstration or prototypical example of a task plan to a new execution environment. Our approach integrates task plans with domain knowledge to infer task plan constituents for new execution environments. Our experimental evaluations show that our knowledge representation makes more relevant generalizations that result in significantly higher success rates over tested baselines. We validated the approach on a physical platform, which resulted in the successful generalization of initial task plans to 38 of 50 execution environments with errors resulting from autonomous robot operation included.

[399]  arXiv:2105.04485 [pdf, other]
Title: T-Cash: Transferable Fiat Backed Coins
Authors: Hitesh Tewari
Subjects: Cryptography and Security (cs.CR)

Numerous electronic cash schemes have been proposed over the years; however, none have been embraced by financial institutions as an alternative to fiat currency. David Chaum's ecash scheme was the closest to something that mimicked a modern-day currency system, with the important property that it provided anonymity for users when purchasing coins from a bank and subsequently spending them at a merchant's premises. However, it lacked a crucial element present in current fiat-based systems: the ability to continuously spend or transfer coins. Bitcoin reignited the interest in cryptocurrencies in the last decade but is now seen as more of an asset store than a financial instrument. One interesting thing that has come out of the Bitcoin system is blockchains and the associated distributed consensus protocols. In this paper we propose a transferable electronic cash scheme using blockchain technology which allows users to continuously reuse coins within the system.

[400]  arXiv:2105.04486 [pdf, other]
Title: Probabilistic Top-k Dominating Queries in Distributed Uncertain Databases
Subjects: Databases (cs.DB)

In many real-world applications such as business planning and sensor data monitoring, one important yet challenging task is to rank objects (e.g., products, documents, or spatial objects) based on their ranking scores and efficiently return those objects with the highest scores. In practice, due to the unreliability of data sources, many real-world objects contain noise and are thus imprecise and uncertain. In this paper, we study the problem of the probabilistic top-k dominating (PTD) query on such large-scale uncertain data in a distributed environment, which retrieves k uncertain objects from distributed uncertain databases (on multiple distributed servers) that have the largest ranking scores with high confidence. To efficiently tackle the distributed PTD problem, we propose a MapReduce framework for processing distributed PTD queries over distributed uncertain databases. In this MapReduce framework, we design effective pruning strategies to filter out false alarms in the distributed setting, propose cost-model-based index distribution mechanisms over servers, and develop efficient distributed PTD query processing algorithms. Extensive experiments demonstrate the efficiency and effectiveness of our proposed distributed PTD approach on both real and synthetic data sets under various experimental settings.

[401]  arXiv:2105.04487 [pdf, ps, other]
Title: Tamper Detection against Unitary Operators
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Quantum Physics (quant-ph)

We consider (Enc, Dec) schemes which are used to encode a classical/quantum message $m$ and derive an $n$-qubit quantum codeword $\psi_m$. The quantum codeword $\psi_m$ can be adversarially tampered with via a unitary $U \in \mathcal{U}$ from some known tampering unitary family $\mathcal{U}$, resulting in $U \psi_m U^\dagger$.
Firstly, we initiate the general study of quantum tamper detection codes, which must detect that tampering occurred with high probability. In case there was no tampering, we would like to output the message $m$ with probability $1$. We show that quantum tamper detection codes exist for both classical messages and quantum messages for any family of unitaries $\mathcal{U}$ such that $|\mathcal{U}| < 2^{2^{\alpha n}}$ for some known constant $\alpha \in (0,1)$ and all the unitaries satisfy one additional condition:
Far from Identity: For each $U \in \mathcal{U}$, we require that the modulus of its trace is not too large, i.e., $|\mathrm{Trace}(U)| \leq \phi N$, where $N=2^n$.
Quantum tamper-detection codes are quantum generalizations of classical tamper detection codes studied by Jafargholi et al. \cite{JW15}.
Additionally, for a classical message $m$, if we must either output the message $m$ or detect that tampering occurred and output $\perp$ with high probability, we show that this is possible without the Far from Identity condition for any family of unitaries $\mathcal{U}$ such that $|\mathcal{U}| < 2^{2^{\alpha n}}$. We also provide efficient (Enc, Dec) schemes when the family of tampering unitaries is the Pauli group $\mathcal{P}_n$, which can be thought of as a quantum version of the algebraic manipulation detection (AMD) codes of Cramer et al. \cite{CDFPW08}.

[402]  arXiv:2105.04488 [pdf, other]
Title: A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment
Comments: To be published in ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that has received very little attention in the reinforcement learning literature. Our experiments show that the agent can successfully identify a particular target speaker among a set of $N$ predefined speakers in a room and move itself towards that speaker, while avoiding collision with other speakers or going outside the room boundaries. The agent is shown to be robust to speaker pitch shifting and it can learn to navigate the environment, even when a limited number of training utterances are available for each speaker.

[403]  arXiv:2105.04489 [pdf, other]
Title: Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Comments: To appear at CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

When people observe events, they are able to abstract key information and build concise summaries of what is happening. These summaries include contextual and semantic information describing the important high-level details (what, where, who and how) of the observed event and exclude background information that is deemed unimportant to the observer. With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video. These descriptions can be captured in captions that provide expanded attributes for video labeling (e.g. actions/objects/scenes/sentiment/etc.) while allowing us to gain new insight into what people find important or necessary to summarize specific events. Existing caption datasets for video understanding are either small in scale or restricted to a specific domain. To address this, we present the Spoken Moments (S-MiT) dataset of 500k spoken captions each attributed to a unique short video depicting a broad range of different events. We collect our descriptions using audio recordings to ensure that they remain as natural and concise as possible while allowing us to scale the size of a large classification dataset. In order to utilize our proposed dataset, we present a novel Adaptive Mean Margin (AMM) approach to contrastive learning and evaluate our models on video/caption retrieval on multiple datasets. We show that our AMM approach consistently improves our results and that models trained on our Spoken Moments dataset generalize better than those trained on other video-caption datasets.

[404]  arXiv:2105.04493 [pdf, other]
Title: Graph Feature Gating Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph neural networks (GNNs) have received tremendous attention due to their power in learning effective representations for graphs. Most GNNs follow a message-passing scheme where the node representations are updated by aggregating and transforming the information from the neighborhood. Meanwhile, they adopt the same strategy in aggregating the information from different feature dimensions. However, as suggested by social dimension theory and spectral embedding, there are potential benefits to treating the dimensions differently during the aggregation process. In this work, we investigate how to enable heterogeneous contributions of feature dimensions in GNNs. In particular, we propose a general graph feature gating network (GFGN) based on the graph signal denoising problem and then correspondingly introduce three graph filters under GFGN to allow different levels of contributions from feature dimensions. Extensive experiments on various real-world datasets demonstrate the effectiveness and robustness of the proposed frameworks.

[405]  arXiv:2105.04501 [pdf, other]
Title: Incorrectness Logic for Graph Programs
Comments: Accepted by the 14th International Conference on Graph Transformation (ICGT 2021)
Subjects: Logic in Computer Science (cs.LO)

Program logics typically reason about an over-approximation of program behaviour to prove the absence of bugs. Recently, program logics have been proposed that instead prove the presence of bugs by means of under-approximate reasoning, which has the promise of better scalability. In this paper, we present an under-approximate program logic for a nondeterministic graph programming language, and show how it can be used to reason deductively about program incorrectness, whether defined by the presence of forbidden graph structure or by finitely failing executions. We prove this incorrectness logic to be sound and complete, and speculate on some possible future applications of it.

[406]  arXiv:2105.04505 [pdf, other]
Title: Towards Benchmarking the Utility of Explanations for Model Debugging
Comments: Short paper, to appear at TrustNLP @ NAACL 2021
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Post-hoc explanation methods are an important class of approaches that help understand the rationale underlying a trained model's decision. But how useful are they for an end-user towards accomplishing a given task? In this vision paper, we argue the need for a benchmark to facilitate evaluations of the utility of post-hoc explanation methods. As a first step to this end, we enumerate desirable properties that such a benchmark should possess for the task of debugging text classifiers. Additionally, we highlight that such a benchmark facilitates not only assessing the effectiveness of explanations but also their efficiency.

[407]  arXiv:2105.04508 [pdf, other]
Title: MDA-Net: Multi-Dimensional Attention-Based Neural Network for 3D Image Segmentation
Authors: Rutu Gandhi, Yi Hong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Segmenting an entire 3D image often has high computational complexity and requires large memory consumption; by contrast, performing volumetric segmentation in a slice-by-slice manner is efficient but does not fully leverage the 3D data. To address this challenge, we propose a multi-dimensional attention network (MDA-Net) to efficiently integrate slice-wise, spatial, and channel-wise attention into a U-Net based network, which results in high segmentation accuracy with a low computational cost. We evaluate our model on the MICCAI iSeg and IBSR datasets, and the experimental results demonstrate consistent improvements over existing methods.

[408]  arXiv:2105.04512 [pdf, other]
Title: UPC's Speech Translation System for IWSLT 2021
Comments: Submitted to IWSLT 2021
Subjects: Computation and Language (cs.CL)

This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end and use a custom or given segmentation. Our submission is an end-to-end speech translation system, which combines pre-trained models (Wav2Vec 2.0 and mBART) with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique, which trains only 20% of its total parameters. We show that adding an Adapter to the system and pre-training it can increase the convergence speed and the final result, with which we achieve a BLEU score of 27.3 on the MuST-C test set. Our final model is an ensemble that obtains a BLEU score of 28.22 on the same set. Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 for identifying periods of untranscribable text and can bring improvements of 2.5 to 3 BLEU points on the IWSLT 2019 test set, as compared to the result with the given segmentation.

[409]  arXiv:2105.04515 [pdf, other]
Title: An end-to-end Optical Character Recognition approach for ultra-low-resolution printed text images
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Some historical and more recent printed documents have been scanned or stored at very low resolutions, such as 60 dpi. Though such scans are relatively easy for humans to read, they still present significant challenges for optical character recognition (OCR) systems. The current state of the art is to use super-resolution to reconstruct an approximation of the original high-resolution image and to feed this into a standard OCR system. Our novel end-to-end method bypasses the super-resolution step and produces better OCR results. This approach is inspired by our understanding of the human visual system, and builds on established neural networks for performing OCR.
Our experiments have shown that it is possible to perform OCR on 60 dpi scanned images of English text, which is a significantly lower resolution than the state-of-the-art, and we achieved a mean character level accuracy (CLA) of 99.7% and word level accuracy (WLA) of 98.9% across a set of about 1000 pages of 60 dpi text in a wide range of fonts. For 75 dpi images, the mean CLA was 99.9% and the mean WLA was 99.4% on the same sample of texts. We make our code and data (including a set of low-resolution images with their ground truths) publicly available as a benchmark for future work in this field.

[410]  arXiv:2105.04522 [pdf, other]
Title: Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose two novel loss functions based on Jensen-Shannon divergence for learning under label noise. Following the work of Ghosh et al. (2017), we argue about their theoretical robustness. Furthermore, we reveal several other desirable properties by drawing informative connections to various loss functions, e.g., cross entropy, mean absolute error, generalized cross entropy, symmetric cross entropy, label smoothing, and most importantly consistency regularization. We conduct extensive and systematic experiments using both synthetic (CIFAR) and real (WebVision) noise and demonstrate significant and consistent improvements over other loss functions. Also, we conduct several informative side experiments that highlight the different theoretical properties.
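
For readers unfamiliar with the quantity these losses build on, the sketch below computes the standard (weighted) Jensen-Shannon divergence between a label distribution and a prediction, and contrasts its boundedness with cross entropy. It illustrates the textbook divergence only; it is not the generalized loss proposed in the paper, and the function names and the weight parameter `w` are assumptions made for this example.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence with mixture m = w*p + (1-w)*q, 0 < w < 1."""
    m = [w * pi + (1 - w) * qi for pi, qi in zip(p, q)]
    return w * kl(p, m) + (1 - w) * kl(q, m)


def cross_entropy(label, prediction):
    """Cross entropy of a prediction against a label distribution."""
    return -sum(li * math.log(pi) for li, pi in zip(label, prediction) if li > 0)


label = [1.0, 0.0]                 # one-hot (possibly noisy) label
confident_wrong = [1e-9, 1 - 1e-9]  # prediction that strongly disagrees

# With w = 0.5 the divergence never exceeds ln 2, while cross entropy grows without bound.
print(js_divergence(label, confident_wrong), math.log(2))
print(cross_entropy(label, confident_wrong))
```

The bounded penalty on a confidently "wrong" prediction is one intuition for why divergence-based losses can be less sensitive to mislabeled examples than cross entropy.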

[411]  arXiv:2105.04524 [pdf, other]
Title: AP-side WLAN Analytics
Authors: Peshal Nayak
Subjects: Networking and Internet Architecture (cs.NI)

Monitoring the network performance experienced by the end-user is crucial for managers of wireless networks as it can enable them to remotely modify the network parameters to improve the end-user experience. Unfortunately, for performance monitoring, managers are typically limited to the logs of the Access Points (APs) that they manage. This information does not directly capture factors that can hinder station (STA) side transmissions. Consequently, state-of-the-art methods to measure such metrics primarily involve active measurements. Unfortunately, such active measurements increase traffic load and if used regularly and for all the STAs can potentially disrupt user traffic, thereby worsening performance for other users in the network and draining the battery of mobile devices.
This thesis enables passive AP-side network analytics. In the first part of the thesis, I present virtual speed test, a measurement based framework that enables an AP to estimate speed test results for any of its associated clients solely based on AP-side observables. Next, I present Uplink Latency Microscope (uScope), an AP-side framework for estimation of WLAN uplink latency for any of the associated STAs and decomposition into its constituent components. Similar to virtual speed test, uScope makes estimations solely based on passive AP-side observations. We implement both frameworks on a commodity hardware platform and conduct extensive field trials on a university campus and in a residential apartment complex. In over 1 million tests, the two proposed frameworks demonstrate an estimation accuracy with errors under 10%.

[412]  arXiv:2105.04528 [pdf, other]
Title: Accelerating Large Scale Real-Time GNN Inference using Channel Pruning
Subjects: Machine Learning (cs.LG)

Graph Neural Networks (GNNs) are proven to be powerful models to generate node embeddings for downstream applications. However, due to the high computational complexity of GNN inference, it is hard to deploy GNNs for large-scale or real-time applications. In this paper, we propose to accelerate GNN inference by pruning the dimensions in each layer with negligible accuracy loss. Our pruning framework uses a novel LASSO regression formulation for GNNs to identify feature dimensions (channels) that have high influence on the output activation. We identify two inference scenarios and design pruning schemes based on their computation and memory usage for each. To further reduce the inference complexity, we effectively store and reuse hidden features of visited nodes, which significantly reduces the number of supporting nodes needed to compute the target embedding. We evaluate the proposed method with the node classification problem on five popular datasets and a real-time spam detection application. We demonstrate that the pruned GNN models greatly reduce computation and memory usage with little accuracy loss. For full inference, the proposed method achieves an average of 3.27x speedup with only 0.002 drop in F1-Micro on GPU. For batched inference, the proposed method achieves an average of 6.67x speedup with only 0.003 drop in F1-Micro on CPU. To the best of our knowledge, we are the first to accelerate large-scale real-time GNN inference through channel pruning.

[413]  arXiv:2105.04529 [pdf, other]
Title: Identification of the nonlinear steering dynamics of an autonomous vehicle
Comments: Accepted to SYSID 2021 (revised with reviewer feedback)
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Automated driving applications require accurate vehicle-specific models to precisely predict and control the motion dynamics. However, modern vehicles have a wide array of digital and mechatronic components that are difficult to model, manufacturers do not disclose all details required for modelling, and even existing models of subcomponents require coefficient estimation to match the specific characteristics of each vehicle and their change over time. Hence, it is attractive to use data-driven modelling to capture the relevant vehicle dynamics and synthesise model-based control solutions. In this paper, we address identification of the steering system of an autonomous car based on measured data. We show that the underlying dynamics are highly nonlinear and challenging to capture, necessitating the use of data-driven methods that fuse the approximation capabilities of learning and the efficiency of dynamic system identification. We demonstrate that such a neural network based subspace-encoder method can successfully capture the underlying dynamics while other methods fall short of providing reliable results.

[414]  arXiv:2105.04534 [pdf, other]
Title: Improving Fairness of AI Systems with Lossless De-biasing
Comments: 8 pages, 19 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

In today's society, AI systems are increasingly used to make critical decisions such as credit scoring and patient triage. However, the great convenience brought by AI systems comes with a troubling prevalence of bias against underrepresented groups. Mitigating bias in AI systems to increase overall fairness has emerged as an important challenge. Existing studies on mitigating bias in AI systems focus on eliminating sensitive demographic information embedded in data. Given the temporal and contextual complexity of conceptualizing fairness, lossy treatment of demographic information may contribute to an unnecessary trade-off between accuracy and fairness, especially when demographic attributes and class labels are correlated. In this paper, we present an information-lossless de-biasing technique that targets the scarcity of data in the disadvantaged group. Unlike existing work, we demonstrate, both theoretically and empirically, that oversampling underrepresented groups can not only mitigate algorithmic bias in AI systems that consistently predict a favorable outcome for a certain group, but also improve overall accuracy by mitigating class imbalance within data that leads to a bias towards the majority class. We demonstrate the effectiveness of our technique on real datasets using a variety of fairness metrics.
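
The general idea of oversampling underrepresented groups without discarding any information can be sketched as follows. This is plain random oversampling with replacement, used here as a stand-in; the abstract does not specify the authors' sampling procedure, and the record layout, the `group_key` parameter, and the function name `oversample_groups` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_groups(records, group_key, seed=0):
    """Duplicate records of smaller groups until every group matches the largest.

    No original record is dropped or altered, so no attribute information is lost.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Draw the shortfall with replacement from the group's own records.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical toy data: group "B" is underrepresented.
data = (
    [{"group": "A", "label": 1} for _ in range(6)]
    + [{"group": "B", "label": 0}, {"group": "B", "label": 1}]
)
balanced = oversample_groups(data, "group")
print(Counter(r["group"] for r in balanced))
```

Balancing could equally be done per (group, label) cell rather than per group; which granularity is appropriate depends on the fairness metric being targeted.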

[415]  arXiv:2105.04538 [pdf, other]
Title: Learning High-Dimensional Distributions with Latent Neural Fokker-Planck Kernels
Comments: code will be updated at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Learning high-dimensional distributions is an important yet challenging problem in machine learning with applications in various domains. In this paper, we introduce new techniques to formulate the problem as solving a Fokker-Planck equation in a lower-dimensional latent space, aiming to mitigate challenges in the high-dimensional data space. Our proposed model consists of latent-distribution morphing, a generator and a parameterized Fokker-Planck kernel function. One fascinating property of our model is that it can be trained with arbitrary steps of latent distribution morphing or even without morphing, which makes it flexible and as efficient as Generative Adversarial Networks (GANs). Furthermore, this property also makes our latent-distribution morphing an efficient plug-and-play scheme, so it can be used to improve arbitrary GANs and, more interestingly, can effectively correct failure cases of the GAN models. Extensive experiments illustrate the advantages of our proposed method over existing models.

[416]  arXiv:2105.04544 [pdf, other]
Title: Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction
Subjects: Machine Learning (cs.LG)

We address the problem of causal effect estimation in the presence of unobserved confounding, but where proxies for the latent confounder(s) are observed. We propose two kernel-based methods for nonlinear causal effect estimation in this setting: (a) a two-stage regression approach, and (b) a maximum moment restriction approach. We focus on the proximal causal learning setting, but our methods can be used to solve a wider class of inverse problems characterised by a Fredholm integral equation. In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting. We provide consistency guarantees for each algorithm, and we demonstrate these approaches achieve competitive results on synthetic data and data simulating a real-world task. In particular, our approach outperforms earlier methods that are not suited to leveraging proxy variables.

[417]  arXiv:2105.04547 [pdf]
Title: Large-scale memory failure prediction using mcelog-based Data Mining and Machine Learning
Authors: Chengdong Yao
Comments: 11 pages, 2 figures, 1 table. Detailed solution will be open source to this https URL after the competition is over
Subjects: Databases (cs.DB); Machine Learning (cs.LG); Performance (cs.PF); Software Engineering (cs.SE)

In the data center, unexpected downtime caused by memory failures can lead to a decline in the stability of the server and even the entire information technology infrastructure, which harms the business. Therefore, whether memory failures can be accurately predicted in advance has become one of the most important issues to be studied in the data center. However, memory failure prediction in a production system must solve technical problems such as heavy data noise and extreme imbalance between positive and negative samples, while at the same time ensuring the long-term stability of the algorithm. This paper compares and summarizes some commonly used techniques and the improvements they can bring. The single model we propose ranked 15th in the 2nd Alibaba Cloud AIOps Competition, part of the 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining.

[418]  arXiv:2105.04550 [pdf, other]
Title: Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC); Machine Learning (stat.ML)

Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.

[419]  arXiv:2105.04551 [pdf, other]
Title: Stochastic Image-to-Video Synthesis using cINNs
Comments: Accepted to CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame. This naturally suggests a bijective mapping between the video domain and the static content as well as residual information. In contrast to common stochastic image-to-video synthesis, such a model does not merely generate arbitrary videos progressing the initial image. Given this image, it rather provides a one-to-one mapping between the residual vectors and the video with stochastic outcomes when sampling. The approach is naturally implemented using a conditional invertible neural network (cINN) that can explain videos by independently modelling static and other video characteristics, thus laying the basis for controlled video synthesis. Experiments on four diverse video datasets demonstrate the effectiveness of our approach in terms of both the quality and diversity of the synthesized results. Our project page is available at https://bit.ly/3t66bnU.

[420]  arXiv:2105.04553 [pdf, other]
Title: Self-Supervised Learning with Swin Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We are witnessing a modeling shift from CNN to Transformers in computer vision. In this paper, we present a self-supervised learning approach called MoBY, with Vision Transformers as its backbone architecture. The approach is basically a combination of MoCo v2 and BYOL, tuned to achieve reasonably high accuracy on ImageNet-1K linear evaluation: 72.8% and 75.0% top-1 accuracy using DeiT-S and Swin-T, respectively, by 300-epoch training. The performance is slightly better than recent works of MoCo v3 and DINO which adopt DeiT as the backbone, but with much lighter tricks.
More importantly, the general-purpose Swin Transformer backbone enables us to also evaluate the learnt representations on downstream tasks such as object detection and semantic segmentation, in contrast to a few recent approaches built on ViT/DeiT which only report linear evaluation results on ImageNet-1K because ViT/DeiT have not been tamed for these dense prediction tasks. We hope our results can facilitate more comprehensive evaluation of self-supervised learning methods designed for Transformer architectures. Our code and models are available at https://github.com/SwinTransformer/Transformer-SSL, which will be continually enriched.

[421]  arXiv:2105.04554 [pdf, other]
Title: Local approximate Gaussian process regression for data-driven constitutive laws: Development and comparison with neural networks
Comments: 22 pages, 15 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Hierarchical computational methods for multiscale mechanics such as the FE$^2$ and FE-FFT methods are generally accompanied by high computational costs. Data-driven approaches are able to speed the process up significantly by incorporating the effective micromechanical response in macroscale simulations without the need to perform additional computations at each Gauss point explicitly. Traditionally, artificial neural networks (ANNs) have been the surrogate modeling technique of choice in the solid mechanics community. However, they suffer from severe drawbacks due to their parametric nature and suboptimal training and inference properties for the investigated datasets in a three-dimensional setting. These problems can be avoided using local approximate Gaussian process regression (laGPR). This method allows the prediction of stress outputs at particular strain space locations by training local regression models based on Gaussian processes, using only a subset of the data for each local model, offering better and more reliable accuracy than ANNs. A modified Newton-Raphson approach is proposed to account for the local nature of the laGPR approximation when solving the global structural problem in an FE setting. Hence, the presented work offers a complete and general framework enabling multiscale calculations combining a data-driven constitutive prediction using laGPR, and macroscopic calculations using an FE scheme that we test for finite-strain three-dimensional hyperelastic problems.

Cross-lists for Tue, 11 May 21

[422]  arXiv:1811.12759 (cross-list from math.OC) [pdf, other]
Title: A Decentralized Event-Based Approach for Robust Model Predictive Control
Comments: 18 pages, 3 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this paper, we propose an event-based sampling policy to implement a constraint-tightening, robust MPC method. The proposed policy enjoys a computationally tractable design and is applicable to perturbed, linear time-invariant systems with polytopic constraints. In particular, the triggering mechanism is suitable for plants with no centralized sensory node as the triggering mechanism can be evaluated locally at each individual sensor. From a geometrical viewpoint, the mechanism is a sequence of hyper-rectangles surrounding the optimal state trajectory such that robust recursive feasibility and robust stability are guaranteed. The design of the triggering mechanism is cast as a constrained parametric-in-set optimization problem with the volume of the set as the objective function. Re-parameterized in terms of the set vertices, we show that the problem admits a finite tractable convex program reformulation and a linear program relaxation. Several numerical examples are presented to demonstrate the effectiveness and limitations of the theoretical results.
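A minimal sketch of a locally evaluated, hyper-rectangle-style trigger, under stated assumptions: each sensor sees one state coordinate and knows the predicted value and an interval half-width for that coordinate. The function names, the per-coordinate half-widths, and the "any sensor fires" rule are illustrative assumptions, not the paper's design procedure.

```python
import numpy as np

def local_triggers(measured, predicted, half_widths):
    """Per-sensor flags: True where a coordinate leaves its own interval.

    Each comparison uses only that sensor's coordinate, so it can be
    evaluated without a centralized sensory node (assumed setup).
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    half_widths = np.asarray(half_widths, dtype=float)
    return np.abs(measured - predicted) > half_widths

def event_occurs(measured, predicted, half_widths):
    """An event is raised as soon as any single sensor fires (assumption)."""
    return bool(np.any(local_triggers(measured, predicted, half_widths)))

# The state stays inside the hyper-rectangle iff no coordinate fires.
flags = local_triggers([1.0, 2.5], [1.1, 2.0], [0.2, 0.3])
```

Because a hyper-rectangle is a product of intervals, membership decomposes into independent per-coordinate checks, which is what makes this kind of set convenient for decentralized evaluation.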

[423]  arXiv:2008.10362 (cross-list from math.OC) [pdf, other]
Title: Fast Approximate Dynamic Programming for Input-Affine Dynamics
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We propose two novel numerical schemes for approximate implementation of the Dynamic Programming (DP) operation concerned with finite-horizon optimal control of discrete-time, stochastic systems with input-affine dynamics. The proposed algorithms involve discretization of the state and input spaces, and are based on an alternative path that solves the dual problem corresponding to the DP operation. We provide error bounds for the proposed algorithms, along with a detailed analysis of their computational complexity. In particular, for a specific class of problems with separable data in the state and input variables, the proposed approach can reduce the typical time complexity of the DP operation from O(XU) to O(X+U) where X and U denote the size of the discrete state and input spaces, respectively. In a broader perspective, the key contribution here can be viewed as an algorithmic transformation of the minimization in the DP operation to addition via discrete conjugation. This bridge enables us to utilize any complexity reduction on the discrete conjugation front within the proposed algorithms. In particular, motivated by the recent development of quantum algorithms for computing the discrete conjugate transform, we discuss the possibility of a quantum mechanical implementation of the proposed algorithms.
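The idea of turning a minimization into an addition via discrete conjugation can be illustrated with a small sketch. It uses a brute-force discrete conjugate (not a fast transform, and not the paper's algorithms) and checks the standard identity that the conjugate of an infimal convolution equals the sum of the conjugates. The grids, the functions `f` and `g`, and all helper names are assumptions chosen for illustration.

```python
import numpy as np

def discrete_conjugate(points, values, slopes):
    """Brute-force f*(s) = max_x (s*x - f(x)) over a finite grid."""
    return np.max(slopes[:, None] * points[None, :] - values[None, :], axis=1)

def inf_convolution(x, fx, y, gy):
    """h(z) = min over x + y = z of f(x) + g(y), on the grid of all sums."""
    table = {}
    for xi, fi in zip(x, fx):
        for yj, gj in zip(y, gy):
            z = round(float(xi + yj), 12)
            table[z] = min(table.get(z, np.inf), fi + gj)
    zs = np.array(sorted(table))
    return zs, np.array([table[z] for z in zs])

x = np.arange(-3.0, 4.0)
fx = x ** 2                      # hypothetical cost in the first variable
y = np.arange(-2.0, 3.0)
gy = np.abs(y)                   # hypothetical cost in the second variable
slopes = np.linspace(-2.0, 2.0, 9)

z, hz = inf_convolution(x, fx, y, gy)
lhs = discrete_conjugate(z, hz, slopes)            # conjugate of the min
rhs = (discrete_conjugate(x, fx, slopes)
       + discrete_conjugate(y, gy, slopes))        # sum of conjugates
```

Here the nested minimization costs on the order of the product of the two grid sizes, while the conjugate-domain side only adds two vectors; the brute-force conjugate used for the check is itself quadratic, so the sketch demonstrates the identity rather than any speedup.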

[424]  arXiv:2102.08880 (cross-list from math.OC) [pdf, other]
Title: Fast Approximate Dynamic Programming for Infinite-Horizon Continuous-State Markov Decision Processes
Comments: 17 pages, 1 figure. arXiv admin note: text overlap with arXiv:2008.10362
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this article, we consider the infinite-horizon, discounted cost, optimal control of discrete-time systems with separable cost and constraint in the state and input variables. Starting from deterministic linear dynamics, we introduce a novel numerical algorithm for implementation of the value iteration (VI) algorithm in the conjugate domain, using the Linear-time Legendre Transform algorithm. Detailed analyses of the convergence, complexity, and error of the proposed algorithm are provided. In particular, with a discretization of size $X$ and $U$ for the state and input spaces, respectively, the proposed approach can reduce the time complexity of each iteration of the VI algorithm from $O(XU)$ to $O(X)$, by replacing the minimization operation in the primal domain with a simple addition in the conjugate domain. Also discussed are the direct extensions of the proposed algorithm for nonlinear dynamics and stochastic dynamics with additive noise.

[425]  arXiv:2104.14929 (cross-list from stat.ML) [pdf, other]
Title: On In-network learning. A Comparative Study with Federated and Split Learning
Comments: Submitted to the 2021 IEEE 22nd International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), special session on Machine learning at the Edge
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

In this paper, we consider a problem in which distributively extracted features are used for performing inference in wireless networks. We elaborate on our proposed architecture, which we herein refer to as "in-network learning", provide a suitable loss function and discuss its optimization using neural networks. We compare its performance with both Federated and Split learning, and show that this architecture offers both better accuracy and bandwidth savings.

[426]  arXiv:2105.02180 (cross-list from math.ST) [pdf, other]
Title: A unifying tutorial on Approximate Message Passing
Comments: 99 pages, 2 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)

Over the last decade or so, Approximate Message Passing (AMP) algorithms have become extremely popular in various structured high-dimensional statistical problems. The fact that the origins of these techniques can be traced back to notions of belief propagation in the statistical physics literature lends a certain mystique to the area for many statisticians. Our goal in this work is to present the main ideas of AMP from a statistical perspective, to illustrate the power and flexibility of the AMP framework. Along the way, we strengthen and unify many of the results in the existing literature.

[427]  arXiv:2105.03511 (cross-list from math.MG) [pdf, ps, other]
Title: Bounds for the sum of distances of spherical sets of small size
Comments: 18 pp
Subjects: Metric Geometry (math.MG); Information Theory (cs.IT); Combinatorics (math.CO)

We derive upper and lower bounds on the sum of distances of a spherical code of size $N$ in $n$ dimensions when $N\sim n^\alpha, 0<\alpha\le 2.$ The bounds are derived by specializing recent general, universal bounds on energy of spherical sets. We discuss asymptotic behavior of our bounds along with several examples of codes whose sum of distances closely follows the upper bound.

[428]  arXiv:2105.03538 (cross-list from math.AP) [pdf, other]
Title: Equivalent formulations of the oxygen depletion problem, other implicit free boundary value problems, and implications for numerical approximation
Comments: 30 pages, 4 figures
Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

The Oxygen Depletion problem is an implicit free boundary value problem. The dynamics allow topological changes in the free boundary. We show several mathematical formulations of this model from the literature and give a new formulation based on a gradient flow with constraint. All formulations are shown to be equivalent. We explore the possibilities for the numerical approximation of the problem that arise from the different formulations. We show a convergence result for an approximation based on the gradient flow with constraint formulation that applies to the general dynamics including topological changes. More general (vector, higher order) implicit free boundary value problems are discussed. Several open problems are described.

[429]  arXiv:2105.03542 (cross-list from eess.AS) [pdf, other]
Title: Zero-Shot Personalized Speech Enhancement through Speaker-Informed Model Selection
Comments: 5 pages, 3 figures, submitted to 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation may be challenging if collecting data from the test-time speaker is not possible. To this end, we propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers. The gating module inexpensively estimates test-time speaker characteristics in the form of an embedding vector and selects the most appropriate specialist module for denoising the test signal. Grouping the training set speakers into non-overlapping semantically similar groups is non-trivial and ill-defined. To do this, we first train a Siamese network using noisy speech pairs to maximize or minimize the similarity of its output vectors depending on whether the utterances derive from the same speaker or not. Next, we perform k-means clustering on the latent space formed by the averaged embedding vectors per training set speaker. In this way, we designate speaker groups and train specialist modules optimized around partitions of the complete training set. Our experiments show that ensemble models made up of low-capacity specialists can outperform high-capacity generalist models with greater efficiency and improved adaptation towards unseen test-time speakers.
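The grouping and gating steps described above can be sketched as follows, assuming the embedding vectors are already available (the Siamese network is not modeled here). The synthetic embeddings, the plain Lloyd-style k-means with a deterministic farthest-point initialization, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def speaker_means(embeddings, speaker_ids):
    """Average the utterance embeddings of each training speaker."""
    ids = np.asarray(speaker_ids)
    speakers = sorted(set(speaker_ids))
    means = np.stack([embeddings[ids == s].mean(axis=0) for s in speakers])
    return speakers, means

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.stack(centroids)
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.stack([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def select_specialist(test_embedding, centroids):
    """Gating step: pick the specialist whose centroid is nearest."""
    return int(np.linalg.norm(centroids - test_embedding, axis=1).argmin())

# Synthetic data: 6 speakers in 2 well-separated groups, 5 utterances each.
rng = np.random.default_rng(0)
group_centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
speaker_ids, rows = [], []
for spk in range(6):
    for _ in range(5):
        rows.append(group_centers[spk // 3] + 0.1 * rng.standard_normal(3))
        speaker_ids.append(spk)
embeddings = np.stack(rows)

speakers, means = speaker_means(embeddings, speaker_ids)
centroids, labels = kmeans(means, k=2)
chosen = select_specialist(np.array([9.8, 10.1, 10.0]), centroids)
```

Only one specialist is selected per test signal, which is what makes such an ensemble sparsely active at inference time.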

[430]  arXiv:2105.03544 (cross-list from eess.AS) [pdf, other]
Title: Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation
Authors: Sunwoo Kim, Minje Kim
Comments: 5 pages, 5 figures, under review
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

In realistic speech enhancement settings for end-user devices, we often encounter only a few speakers and noise types that tend to reoccur in the specific acoustic environment. We propose a novel personalized speech enhancement method to adapt a compact denoising model to the test-time specificity. Our goal in this test-time adaptation is to utilize no clean speech target of the test speaker, thus fulfilling the requirement for zero-shot learning. To compensate for the lack of clean utterances, we employ the knowledge distillation framework. Instead of the missing clean utterance target, we distill the more advanced denoising results from an overly large teacher model, and use them as the pseudo target to train the small student model. This zero-shot learning procedure circumvents the process of collecting users' clean speech, a process that users are reluctant to comply with due to privacy concerns and the technical difficulty of recording clean voice. Experiments on various test-time conditions show that the proposed personalization method achieves significant performance gains compared to larger baseline networks trained from large speaker- and noise-agnostic datasets. In addition, since the compact personalized models can outperform larger general-purpose models, we claim that the proposed method performs model compression with no loss of denoising performance.

[431]  arXiv:2105.03556 (cross-list from math.CO) [pdf, other]
Title: Inside the Binary Reflected Gray Code: Flip-Swap Languages in 2-Gray Code Order
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

A flip-swap language is a set S of binary strings of length n such that $S \cup \{0^n\}$ is closed under two operations (when applicable): (1) Flip the leftmost 1; and (2) Swap the leftmost 1 with the bit to its right. Flip-swap languages model many combinatorial objects including necklaces, Lyndon words, prefix normal words, left factors of k-ary Dyck words, and feasible solutions to 0-1 knapsack problems. We prove that any flip-swap language forms a cyclic 2-Gray code when listed in binary reflected Gray code (BRGC) order. Furthermore, a generic successor rule computes the next string when provided with a membership tester. The rule generates each string in the aforementioned flip-swap languages in O(n)-amortized time per string, except for prefix normal words of length n which require O($n^{1.864}$)-amortized time per string. Our work generalizes results on necklaces and Lyndon words by Vajnovszki [Inf. Process. Lett. 106(3):96-99, 2008].

[432]  arXiv:2105.03568 (cross-list from eess.SP) [pdf, other]
Title: ChaRRNets: Channel Robust Representation Networks for RF Fingerprinting
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

We present complex-valued Convolutional Neural Networks (CNNs) for RF fingerprinting that go beyond translation invariance and appropriately account for the inductive bias with respect to multipath propagation channels, a phenomenon that is specific to the fields of wireless signal processing and communications. We focus on the problem of fingerprinting wireless IoT devices in-the-wild using Deep Learning (DL) techniques. Under these real-world conditions, the multipath environments represented in the train and test sets will be different. These differences are due to the physics governing the propagation of wireless signals, as well as the limitations of practical data collection campaigns. Our approach follows a group-theoretic framework, leverages prior work on DL on manifold-valued data, and extends this prior work to the wireless signal processing domain. We introduce the Lie group of transformations that a signal experiences under the multipath propagation model and define operations that are equivariant and invariant to the frequency response of a Finite Impulse Response (FIR) filter to build a ChaRRNet. We present results on synthetic and real-world datasets, benchmarked against a strong baseline model, that show the efficacy of our approach. Our results provide evidence of the benefits of incorporating appropriate wireless domain biases into DL models. We hope to spur new work in the area of robust RF machine learning, as the 5G revolution increases demand for enhanced security mechanisms.

[433]  arXiv:2105.03583 (cross-list from eess.AS) [pdf]
Title: Domestic activities clustering from audio recordings using convolutional capsule autoencoder network
Comments: 5 pages, 2 figures, 5 tables, Accepted by IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recent efforts have been made on domestic activities classification from audio recordings, especially the works submitted to the DCASE (Detection and Classification of Acoustic Scenes and Events) challenge since 2018. In contrast, few studies have been done on domestic activities clustering, which is a newly emerging problem. Domestic activities clustering from audio recordings aims at merging audio clips which belong to the same class of domestic activity into a single cluster. Domestic activities clustering is an effective way for unsupervised estimation of daily activities performed in a home environment. In this study, we propose a method for domestic activities clustering using a convolutional capsule autoencoder network (CCAN). In the method, the deep embeddings are learned by the autoencoder in the CCAN, while the deep embeddings which belong to the same class of domestic activities are merged into a single cluster by a clustering layer in the CCAN. Evaluated on a public dataset adopted in DCASE-2018 Task 5, the results show that the proposed method outperforms state-of-the-art methods in terms of the metrics of clustering accuracy and normalized mutual information.

[434]  arXiv:2105.03584 (cross-list from stat.ML) [pdf, other]
Title: Adaptive Latent Space Tuning for Non-Stationary Distributions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Accelerator Physics (physics.acc-ph)

Powerful deep learning tools, such as convolutional neural networks (CNN), are able to learn the input-output relationships of large complicated systems directly from data. Encoder-decoder deep CNNs are able to extract features directly from images, mix them with scalar inputs within a general low-dimensional latent space, and then generate new complex 2D outputs which represent complex physical phenomena. One important challenge faced by deep learning methods is large non-stationary systems whose characteristics change quickly with time, for which re-training is not feasible. In this paper we present a method for adaptive tuning of the low-dimensional latent space of deep encoder-decoder style CNNs based on real-time feedback to quickly compensate for unknown and fast distribution shifts. We demonstrate our approach for predicting the properties of a time-varying charged particle beam in a particle accelerator whose components (accelerating electric fields and focusing magnetic fields) are also quickly changing with time.

[435]  arXiv:2105.03617 (cross-list from q-bio.BM) [pdf, ps, other]
Title: MEGADOCK-GUI: a GUI-based complete cross-docking tool for exploring protein-protein interactions
Comments: 9 pages, 6 figures
Subjects: Biomolecules (q-bio.BM); Distributed, Parallel, and Cluster Computing (cs.DC); Molecular Networks (q-bio.MN); Quantitative Methods (q-bio.QM)

Information on protein-protein interactions (PPIs) not only advances our understanding of molecular biology but also provides important clues for target selection in drug discovery and the design of PPI inhibitors. One of the techniques used for computational prediction of PPIs is protein-protein docking calculations, and a variety of software has been developed. However, a friendly interface for users who are not sufficiently familiar with the command line interface has not been developed so far. In this study, we have developed a graphical user interface, MEGADOCK-GUI, which enables users to easily predict PPIs and protein complex structures. In addition to the original 3-D molecular viewer and input file preparation functions, MEGADOCK-GUI is software that can automatically perform complete cross-docking of $M$ vs. $N$ proteins. With MEGADOCK-GUI, various applications related to the prediction of PPIs, such as ensemble docking that handles multiple conformations of proteins and screening of binding partner proteins that bind to specific proteins, can now be easily performed.

[436]  arXiv:2105.03625 (cross-list from q-fin.TR) [pdf, other]
Title: MCTG:Multi-frequency continuous-share trading algorithm with GARCH based on deep reinforcement learning
Subjects: Trading and Market Microstructure (q-fin.TR); Machine Learning (cs.LG)

Making profits in the stock market is a challenging task for both professional institutional investors and individual traders. With the combination of quantitative trading and reinforcement learning, more trading algorithms have achieved significant gains beyond the benchmark model Buy&Hold (B&H). However, there is a certain gap between these algorithms and real trading decision-making scenarios. On the one hand, they only consider trading signals while ignoring the number of transactions. On the other hand, the information level considered by these algorithms is not rich enough, which limits their performance. Thus, we propose an algorithm called the Multi-frequency Continuous-share Trading algorithm with GARCH (MCTG) to solve the problems above, which consists of parallel network layers and deep reinforcement learning. The former is composed of three parallel network layers, each dealing with data of a different frequency (five minutes, one day, one week), where the day level considers the volatilities of stocks. The latter, a reinforcement learning algorithm with a continuous action space, is used to solve the problem of trading stock shares. Experiments in different industries of the Chinese stock market show that our method achieves more extra profit compared with basic DRL methods and the benchmark model.

[437]  arXiv:2105.03643 (cross-list from eess.AS) [pdf, other]
Title: Latency-Controlled Neural Architecture Search for Streaming Speech Recognition
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recently, neural architecture search (NAS) has attracted much attention and has been explored for automatic speech recognition (ASR). Our prior work has shown promising results compared with hand-designed neural networks. In this work, we focus on streaming ASR scenarios and propose the latency-controlled NAS for acoustic modeling. First, based on the vanilla neural architecture, normal cells are altered to be causal cells, in order to control the total latency of the neural network. Second, a revised operation space with a smaller receptive field is proposed to generate the final architecture with low latency. Extensive experiments show that: 1) Based on the proposed neural architecture, the neural networks with a medium latency of 550 ms (milliseconds) and a low latency of 190 ms can be learned in the vanilla and revised operation space respectively. 2) For the low latency setting, the evaluation network can achieve more than 19% (average on the four test sets) relative improvement compared with the hybrid CLDNN baseline, on a 10k-hour large-scale dataset. An additional 11% relative improvement can be achieved if the latency of the neural network is relaxed to the medium latency setting.

[438]  arXiv:2105.03660 (cross-list from eess.SP) [pdf, other]
Title: Deep learning of nanopore sensing signals using a bi-path network
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Biological Physics (physics.bio-ph)

Temporary changes in electrical resistance of a nanopore sensor caused by translocating target analytes are recorded as a sequence of pulses on current traces. Prevalent algorithms for feature extraction in pulse-like signals lack objectivity because empirical amplitude thresholds are user-defined to single out the pulses from the noisy background. Here, we use deep learning for feature extraction based on a bi-path network (B-Net). After training, the B-Net acquires the prototypical pulses and the ability of both pulse recognition and feature extraction without a priori assigned parameters. The B-Net performance is evaluated on generated datasets and further applied to experimental data of DNA and protein translocation. The B-Net results show remarkably small relative errors and stable trends. The B-Net is further shown capable of processing data with a signal-to-noise ratio equal to one, an impossibility for threshold-based algorithms. The developed B-Net is generic for pulse-like signals beyond pulsed nanopore currents.

[439]  arXiv:2105.03678 (cross-list from eess.SP) [pdf, other]
Title: Nearly Minimax-Optimal Rates for Noisy Sparse Phase Retrieval via Early-Stopped Mirror Descent
Comments: arXiv admin note: text overlap with arXiv:2010.10168
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper studies early-stopped mirror descent applied to noisy sparse phase retrieval, which is the problem of recovering a $k$-sparse signal $\mathbf{x}^\star\in\mathbb{R}^n$ from a set of quadratic Gaussian measurements corrupted by sub-exponential noise. We consider the (non-convex) unregularized empirical risk minimization problem and show that early-stopped mirror descent, when equipped with the hyperbolic entropy mirror map and proper initialization, achieves a nearly minimax-optimal rate of convergence, provided the sample size is at least of order $k^2$ (modulo a logarithmic term) and the minimum (in modulus) non-zero entry of the signal is on the order of $\|\mathbf{x}^\star\|_2/\sqrt{k}$. Our theory leads to a simple algorithm that does not rely on explicit regularization or thresholding steps to promote sparsity. More generally, our results establish a connection between mirror descent and sparsity in the non-convex problem of noisy sparse phase retrieval, adding to the literature on early stopping that has mostly focused on non-sparse, Euclidean, and convex settings via gradient descent. Our proof combines a potential-based analysis of mirror descent with a quantitative control on a variational coherence property that we establish along the path of mirror descent, up to a prescribed stopping time.

[440]  arXiv:2105.03679 (cross-list from eess.IV) [pdf, other]
Title: EZCrop: Energy-Zoned Channels for Robust Output Pruning
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Recent results have revealed an interesting observation in a trained convolutional neural network (CNN), namely, the rank of a feature map channel matrix remains surprisingly constant regardless of the input images. This has led to an effective rank-based channel pruning algorithm, yet the constant rank phenomenon remains mysterious and unexplained. This work aims at demystifying and interpreting such rank behavior from a frequency-domain perspective, which as a bonus suggests an extremely efficient Fast Fourier Transform (FFT)-based metric for measuring channel importance without explicitly computing its rank. We achieve remarkable CNN channel pruning based on this analytically sound and computationally efficient metric and adopt it for repetitive pruning to demonstrate robustness via our scheme named Energy-Zoned Channels for Robust Output Pruning (EZCrop), which shows consistently better results than other state-of-the-art channel pruning methods.

[441]  arXiv:2105.03684 (cross-list from quant-ph) [pdf, other]
Title: Quantum Machine Learning For Classical Data
Authors: Leonard Wossnig
Comments: PhD thesis
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

In this dissertation, we study the intersection of quantum computing and supervised machine learning algorithms, which means that we investigate quantum algorithms for supervised machine learning that operate on classical data. This area of research falls under the umbrella of quantum machine learning, a research area of computer science which has recently received wide attention. In particular, we investigate to what extent quantum computers can be used to accelerate supervised machine learning algorithms. The aim of this is to develop a clear understanding of the promises and limitations of the current state of the art of quantum algorithms for supervised machine learning, but also to define directions for future research in this exciting field. We start by looking at supervised quantum machine learning (QML) algorithms through the lens of statistical learning theory. In this framework, we derive novel bounds on the computational complexities of a large set of supervised QML algorithms under the requirement of optimal learning rates. Next, we give a new bound for Hamiltonian simulation of dense Hamiltonians, a major subroutine of most known supervised QML algorithms, and then derive a classical algorithm with nearly the same complexity. We then draw the parallels to recent "quantum-inspired" results, and will explain the implications of these results for quantum machine learning applications. Looking for areas which might bear larger advantages for QML algorithms, we finally propose a novel algorithm for Quantum Boltzmann machines, and argue that quantum algorithms for quantum data are one of the most promising applications for QML with potentially exponential advantage over classical approaches.

[442]  arXiv:2105.03697 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum Proofs of Proximity
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC)

We initiate the systematic study of QMA algorithms in the setting of property testing, to which we refer as QMA proofs of proximity (QMAPs). These are quantum query algorithms that receive explicit access to a sublinear-size untrusted proof and are required to accept inputs having a property $\Pi$ and reject inputs that are $\varepsilon$-far from $\Pi$, while only probing a minuscule portion of their input. Our algorithmic results include a general-purpose theorem that enables quantum speedups for testing an expressive class of properties, namely, those that are succinctly decomposable. Furthermore, we show quantum speedups for properties that lie outside of this family, such as graph bipartiteness. We also investigate the complexity landscape of this model, showing that QMAPs can be exponentially stronger than both classical proofs of proximity and quantum testers. To this end, we extend the methodology of Blais, Brody and Matulef (Computational Complexity, 2012) to prove quantum property testing lower bounds via reductions from communication complexity, thereby resolving a problem raised by Montanaro and de Wolf (Theory of Computing, 2016).

[443]  arXiv:2105.03752 (cross-list from physics.flu-dyn) [pdf]
Title: Improving Deep Learning Performance for Predicting Large-Scale Porous-Media Flow through Feature Coarsening
Comments: 12 pages, 7 figures
Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Physics-based simulation for fluid flow in porous media is a computational technology to predict the temporal-spatial evolution of state variables (e.g. pressure) in porous media, and usually requires high computational expense due to its nonlinearity and the scale of the study domain. This letter describes a deep learning (DL) workflow to predict the pressure evolution as fluid flows in large-scale 3D heterogeneous porous media. In particular, we apply a feature coarsening technique to extract the most representative information and perform the training and prediction of DL at the coarse scale, and further recover the resolution at the fine scale by 2D piecewise cubic interpolation. We validate the DL approach that is trained from physics-based simulation data to predict the pressure field in a field-scale 3D geologic CO$_2$ storage reservoir. We evaluate the impact of feature coarsening on DL performance, and observe that feature coarsening can not only decrease training time by >74% and reduce memory consumption by >75%, but also maintain temporal error <1.5%. Besides, the DL workflow provides predictive efficiency with ~1400 times speedup compared to physics-based simulation.

[444]  arXiv:2105.03774 (cross-list from eess.SP) [pdf, ps, other]
Title: Study of List-Based OMP and an Enhanced Model for Direction Finding with Non-Uniform Arrays
Comments: 6 figures, 8 pages
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

This paper proposes an enhanced coarray transformation model (EDCTM) and a mixed greedy maximum likelihood algorithm called List-Based Maximum Likelihood Orthogonal Matching Pursuit (LBML-OMP) for direction-of-arrival estimation with non-uniform linear arrays (NLAs). The proposed EDCTM approach obtains improved estimates when Khatri-Rao product-based models are used to generate difference coarrays under the assumption of uncorrelated sources. In the proposed LBML-OMP technique, for each iteration a set of candidates is generated based on the correlation-maximization between the dictionary and the residue vector. LBML-OMP then chooses the best candidate based on a reduced-complexity asymptotic maximum likelihood decision rule. Simulations show the improved results of EDCTM over existing approaches and that LBML-OMP outperforms existing sparse recovery algorithms as well as Spatial Smoothing Multiple Signal Classification with NLAs.

[445]  arXiv:2105.03847 (cross-list from eess.IV) [pdf]
Title: Automatic segmentation of vertebral features on ultrasound spine images using Stacked Hourglass Network
Comments: 9 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Objective: The spinous process angle (SPA) is one of the essential parameters to denote three-dimensional (3-D) deformity of the spine. We propose an automatic segmentation method based on Stacked Hourglass Network (SHN) to detect the spinous processes (SP) on ultrasound (US) spine images and to measure the SPAs of clinical scoliotic subjects. Methods: The network was trained to detect vertebral SP and laminae as five landmarks on 1200 ultrasound transverse images and validated on 100 images. All the processed transverse images with highlighted SP and laminae were reconstructed into a 3D image volume, and the SPAs were measured on the projected coronal images. The trained network was tested on 400 images by calculating the percentage of correct keypoints (PCK); and the SPA measurements were evaluated on 50 scoliotic subjects by comparing the results from US images and radiographs. Results: The trained network achieved a high average PCK (86.8%) on the test datasets; in particular, the PCK of SP detection was 90.3%. The SPAs measured from US and radiographic methods showed good correlation (r>0.85), and the mean absolute difference (MAD) between the two modalities was 3.3°, which was less than the clinical acceptance error (5°). Conclusion: The vertebral features can be accurately segmented on US spine images using SHN, and the measurement results of SPA from US data were comparable to the gold standard from radiography.
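The two evaluation quantities named above can be sketched in a few lines. The abstract does not state the distance threshold or any normalization used for PCK, so the fixed pixel threshold below is an assumption, as are the function names and the toy numbers.

```python
import numpy as np

def pck(predicted, truth, threshold):
    """Fraction of keypoints whose Euclidean error is within `threshold`.

    `predicted` and `truth` have shape (num_keypoints, 2); the threshold
    is in the same units as the coordinates (assumed: pixels).
    """
    predicted = np.asarray(predicted, dtype=float)
    truth = np.asarray(truth, dtype=float)
    errors = np.linalg.norm(predicted - truth, axis=-1)
    return float(np.mean(errors <= threshold))

def mean_absolute_difference(a, b):
    """MAD between paired measurements, e.g. angles from two modalities."""
    return float(np.mean(np.abs(np.asarray(a, float) - np.asarray(b, float))))

# Toy keypoints with errors of 2, 0, 9 and 1 pixels.
pred = [[10, 10], [20, 20], [30, 30], [40, 40]]
true = [[10, 12], [20, 20], [30, 39], [41, 40]]
score = pck(pred, true, threshold=5)                          # 3 of 4 within 5 px
mad = mean_absolute_difference([10.0, 12.0], [13.0, 11.0])    # (3 + 1) / 2
```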

[446]  arXiv:2105.03854 (cross-list from physics.flu-dyn) [pdf]
Title: Surrogate Modeling of Fluid Dynamics with a Multigrid Inspired Neural Network Architecture
Comments: 22 pages, 15 figures
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Algebraic or geometric multigrid methods are commonly used in numerical solvers as they are a multi-resolution method able to handle problems with multiple scales. In this work, we propose a modification to the commonly-used U-Net neural network architecture that is inspired by the principles of multigrid methods, referred to here as U-Net-MG. We then demonstrate that this proposed U-Net-MG architecture can successfully reduce the test prediction errors relative to the conventional U-Net architecture when modeling a set of fluid dynamic problems. In total, we demonstrate an improvement in the prediction of velocity and pressure fields for the canonical fluid dynamics cases of flow past a stationary cylinder, flow past 2 cylinders in out-of-phase motion, and flow past an oscillating airfoil in both the propulsion and energy harvesting modes. In general, while both the U-Net and U-Net-MG models can model the systems well with test RMSEs of less than 1%, the use of the U-Net-MG architecture can further reduce RMSEs by between 20% and 70%.

[447]  arXiv:2105.03863 (cross-list from stat.ML) [pdf, ps, other]
Title: Non-asymptotic Performances of Robust Markov Decision Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In this paper, we study the non-asymptotic performance of the optimal policy on the robust value function with the true transition dynamics. The optimal robust policy is computed from a generative model or an offline dataset without access to the true transition dynamics. In particular, we consider three different uncertainty sets, namely the $L_1$, $\chi^2$ and KL balls, under both the $(s,a)$-rectangular and $s$-rectangular assumptions. Our results show that when we assume $(s,a)$-rectangular uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$ in the generative model setting and $\widetilde{O}\left(\frac{|\mathcal{S}|}{\nu_{\min}\varepsilon^2\rho^2(1-\gamma)^4}\right)$ in the offline dataset setting. While prior works on non-asymptotic performance are restricted to the KL ball and the $(s,a)$-rectangular assumption, we also extend our results to the more general $s$-rectangular assumption, which leads to a larger sample complexity than the $(s,a)$-rectangular assumption.

[448]  arXiv:2105.03905 (cross-list from eess.SP) [pdf, other]
Title: Security Concerns on Machine Learning Solutions for 6G Networks in mmWave Beam Prediction
Comments: 13 Pages, under review. arXiv admin note: substantial text overlap with arXiv:2103.07268
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

6G -- sixth generation -- is the latest cellular technology currently under development for wireless communication systems. In recent years, machine learning algorithms have been applied widely in various fields, such as healthcare, transportation, energy, autonomous cars, and many more. These algorithms have also been used in communication technologies to improve system performance in terms of frequency spectrum usage, latency, and security. With the rapid development of machine learning techniques, especially deep learning, it is critical to take security concerns into account when applying the algorithms. While machine learning algorithms offer significant advantages for 6G networks, security concerns about Artificial Intelligence (AI) models have typically been ignored by the scientific community so far. However, security is also a vital part of AI algorithms, because the AI model itself can be poisoned by attackers. This paper proposes a mitigation method, based on adversarial learning, for adversarial attacks against proposed 6G machine learning models for millimeter-wave (mmWave) beam prediction. The main idea behind adversarial attacks against machine learning models is to produce faulty results by manipulating trained deep learning models for 6G applications for mmWave beam prediction. We also present the performance of the adversarial learning mitigation method for 6G security in the mmWave beam prediction application under a fast gradient sign method attack. The mean square errors (MSE) of the defended model under attack are very close to those of the undefended model without attack.
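
The fast gradient sign method named above perturbs an input by a small step in the direction of the sign of the loss gradient. The sketch below applies that rule to a toy linear regressor with a squared-error loss. The model, weights and inputs are hypothetical stand-ins, not the deep beam-prediction network studied in the paper.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def predict(weights, x):
    """Toy linear model: inner product of weights and input."""
    return sum(w * xi for w, xi in zip(weights, x))

def fgsm_perturb(weights, x, target, epsilon):
    """x_adv = x + epsilon * sign(d loss / d x) for loss = (w.x - target)^2."""
    residual = predict(weights, x) - target
    grad = [2.0 * residual * w for w in weights]  # gradient of the loss w.r.t. x
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical weights, input and regression target.
weights = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
target = 0.0

x_adv = fgsm_perturb(weights, x, target, epsilon=0.1)
clean_error = (predict(weights, x) - target) ** 2
adv_error = (predict(weights, x_adv) - target) ** 2
```

Each input coordinate moves by at most `epsilon`, yet the squared error grows. Adversarial training, the kind of mitigation the abstract describes, would feed such perturbed inputs back into training.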

[449]  arXiv:2105.03924 (cross-list from math.OC) [pdf, other]
Title: Computationally Efficient Dynamic Traffic Optimization Of Railway Systems
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this paper we investigate real-time, dynamic traffic optimization in railway systems. In order to enable practical solution times, we operate the optimizer in a receding horizon fashion and with optimization horizons that are shorter than the full path to destinations, using a model predictive control (MPC) approach. We present new procedures to establish safe prediction horizons, providing formal guarantees that the system is operated in a way that satisfies hard safety constraints despite the fact that not all future train interactions are taken into account, by characterizing the minimal required optimization horizons. We also show that any feasible solution to our proposed models is sufficient to maintain a safe, automated operation of the railway system, providing an upper bound on the computations strictly required. Additionally, we show that these minimal optimization horizons also characterize an upper bound on computations required to construct a feasible solution for any arbitrary optimization horizon, paving the way for anytime algorithms. Finally, our results enable systematic solution reuse, when previous schedules are available. We test our approach on a detailed simulation environment of a real-world railway system used for freight transport.

[450]  arXiv:2105.03939 (cross-list from eess.IV) [pdf, other]
Title: Lightweight Image Super-Resolution with Hierarchical and Differentiable Neural Architecture Search
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Single Image Super-Resolution (SISR) tasks have achieved significant performance with deep neural networks. However, the large number of parameters in CNN-based methods for SISR tasks requires heavy computation. Although several efficient SISR models have been recently proposed, most are handcrafted and thus lack flexibility. In this work, we propose a novel differentiable Neural Architecture Search (NAS) approach on both the cell-level and network-level to search for lightweight SISR models. Specifically, the cell-level search space is designed based on an information distillation mechanism, focusing on the combinations of lightweight operations and aiming to build a more lightweight and accurate SR structure. The network-level search space is designed to consider the feature connections among the cells and aims to find which information flow benefits the cell most to boost the performance. Unlike the existing Reinforcement Learning (RL) or Evolutionary Algorithm (EA) based NAS methods for SISR tasks, our search pipeline is fully differentiable, and the lightweight SISR models can be efficiently searched on both the cell-level and network-level jointly on a single GPU. Experiments show that our methods can achieve state-of-the-art performance on the benchmark datasets in terms of PSNR, SSIM, and model complexity with merely 68G Multi-Adds for $\times 2$ and 18G Multi-Adds for $\times 4$ SR tasks. Code will be available at https://github.com/DawnHH/DLSR-PyTorch.

[451]  arXiv:2105.03988 (cross-list from physics.comp-ph) [pdf, other]
Title: Probabilistic forecast of multiphase transport under viscous and buoyancy forces in heterogeneous porous media
Subjects: Computational Physics (physics.comp-ph); Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Probability (math.PR)

In this study, we develop a probabilistic approach to map the parametric uncertainty to the output state uncertainty in first-order hyperbolic conservation laws. We analyze this problem for nonlinear immiscible two-phase transport in heterogeneous porous media in the presence of a stochastic velocity field. The uncertainty in the velocity field can arise from the incomplete description of the porosity field, the injection flux, or both. The uncertainty in the total-velocity field leads to spatiotemporal uncertainty in the saturation field. Given information about the spatial/temporal statistics of the correlated heterogeneity, we leverage the method of distributions to derive deterministic equations that govern the evolution of the single-point CDF of saturation. Unlike the Buckley-Leverett equation, the equation for the raw CDF function is linear in space and time. We thereby give routes to circumventing the computational cost of Monte Carlo schemes while obtaining the full statistical description of saturation. We conduct a set of numerical experiments and compare statistics of saturation computed with the method of distributions against those obtained using the statistical moment equations approach and kernel density estimation post-processing of high-resolution Monte Carlo simulations. This comparison demonstrates that the CDF equations remain accurate over a wide range of statistical properties, i.e. standard deviation and correlation length of the underlying random fields, while the corresponding low-order statistical moment equations deviate significantly from Monte Carlo results, except for very small values of standard deviation and correlation length.

[452]  arXiv:2105.03991 (cross-list from math.CV) [pdf, ps, other]
Title: Holomorphic feedforward networks
Comments: 13 pages, version to appear in PAMQ
Subjects: Complex Variables (math.CV); High Energy Physics - Theory (hep-th); Numerical Analysis (math.NA)

A very popular model in machine learning is the feedforward neural network (FFN). The FFN can approximate general functions and mitigate the curse of dimensionality. Here we introduce FFNs which represent sections of holomorphic line bundles on complex manifolds, and ask some questions about their approximating power. We also explain formal similarities between the standard approach to supervised learning and the problem of finding numerical Ricci-flat Kähler metrics, which allow carrying some ideas between the two problems.

[453]  arXiv:2105.03995 (cross-list from eess.IV) [pdf, other]
Title: Acute Lymphoblastic Leukemia Detection from Microscopic Images Using Weighted Ensemble of Convolutional Neural Networks
Comments: 31 pages, 9 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Acute Lymphoblastic Leukemia (ALL) is a blood cell cancer characterized by numerous immature lymphocytes. Even though automation in ALL prognosis is an essential aspect of cancer diagnosis, it is challenging due to the morphological correlation between malignant and normal cells. The traditional ALL classification strategy demands experienced pathologists to carefully read the cell images, which is arduous, time-consuming, and often suffers from inter-observer variation. This article automates the ALL detection task from microscopic cell images, employing deep Convolutional Neural Networks (CNNs). We explore the weighted ensemble of different deep CNNs to recommend a better ALL cell classifier. The weights for the ensemble candidate models are estimated from their corresponding metrics, such as accuracy, F1-score, AUC, and kappa values. Various data augmentations and pre-processing steps are incorporated to achieve better generalization of the network. We utilize the publicly available C-NMC-2019 ALL dataset to conduct all the experiments. Our proposed weighted ensemble model, using the kappa values of the ensemble candidates as their weights, achieves a weighted F1-score of 88.6%, a balanced accuracy of 86.2%, and an AUC of 0.941 on the preliminary test set. The qualitative results displaying the gradient class activation maps confirm that the introduced model has a concentrated learned region. In contrast, the ensemble candidate models, such as Xception, VGG-16, DenseNet-121, MobileNet, and InceptionResNet-V2, separately produce coarse and scattered learned areas for most example cases. Since the proposed kappa value-based weighted ensemble yields a better result for the task addressed in this article, it can be tried in other domains of medical diagnostic applications.
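
A kappa-weighted ensemble of the kind described above can be sketched as a weighted average of per-model class probabilities. The abstract does not spell out the combination rule, so the averaging below is an assumption, and the kappa scores and probabilities are hypothetical.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-model class probabilities using weights normalised to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, w in zip(probabilities, weights):
        for c in range(n_classes):
            combined[c] += (w / total) * probs[c]
    return combined

# Hypothetical kappa scores and two-class softmax outputs for three candidate models.
kappas = [0.70, 0.60, 0.50]
model_probs = [[0.80, 0.20], [0.40, 0.60], [0.45, 0.55]]

combined = weighted_ensemble(model_probs, kappas)
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

In this toy case two of the three models individually prefer class 1, but the confident, higher-kappa model tips the weighted average towards class 0, which is how such an ensemble can differ from a plain majority vote.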

[454]  arXiv:2105.04001 (cross-list from stat.ML) [pdf, other]
Title: Bayesian Kernelised Test of (In)dependence with Mixed-type Variables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

A fundamental task in AI is to assess (in)dependence between mixed-type variables (text, image, sound). We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model. The new measure of (in)dependence allows us to answer some fundamental questions: Based on data, are (mixed-type) variables independent? How likely is dependence/independence to hold? How high is the probability that two mixed-type variables are more than just weakly dependent? We theoretically show the properties of the approach, as well as algorithms for fast computation with it. We empirically demonstrate the effectiveness of the proposed method by analysing its performance and by comparing it with other frequentist and Bayesian approaches on a range of datasets and tasks with mixed-type variables.

[455]  arXiv:2105.04014 (cross-list from eess.IV) [pdf, other]
Title: DiagSet: a dataset for prostate cancer histopathological image classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cancer diseases constitute one of the most significant societal challenges. In this paper we introduce a novel histopathological dataset for prostate cancer detection. The proposed dataset, consisting of over 2.6 million tissue patches extracted from 430 fully annotated scans, 4675 scans with assigned binary diagnosis, and 46 scans with diagnosis given independently by a group of histopathologists, can be found at https://ai-econsilio.diag.pl. Furthermore, we propose a machine learning framework for detection of cancerous tissue regions and prediction of scan-level diagnosis, utilizing thresholding and statistical analysis to abstain from the decision in uncertain cases. During the experimental evaluation we identify several factors negatively affecting the performance of considered models, such as presence of label noise, data imbalance, and quantity of data, that can serve as a basis for further research. The proposed approach, composed of ensembles of deep neural networks operating on the histopathological scans at different scales, achieves 94.6% accuracy in patch-level recognition, and is compared in a scan-level diagnosis with 9 human histopathologists.

[456]  arXiv:2105.04033 (cross-list from quant-ph) [pdf, other]
Title: Key Assistance, Key Agreement, and Layered Secrecy for Bosonic Broadcast Channels
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

Secret-sharing building blocks based on quantum broadcast communication are studied. The confidential capacity region of the pure-loss bosonic broadcast channel is determined, both with and without key assistance, and an achievable region is established for the lossy bosonic broadcast channel. If the main receiver has a transmissivity of $\eta<1/2$, then confidentiality solely relies on the key-assisted encryption of the one-time pad. We also address conference key agreement for the distillation of two keys, a public key and a secret key. A regularized formula is derived for the key-agreement capacity region in finite dimensions. In the bosonic case, the key-agreement region is included within the capacity region of the corresponding broadcast channel with confidential messages. We then consider a network with layered secrecy, where three users with different security ranks communicate over the same broadcast network. We derive an achievable layered-secrecy region for a pure-loss bosonic channel that is formed by the concatenation of two beam splitters.

[457]  arXiv:2105.04044 (cross-list from quant-ph) [pdf, other]
Title: Practical Parallel Self-testing of Bell States via Magic Rectangles
Comments: 26 pages, 4 figures; comments are very welcome!
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

Self-testing is a method to verify that one has a particular quantum state from purely classical statistics. For practical applications, such as device-independent delegated verifiable quantum computation, it is crucial that one self-tests multiple Bell states in parallel while keeping the quantum capabilities required of one side to a minimum. In this work we use the $3 \times n$ magic rectangle games (generalisations of the magic square game) to obtain a self-test for $n$ Bell states where one side needs only to measure single-qubit Pauli observables. The protocol requires small input size (constant for Alice and $O(\log n)$ bits for Bob) and is robust with robustness $O(n^{5/2} \sqrt{\varepsilon})$, where $\varepsilon$ is the closeness of the observed correlations to the ideal. To achieve the desired self-test we introduce a one-side-local quantum strategy for the magic square game that wins with certainty, generalise this strategy to the family of $3 \times n$ magic rectangle games, and supplement these nonlocal games with extra check rounds (of single observables and pairs of observables).
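
The magic square game that the $3 \times n$ rectangles generalise can be illustrated with the textbook Mermin-Peres square of two-qubit Pauli observables. The sketch below is not the one-side-local strategy introduced in the paper. It only checks the parity constraints of the standard square: every row multiplies to $+I$, and every column multiplies to $+I$ except the last, which multiplies to $-I$.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
        for i in range(len(a)) for k in range(len(b))
    ]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

# Standard Mermin-Peres square of two-qubit observables.
SQUARE = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Z), kron(Z, I2), kron(Z, Z)],
    [kron(X, Z), kron(Z, X), kron(Y, Y)],
]

def product_sign(observables):
    """Return s (+1 or -1) such that the product of the three observables is s * identity."""
    prod = matmul(matmul(observables[0], observables[1]), observables[2])
    s = prod[0][0]
    for i in range(len(prod)):
        for j in range(len(prod)):
            expected = s if i == j else 0
            if abs(prod[i][j] - expected) > 1e-12:
                raise ValueError("product is not a multiple of the identity")
    return int(round(s.real))

row_signs = [product_sign(row) for row in SQUARE]
col_signs = [product_sign([SQUARE[r][c] for r in range(3)]) for c in range(3)]
```

No assignment of fixed values $\pm 1$ to the nine cells can satisfy all six constraints at once, since the row products would multiply to $+1$ and the column products to $-1$. This is why the game admits no perfect classical strategy.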

[458]  arXiv:2105.04046 (cross-list from stat.ML) [pdf, other]
Title: A likelihood approach to nonparametric estimation of a singular distribution using deep generative models
Comments: 33 pages, 12 figures, 1 table
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging due to its singularity with respect to the Lebesgue measure in the ambient space. In the considered model, a usual likelihood approach can fail to estimate the target distribution consistently due to the singularity. We prove that a novel and effective solution exists by perturbing the data with an instance noise, which leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions and distributions supported on a low-dimensional manifold. Our analysis provides some insights on how deep generative models can avoid the curse of dimensionality for nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to empirically demonstrate that the proposed data perturbation technique improves the estimation performance significantly.

[459]  arXiv:2105.04059 (cross-list from quant-ph) [pdf, ps, other]
Title: Towards a functorial description of quantum relative entropy
Comments: 8 pages, submission to GSI'21 (post-print)
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Category Theory (math.CT)

A Bayesian functorial characterization of the classical relative entropy (KL divergence) of finite probabilities was recently obtained by Baez and Fritz. This was then generalized to standard Borel spaces by Gagné and Panangaden. Here, we provide preliminary calculations suggesting that the finite-dimensional quantum (Umegaki) relative entropy might be characterized in a similar way. Namely, we explicitly prove that it defines an affine functor in the special case where the relative entropy is finite. A recent non-commutative disintegration theorem provides a key ingredient in this proof.
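
For orientation, the classical relative entropy (KL divergence) of finite probability distributions mentioned above is $D(p\,\|\,q) = \sum_i p_i \log(p_i/q_i)$. The sketch below computes it in nats and shows the case where it is infinite, which the classical analogue of the abstract's finiteness restriction excludes. It covers only the classical quantity, not the Umegaki relative entropy, and the example distributions are hypothetical.

```python
import math

def relative_entropy(p, q):
    """Classical relative entropy D(p || q) in nats for finite distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

d_same = relative_entropy([0.5, 0.5], [0.5, 0.5])
d_diff = relative_entropy([0.5, 0.5], [0.25, 0.75])
d_inf = relative_entropy([0.5, 0.5], [1.0, 0.0])
```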

[460]  arXiv:2105.04077 (cross-list from eess.SP) [pdf, other]
Title: Dynamic Multichannel Access via Multi-agent Reinforcement Learning: Throughput and Fairness Guarantees
Comments: 20 pages, 12 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

We consider a multichannel random access system in which each user accesses a single channel at each time slot to communicate with an access point (AP). Users arrive at the system at random, are active for a certain number of time slots, and then disappear from the system. Under such a dynamic network environment, we propose a distributed multichannel access protocol based on multi-agent reinforcement learning (RL) to improve both throughput and fairness between active users. Unlike previous approaches that adjust channel access probabilities at each time slot, the proposed RL algorithm deterministically selects a set of channel access policies for several consecutive time slots. To effectively reduce the complexity of the proposed RL algorithm, we adopt a branching dueling Q-network architecture and propose an efficient training methodology for producing proper Q-values over time-varying user sets. We perform extensive simulations on realistic traffic environments and demonstrate that the proposed online learning improves both throughput and fairness compared to the conventional RL approaches and centralized scheduling policies.

[461]  arXiv:2105.04083 (cross-list from eess.SP) [pdf, other]
Title: The Behavior of Internet Traffic for Internet Services during COVID-19 Pandemic Scenario
Comments: 4 pages, 2 figures, Submitted to XXXIX Simpósio Brasileiro de Telecomunicações e Processamento de Sinais, SBrT 2021, Fortaleza, CE, Brasil
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)

Since the end of 2019, the SARS-CoV-2 virus, which causes COVID-19, has spread rapidly around the world, forcing many governments to impose restrictions or lockdowns to combat the pandemic. With restrictions on people's movement in almost all countries of the world, workers and students had to carry on their activities at home. As a result, people's behavior, habits, and the way they used the Internet changed significantly. Like office professionals, younger people played an important role in this change, especially in the type of resources they used. As a result, the characteristics and traffic of communication networks were affected. In this perspective article, we draw on many available studies of the effect of COVID-19 on networks and investigate the effects on Internet traffic of using services such as video streaming, video conferencing, and gaming during the pandemic months of 2020.

[462]  arXiv:2105.04087 (cross-list from stat.ML) [pdf, other]
Title: Latency Analysis of Consortium Blockchained Federated Learning
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

In this paper, a decentralized federated learning architecture is proposed for business-to-business scenarios by introducing a consortium blockchain. We introduce a model verification mechanism to ensure the quality of the local models trained by participants. To analyze the latency of the system, a latency model is constructed by considering the workflow of the architecture. Finally, the experimental results show that our latency model does well in quantifying the actual delays.

[463]  arXiv:2105.04106 (cross-list from eess.IV) [pdf, other]
Title: Validation of image systems simulation technology using a Cornell Box
Subjects: Image and Video Processing (eess.IV); Graphics (cs.GR)

We describe and experimentally validate an end-to-end simulation of a digital camera. The simulation models the spectral radiance of 3D-scenes, formation of the spectral irradiance by multi-element optics, and conversion of the irradiance to digital values by the image sensor. We quantify the accuracy of the simulation by comparing real and simulated images of a precisely constructed, three-dimensional high dynamic range test scene. Validated end-to-end software simulation of a digital camera can accelerate innovation by reducing many of the time-consuming and expensive steps in designing, building and evaluating image systems.

[464]  arXiv:2105.04130 (cross-list from cond-mat.stat-mech) [pdf, other]
Title: Boltzmann machines as two-dimensional tensor networks
Comments: 12 pages, 11 figures
Subjects: Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Quantum Physics (quant-ph); Machine Learning (stat.ML)

Restricted Boltzmann machines (RBM) and deep Boltzmann machines (DBM) are important models in machine learning, and have recently found numerous applications in quantum many-body physics. We show that there are fundamental connections between them and tensor networks. In particular, we demonstrate that any RBM and DBM can be exactly represented as a two-dimensional tensor network. This representation gives an understanding of the expressive power of RBM and DBM using entanglement structures of the tensor networks, and also provides an efficient tensor network contraction algorithm for computing the partition function of RBM and DBM. Using numerical experiments, we demonstrate that the proposed algorithm is much more accurate than the state-of-the-art machine learning methods in estimating the partition function of restricted Boltzmann machines and deep Boltzmann machines, and has potential applications in training deep Boltzmann machines for general machine learning tasks.

[465]  arXiv:2105.04137 (cross-list from math.CO) [pdf, other]
Title: On the inversion number of oriented graphs
Authors: Jørgen Bang-Jensen (1), Jonas Costa Ferreira da Silva (2), Frédéric Havet (3) ((1) Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark, (2) Department of Mathematics, Universidade Federal do Ceará, Fortaleza, Brazil, (3) Université Côte d'Azur, CNRS, Inria, Sophia Antipolis, France)
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Let $D$ be an oriented graph. The inversion of a set $X$ of vertices in $D$ consists in reversing the direction of all arcs with both ends in $X$. The inversion number of $D$, denoted by ${\rm inv}(D)$, is the minimum number of inversions needed to make $D$ acyclic. Denoting by $\tau(D)$, $\tau' (D)$, and $\nu(D)$ the cycle transversal number, the cycle arc-transversal number and the cycle packing number of $D$ respectively, one shows that ${\rm inv}(D) \leq \tau' (D)$, ${\rm inv}(D) \leq 2\tau(D)$ and there exists a function $g$ such that ${\rm inv}(D)\leq g(\nu(D))$. We conjecture that for any two oriented graphs $L$ and $R$, ${\rm inv}(L\rightarrow R) ={\rm inv}(L) +{\rm inv}(R)$ where $L\rightarrow R$ is the dijoin of $L$ and $R$. This would imply that the first two inequalities are tight. We prove this conjecture when ${\rm inv}(L)\leq 1$ and ${\rm inv}(R)\leq 2$ and when ${\rm inv}(L) ={\rm inv}(R)=2$ and $L$ and $R$ are strongly connected. We also show that the function $g$ of the third inequality satisfies $g(1)\leq 4$.
We then consider the complexity of deciding whether ${\rm inv}(D)\leq k$ for a given oriented graph $D$. We show that it is NP-complete for $k=1$, which together with the above conjecture would imply that it is NP-complete for every $k$. This contrasts with a result of Belkhechine et al. which states that deciding whether ${\rm inv}(T)\leq k$ for a given tournament $T$ is polynomial-time solvable.
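
The definitions above can be made concrete with a small brute-force sketch. It is only practical for very small graphs, since the abstract shows that even deciding ${\rm inv}(D)\leq 1$ is NP-complete. The function names and the `max_k` cut-off are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def invert(arcs, subset):
    """Reverse every arc whose two ends both lie in `subset`."""
    s = set(subset)
    return {(v, u) if u in s and v in s else (u, v) for u, v in arcs}

def is_acyclic(vertices, arcs):
    """Kahn's algorithm: the digraph is acyclic iff every vertex can be removed."""
    indeg = {v: 0 for v in vertices}
    for _, v in arcs:
        indeg[v] += 1
    queue = [v for v in vertices if indeg[v] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for a, b in arcs:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return removed == len(vertices)

def inversion_number(vertices, arcs, max_k=3):
    """Smallest k <= max_k such that k inversions make the graph acyclic, else None."""
    subsets = [c for r in range(2, len(vertices) + 1) for c in combinations(vertices, r)]
    frontier = {frozenset(arcs)}
    for k in range(max_k + 1):
        if any(is_acyclic(vertices, state) for state in frontier):
            return k
        # Apply one more inversion to every arc set reached so far.
        frontier = {frozenset(invert(state, s)) for state in frontier for s in subsets}
    return None

vertices = [0, 1, 2]
directed_triangle = {(0, 1), (1, 2), (2, 0)}
transitive_triangle = {(0, 1), (1, 2), (0, 2)}

inv_cycle = inversion_number(vertices, directed_triangle)
inv_transitive = inversion_number(vertices, transitive_triangle)
```

For the directed triangle, inverting the set {0, 1} reverses the single arc between those two vertices and leaves an acyclic orientation, so one inversion suffices.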

[466]  arXiv:2105.04196 (cross-list from eess.SP) [pdf, other]
Title: AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system. Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers while ensuring timely delivery of safety-critical messages to the Road-Side Unit (RSU). Due to the challenges of dynamic channel conditions, centralized resource management schemes that require global information are inefficient and lead to large signaling overheads. Hence, we exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy. Existing MARL algorithms consider a holistic reward function for the group's collective success, which often ends up with unsatisfactory results and cannot guarantee an optimal policy for each agent. Consequently, motivated by the existing literature in RL, we propose a novel MARL framework that trains two critics with the following goals: A global critic which estimates the global expected reward and motivates the agents toward a cooperating behavior and an exclusive local critic for each agent that estimates the local individual reward. Furthermore, based on the tasks each agent has to accomplish, the individual reward of each agent is decomposed into multiple sub-reward functions where task-wise value functions are learned separately. Numerical results indicate our proposed algorithm's effectiveness compared with the conventional RL methods applied in this area.

[467]  arXiv:2105.04207 (cross-list from eess.SP) [pdf, other]
Title: Age of Information Aware VNF Scheduling in Industrial IoT Using Deep Reinforcement Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

In delay-sensitive industrial internet of things (IIoT) applications, the age of information (AoI) is employed to characterize the freshness of information. Meanwhile, the emerging network function virtualization provides flexibility and agility for service providers to deliver a given network service using a sequence of virtual network functions (VNFs). However, suitable VNF placement and scheduling in these schemes is NP-hard, and finding a globally optimal solution by traditional approaches is complex. Recently, deep reinforcement learning (DRL) has appeared as a viable way to solve such problems. In this paper, we first utilize a single-agent, low-complexity compound-action actor-critic RL to cover both discrete and continuous actions and jointly minimize VNF cost and AoI in terms of network resources under end-to-end Quality of Service constraints. To surmount the single-agent capacity limitation for learning, we then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other. Simulation results demonstrate that single-agent schemes significantly outperform the greedy algorithm in terms of average network cost and AoI. Moreover, the multi-agent solution decreases the average cost by dividing the tasks between the agents. However, it needs more iterations to learn because the agents must collaborate.
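
The age of information used above is, in its standard definition, the time elapsed since the generation of the freshest update received so far. A minimal discrete-time sketch follows, assuming a fresh update at slot 0 and a hypothetical delivery schedule. It illustrates only the metric, not the VNF scheduling or the DRL agents.

```python
def age_of_information(horizon, deliveries):
    """Return the AoI at each slot.

    `deliveries` maps a delivery slot to the generation slot of the update
    received in that slot.
    """
    ages = []
    last_generated = 0  # assumption: the receiver holds a fresh update at slot 0
    for t in range(horizon):
        if t in deliveries:
            # Keep the freshest update seen so far.
            last_generated = max(last_generated, deliveries[t])
        ages.append(t - last_generated)
    return ages

# Hypothetical schedule: an update generated at slot 2 arrives at slot 3,
# and one generated at slot 6 arrives without delay.
ages = age_of_information(8, {3: 2, 6: 6})
average_age = sum(ages) / len(ages)
```

The age grows by one per slot and drops to the delivery delay of an update when it arrives, so both infrequent and slow updates raise the average.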

[468]  arXiv:2105.04211 (cross-list from stat.ML) [pdf, other]
Title: SigGPDE: Scali