Computer Science

New submissions

New submissions for Thu, 16 Sep 21

[1]  arXiv:2109.06873 [pdf, other]
Title: Improving Robustness and Efficiency in Active Learning with Contrastive Loss
Comments: arXiv admin note: substantial text overlap with arXiv:2109.06321
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper introduces supervised contrastive active learning (SCAL) by leveraging the contrastive loss for active learning in a supervised setting. We propose efficient query strategies in active learning to select unbiased and informative data samples of diverse feature representations. We demonstrate that our proposed method reduces sampling bias and achieves state-of-the-art accuracy and model calibration in an active learning setup, with query computation 11x faster than CoreSet and 26x faster than Bayesian active learning by disagreement. Our method yields well-calibrated models even with imbalanced datasets. We also evaluate robustness to dataset shift and out-of-distribution data in an active learning setup and demonstrate that our proposed SCAL method outperforms high-performing compute-intensive methods by a larger margin (average 8.9% higher AUROC for out-of-distribution detection and average 7.2% lower ECE under dataset shift).

[2]  arXiv:2109.06874 [pdf, other]
Title: Agile, Antifragile, Artificial-Intelligence-Enabled, Command and Control
Authors: Jacob Simpson (1), Rudolph Oosthuizen (2), Sondoss El Sawah (1), Hussein Abbass (1) ((1) University of New South Wales Canberra, (2) University of Pretoria)
Comments: 12 pages, 7 figures, included in the 26th International Command and Control Research and Technology Symposium (ICCRTS)
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)

Artificial Intelligence (AI) is rapidly becoming integrated into military Command and Control (C2) systems as a strategic priority for many defence forces. The successful implementation of AI is promising to herald a significant leap in C2 agility through automation. However, realistic expectations need to be set on what AI can achieve in the foreseeable future. This paper will argue that AI could lead to a fragility trap, whereby the delegation of C2 functions to an AI could increase the fragility of C2, resulting in catastrophic strategic failures. This calls for a new framework for AI in C2 to avoid this trap. We will argue that antifragility along with agility should form the core design principles for AI-enabled C2 systems. This duality is termed Agile, Antifragile, AI-Enabled Command and Control (A3IC2). An A3IC2 system continuously improves its capacity to perform in the face of shocks and surprises through overcompensation from feedback during the C2 decision-making cycle. An A3IC2 system will not only be able to survive within a complex operational environment, it will also thrive, benefiting from the inevitable shocks and volatility of war.

[3]  arXiv:2109.06875 [pdf, other]
Title: Multi-Scale Aligned Distillation for Low-Resolution Detection
Comments: In CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In instance-level detection tasks (e.g., object detection), reducing input resolution is an easy option to improve runtime efficiency. However, this option traditionally hurts detection performance considerably. This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model. We first identify the challenge of applying knowledge distillation (KD) to teacher and student networks that act on different input resolutions. To tackle it, we explore the idea of spatially aligning feature maps between models of varying input resolutions by shifting feature pyramid positions and introduce aligned multi-scale training to train a multi-scale teacher that can distill its knowledge to a low-resolution student. Further, we propose crossing feature-level fusion to dynamically fuse the teacher's multi-resolution features to guide the student better. On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training, while outperforming the latter's low-resolution models by 2.1% to 3.6% in terms of mAP. Our code is made publicly available at https://github.com/dvlab-research/MSAD.

[4]  arXiv:2109.06896 [pdf, other]
Title: Decision-Focused Summarization
Comments: 16 pages, 10 figures, EMNLP 2021, code is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Relevance in summarization is typically defined based on textual information alone, without incorporating insights about a particular decision. As a result, to support risk analysis of pancreatic cancer, summaries of medical notes may include irrelevant information such as a knee injury. We propose a novel problem, decision-focused summarization, where the goal is to summarize relevant information for a decision. We leverage a predictive model that makes the decision based on the full text to provide valuable insights on how a decision can be inferred from text. To build a summary, we then select representative sentences that lead to similar model decisions as using the full text while accounting for textual non-redundancy. To evaluate our method (DecSum), we build a testbed where the task is to summarize the first ten reviews of a restaurant in support of predicting its future rating on Yelp. DecSum substantially outperforms text-only summarization methods and model-based explanation methods in decision faithfulness and representativeness. We further demonstrate that DecSum is the only method that enables humans to outperform random chance in predicting which restaurant will be better rated in the future.

[5]  arXiv:2109.06906 [pdf]
Title: Recovering individual emotional states from sparse ratings using collaborative filtering
Comments: 21 pages, 8 figures
Subjects: Information Retrieval (cs.IR)

A fundamental challenge in emotion research is measuring feeling states with high granularity and temporal precision without disrupting the emotion generation process. Here we introduce and validate a new approach in which responses are sparsely sampled and the missing data are recovered using a computational technique known as collaborative filtering (CF). This approach leverages structured covariation across individual experiences and is available in Neighbors, an open-source Python toolbox. We validate our approach across three different experimental contexts by recovering dense individual ratings using only a small subset of the original data. In dataset 1, participants (n=316) separately rated 112 emotional images on 6 different discrete emotions. In dataset 2, participants (n=203) watched 8 short emotionally engaging autobiographical stories while simultaneously providing moment-by-moment ratings of the intensity of their affective experience. In dataset 3, participants (n=60) with distinct social preferences made 76 decisions about how much money to return in a hidden multiplier trust game. Across all experimental contexts, CF was able to accurately recover missing data and importantly outperformed mean imputation, particularly in contexts with greater individual variability. This approach will enable new avenues for affective science research by allowing researchers to acquire high dimensional ratings from emotional experiences with minimal disruption to the emotion-generation process.

[6]  arXiv:2109.06907 [pdf, other]
Title: Shape-adaptive Hysteresis Compensation for Tendon-driven Continuum Manipulators
Subjects: Robotics (cs.RO)

Tendon-driven continuum manipulators (TDCMs) are commonly used in minimally invasive surgical systems due to their long, thin, flexible structure that is compliant in narrow or tortuous environments. Many studies address precise tip control of the articulating section. However, these models do not account for the proximal shaft shape of the TDCM, which affects tip control in practical settings. In this paper, we propose a gradient-based shift detection method based on motor current that can easily find the offset of task space models (i.e., hysteresis). We analyze our proposed method with multiple Intra-cardiac Echocardiography catheters, which are a typical commercial example of TDCMs. Our results show that the errors from varied proximal shape are considerably reduced, and the accuracy of the tip manipulation is improved when changing external environmental structures.

[7]  arXiv:2109.06919 [pdf, ps, other]
Title: Deploying clinical machine learning? Consider the following...
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

Despite the intense attention and investment into clinical machine learning (CML) research, relatively few applications convert to clinical practice. While research is important in advancing the state-of-the-art, translation is equally important in bringing these technologies into a position to ultimately impact patient care and live up to extensive expectations surrounding AI in healthcare. To better characterize a holistic perspective among researchers and practitioners, we survey several participants with experience in developing CML for clinical deployment about their learned experiences. We collate these insights and identify several main categories of barriers and pitfalls in order to better design and develop clinical machine learning applications.

[8]  arXiv:2109.06922 [pdf, other]
Title: Application of integral invariants to apictorial jigsaw puzzle assembly
Comments: 21 pages
Subjects: Numerical Analysis (math.NA); Computational Geometry (cs.CG)

We present a method for the automatic assembly of apictorial jigsaw puzzles. This method relies on integral area invariants for shape matching and an optimization process to aggregate shape matches into a final puzzle assembly. Assumptions about individual piece shape or arrangement are not necessary. We illustrate our method by solving example puzzles of various shapes and sizes.

[9]  arXiv:2109.06923 [pdf, other]
Title: Research Project 2: Drone-supported AI-based Generation of 3D Maps of Indoor Radio Environments
Authors: Ken Mendes
Comments: 7 pages, 9 figures
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

A Radio Environment Map (REM) is a powerful tool in enhancing the experience of radio-enabled agents but building such a REM can be a laborious undertaking, especially in three dimensions. This project shows how such a REM of an indoor three-dimensional space can be generated in an autonomous and scalable way. Building on the results of the preceding Research Project 1, multiple drones are used to map the WiFi signals present in such a space in a real-world environment where the drones are each able to visit 36 waypoints and collectively gather thousands of WiFi beacon data samples. This report also includes an analysis of the collected data and concludes by proposing machine-learning based techniques to predict the signal strength of known access points in locations not visited by the drones.

[10]  arXiv:2109.06926 [pdf, other]
Title: A trainable monogenic ConvNet layer robust in front of large contrast changes in image classification
Comments: For associated code, see this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Convolutional Neural Networks (ConvNets) at present achieve remarkable performance in image classification tasks. However, current ConvNets cannot guarantee the capabilities of mammalian visual systems, such as invariance to contrast and illumination changes. Existing ideas for overcoming illumination and contrast variations usually have to be tuned manually and tend to fail when tested with other types of data degradation. In this context, we present a new bio-inspired entry layer, M6, which detects low-level geometric features (lines, edges, and orientations) similar to the patterns detected by the V1 visual cortex. This new trainable layer is capable of coping with image classification even with large contrast variations. The explanation for this behavior is the monogenic signal geometry, which represents each pixel value in a 3D space using quaternions, a fact that confers a degree of explainability to the networks. We compare M6 with a conventional convolutional layer (C) and a deterministic quaternion local phase layer (Q9). The experimental setup is designed to evaluate the robustness of our M6-enriched ConvNet model and includes three architectures, four datasets, and three types of contrast degradation (including non-uniform haze degradations). The numerical results reveal that the models with M6 are the most robust to any kind of contrast variation. This amounts to a significant enhancement of the C models, which usually have reasonably good performance only when the same training and test degradation are used, except for the case of maximum degradation. Moreover, the Structural Similarity Index Measure (SSIM) is used to analyze and explain the robustness effect of the M6 feature maps under any kind of contrast degradation.

[11]  arXiv:2109.06931 [pdf, ps, other]
Title: Measurement and Analysis of GPU-accelerated Applications with HPCToolkit
Journal-ref: Parallel Computing 2021
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit's new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.

[12]  arXiv:2109.06932 [pdf, other]
Title: A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence
Comments: 6 pages, 2 figures
Journal-ref: 2019 IEEE World Congress on Services (SERVICES)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled and subsequently leveraged into actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially, a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.

[13]  arXiv:2109.06935 [pdf, other]
Title: On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning
Comments: 22 pages, 6 figures, 5 tables, to appear in BlackBoxNLP 2021
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Recent work has shown evidence that the knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one. This paper analyses the relationship between them, in the context of fine-tuning on two tasks -- POS tagging and natural language inference -- which require the model to bring to bear different degrees of language-specific knowledge. Visualisations reveal that mBERT loses the ability to cluster representations by language after fine-tuning, a result that is supported by evidence from language identification experiments. However, further experiments on 'unlearning' language-specific representations using gradient reversal and iterative adversarial learning are shown not to add further improvement to the language-independent component over and above the effect of fine-tuning. The results presented here suggest that the process of fine-tuning causes a reorganisation of the model's limited representational capacity, enhancing language-independent representations at the expense of language-specific ones.

[14]  arXiv:2109.06939 [pdf, other]
Title: The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders
Authors: Han He, Jinho D. Choi
Comments: Accepted to EMNLP 2021: The 2021 Conference on Empirical Methods in Natural Language Processing
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency, while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, which interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.

[15]  arXiv:2109.06941 [pdf, ps, other]
Title: Monotone Complexity of Spanning Tree Polynomial Re-visited
Comments: 20 pages, 3 figures
Subjects: Computational Complexity (cs.CC)

We prove two results that shed new light on the monotone complexity of the spanning tree polynomial, a classic polynomial in algebraic complexity and beyond.
First, we show that the spanning tree polynomials having $n$ variables and defined over constant-degree expander graphs, have monotone arithmetic complexity $2^{\Omega(n)}$. This yields the first strongly exponential lower bound on the monotone arithmetic circuit complexity for a polynomial in VP. Before this result, strongly exponential size monotone lower bounds were known only for explicit polynomials in VNP (Gashkov-Sergeev'12, Raz-Yehudayoff'11, Srinivasan'20, Cavalar-Kumar-Rossman'20, Hrubes-Yehudayoff'21).
Recently, Hrubes'20 initiated a program to prove lower bounds against general arithmetic circuits by proving $\epsilon$-sensitive lower bounds for monotone arithmetic circuits for a specific range of values for $\epsilon \in (0,1)$. We consider the spanning tree polynomial $ST_{n}$ defined over the complete graph on $n$ vertices and show that the polynomials $F_{n-1,n} - \epsilon \cdot ST_{n}$ and $F_{n-1,n} + \epsilon \cdot ST_{n}$ defined over $n^2$ variables, have monotone circuit complexity $2^{\Omega(n)}$ if $\epsilon \geq 2^{-\Omega(n)}$ and $F_{n-1,n} = \prod_{i=2}^n (x_{i,1} +\cdots + x_{i,n})$ is the complete set-multilinear polynomial. This provides the first $\epsilon$-sensitive exponential lower bound for a family of polynomials inside VP. En-route, we consider a problem in 2-party, best partition communication complexity of deciding whether two sets of oriented edges distributed among Alice and Bob form a spanning tree or not. We prove that there exists a fixed distribution, under which the problem has low discrepancy with respect to every nearly-balanced partition. This result could be of interest beyond algebraic complexity.

[16]  arXiv:2109.06944 [pdf, other]
Title: Minimum Path Star Topology Algorithms for Weighted Regions and Obstacles
Comments: 26 pages, 10 figures
Subjects: Data Structures and Algorithms (cs.DS)

Shortest path algorithms have played a key role in the past century, paving the way for modern day GPS systems to find optimal routes along static systems in fractions of a second. One application of these algorithms is optimizing the total distance of power lines (specifically in star topological configurations). Because well-connected electrical systems matter in certain areas, finding a minimum path that is able to account for geological features would have far-reaching consequences in lowering the cost of electric power transmission. We begin by proving that the convex hull is an effective bounding mechanism for star topological minimum path algorithms. Building on this bound, we propose novel algorithms to manage certain cases that lack existing methods (weighted regions and obstacles) by discretizing Euclidean space into squares and combining pre-existing algorithms that calculate local minima that we believe have a possibility of being the absolute minimum. We further describe ways to evaluate the iterations necessary to reach a given level of accuracy. Both of these novel algorithms fulfill certain niches that past literature does not cover.

[17]  arXiv:2109.06950 [pdf, other]
Title: Automatically Exposing Problems with Neural Dialog Models
Authors: Dian Yu, Kenji Sagae
Journal-ref: EMNLP 2021
Subjects: Computation and Language (cs.CL)

Neural dialog models are known to suffer from problems such as generating unsafe and inconsistent responses. Even though these problems are crucial and prevalent, they are mostly manually identified by model designers through interactions. Recently, some research has instructed crowdworkers to goad the bots into triggering such problems. However, humans leverage superficial clues such as hate speech, while leaving systematic problems undiscovered. In this paper, we propose two methods, including reinforcement learning, to automatically trigger a dialog model into generating problematic responses. We show the effect of our methods in exposing safety and contradiction issues with state-of-the-art dialog models.

[18]  arXiv:2109.06952 [pdf, other]
Title: Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech
Comments: Accepted to EMNLP 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers with canonical speech patterns. Unfortunately, these systems perform poorly when tested on atypical speech and heavily accented speech. It has previously been shown that personalization through model fine-tuning substantially improves performance. However, maintaining such large models per speaker is costly and difficult to scale. We show that by adding a relatively small number of extra parameters to the encoder layers via so-called residual adapters, we can achieve similar adaptation gains compared to model fine-tuning, while only updating a tiny fraction (less than 0.5%) of the model parameters. We demonstrate this on two speech adaptation tasks (atypical and accented speech) and for two state-of-the-art ASR architectures.
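For readers unfamiliar with the term, the following is a minimal sketch of a generic bottleneck adapter with a residual connection, the general pattern the name refers to. It is not the authors' implementation: the class name `ResidualAdapter`, the ReLU nonlinearity, the zero-initialised up-projection, and the toy dimensions are all assumptions made for illustration.

```python
import random


def matvec(weights, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class ResidualAdapter:
    """Hypothetical bottleneck adapter: y = x + W_up * relu(W_down * x)."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.d_bottleneck = d_bottleneck
        # Down-projection: small random weights (assumed initialisation).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        # Up-projection starts at zero, so the adapter is initially the identity.
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w_down, x)]  # ReLU
        delta = matvec(self.w_up, hidden)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection

    def num_params(self):
        """Trainable parameters in the adapter (biases omitted in this sketch)."""
        return 2 * self.d_model * self.d_bottleneck
```

Only the adapter weights would be updated during adaptation while the host encoder stays frozen, which is why the trainable fraction can be small when `d_bottleneck` is much smaller than `d_model`.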

[19]  arXiv:2109.06956 [pdf, other]
Title: A fast, high-order numerical method for the simulation of single-excitation states in quantum optics
Subjects: Numerical Analysis (math.NA); Quantum Physics (quant-ph)

We consider the numerical solution of a nonlocal partial differential equation which models the process of collective spontaneous emission in a two-level atomic system containing a single photon. We reformulate the problem as an integro-differential equation for the atomic degrees of freedom, and describe an efficient solver for the case of a Gaussian atomic density. The problem of history dependence arising from the integral formulation is addressed using sum-of-exponentials history compression. We demonstrate the solver on two systems of physical interest: in the first, an initially-excited atom decays into a photon by spontaneous emission, and in the second, a photon pulse is used to excite an atom, which then decays.

[20]  arXiv:2109.06958 [pdf, other]
Title: Probabilistic Analysis of Euclidean Capacitated Vehicle Routing
Subjects: Data Structures and Algorithms (cs.DS)

We give a probabilistic analysis of the unit-demand Euclidean capacitated vehicle routing problem in the random setting, where the input distribution consists of $n$ unit-demand customers modeled as independent, identically distributed uniform random points in the two-dimensional plane. The objective is to visit every customer using a set of routes of minimum total length, such that each route visits at most $k$ customers, where $k$ is the capacity of a vehicle. All of the following results are in the random setting and hold asymptotically almost surely.
The best known polynomial-time approximation for this problem is the iterated tour partitioning (ITP) algorithm, introduced in 1985 by Haimovich and Rinnooy Kan. They showed that the ITP algorithm is near-optimal when $k$ is either $o(\sqrt{n})$ or $\omega(\sqrt{n})$, and they asked whether the ITP algorithm was also effective in the intermediate range. In this work, we show that when $k=\sqrt{n}$, the ITP algorithm is at best a $(1+c_0)$-approximation for some positive constant $c_0$.
On the other hand, the approximation ratio of the ITP algorithm was known to be at most $0.995+\alpha$ due to Bompadre, Dror, and Orlin, where $\alpha$ is the approximation ratio of an algorithm for the traveling salesman problem. In this work, we improve the upper bound on the approximation ratio of the ITP algorithm to $0.915+\alpha$. Our analysis is based on a new lower bound on the optimal cost for the metric capacitated vehicle routing problem, which may be of independent interest.
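As an illustration of the algorithm under analysis, here is a minimal sketch of iterated tour partitioning as it is commonly described: take a single tour through all customers, cut it into consecutive segments of at most $k$ customers, connect each segment to the depot, and keep the cheapest of the $k$ possible cut offsets. The depot position, the function names, and the greedy nearest-neighbour tour (a stand-in for the traveling salesman subroutine) are assumptions for illustration, not details taken from the paper.

```python
import math
import random


def nearest_neighbor_tour(points):
    """Greedy customer ordering; an assumed stand-in for a TSP algorithm."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def route_cost(depot, points, route):
    """Length of depot -> customers in route order -> depot."""
    stops = [depot] + [points[i] for i in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))


def iterated_tour_partitioning(depot, points, k):
    """Try every cut offset 1..k and return (routes, total_cost) of the best."""
    tour = nearest_neighbor_tour(points)
    best_routes, best_cost = None, math.inf
    for offset in range(1, k + 1):
        # First route takes `offset` customers, later routes take k each.
        routes = [tour[:offset]]
        routes += [tour[i:i + k] for i in range(offset, len(tour), k)]
        cost = sum(route_cost(depot, points, r) for r in routes)
        if cost < best_cost:
            best_routes, best_cost = routes, cost
    return best_routes, best_cost


if __name__ == "__main__":
    random.seed(0)
    customers = [(random.random(), random.random()) for _ in range(16)]
    routes, cost = iterated_tour_partitioning((0.5, 0.5), customers, k=4)
    print(len(routes), round(cost, 3))
```

With $n = 16$ and $k = 4$ the demo sits at $k = \sqrt{n}$, the intermediate regime the abstract discusses, though a toy instance of this size says nothing about the asymptotic results.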

[21]  arXiv:2109.06961 [pdf, other]
Title: Building Accurate Simple Models with Multihop
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Knowledge transfer from a complex high performing model to a simpler and potentially low performing one in order to enhance its performance has been of great interest over the last few years as it finds applications in important problems such as explainable artificial intelligence, model compression, robust model building and learning from small data. Known approaches to this problem (viz. Knowledge Distillation, Model compression, ProfWeight, etc.) typically transfer information directly (i.e. in a single/one hop) from the complex model to the chosen simple model through schemes that modify the target or reweight training examples on which the simple model is trained. In this paper, we propose a meta-approach where we transfer information from the complex model to the simple model by dynamically selecting and/or constructing a sequence of intermediate models of decreasing complexity that are less intricate than the original complex model. Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches as well as work in 1-hop fashion, thus generalizing these approaches. In the experiments on real data, we observe consistent gains over 1-hop for different choices of models, which on average are more than 2% and reach up to 8% in a particular case. We also empirically analyze conditions under which the multi-hop approach is likely to be beneficial over the traditional 1-hop approach, and report other interesting insights. To the best of our knowledge, this is the first work that proposes such a multi-hop approach to perform knowledge transfer given a single high performing complex model, making it, in our opinion, an important methodological contribution.

[22]  arXiv:2109.06966 [pdf, other]
Title: Searching for More Efficient Dynamic Programs
Subjects: Computation and Language (cs.CL)

Computational models of human language often involve combinatorial problems. For instance, a probabilistic parser may marginalize over exponentially many trees to make predictions. Algorithms for such problems often employ dynamic programming and are not always unique. Finding one with optimal asymptotic runtime can be unintuitive, time-consuming, and error-prone. Our work aims to automate this laborious process. Given an initial correct declarative program, we search for a sequence of semantics-preserving transformations to improve its running time as much as possible. To this end, we describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a heuristic search procedure to improve this metric. We show that in practice, automated search -- like the mental search performed by human programmers -- can find substantial improvements to the initial program. Empirically, we show that many common speed-ups described in the NLP literature could have been discovered automatically by our system.

[23]  arXiv:2109.06967 [pdf, other]
Title: Grounding-aware RRT* for Path Planning and Safe Navigation of Marine Crafts in Confined Waters
Comments: Accepted for publication at the 13th IFAC Conference on Control Applications in Marine Systems, Robotics, and Vehicles, 2021 (CAMS)
Subjects: Social and Information Networks (cs.SI); Systems and Control (eess.SY)

The paper presents a path planning algorithm based on RRT* that addresses the risk of grounding during evasive manoeuvres to avoid collision. The planner achieves this objective by integrating a collective navigation experience with the systematic use of water depth information from the electronic navigational chart. Multivariate kernel density estimation is applied to historical AIS data to generate a probabilistic model describing seafarers' best practices while sailing in confined waters. This knowledge is then encoded into the RRT* cost function to penalize path deviations that would lead own ship to sail in shallow waters. Depth contours satisfying the own ship draught define the actual navigable area, and triangulation of this non-convex region is adopted to enable uniform sampling. This ensures the optimal path deviation.

[24]  arXiv:2109.06969 [pdf]
Title: Multi-modal Wound Classification using Wound Image and Location by Deep Neural Network
Comments: 30 pages, 10 figures, 15 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Wound classification is an essential step of wound diagnosis. An efficient classifier can assist wound specialists in classifying wound types at lower financial and time cost and help them decide on an optimal treatment procedure. This study developed a deep neural network-based multi-modal classifier using wound images and their corresponding locations to categorize wound images into multiple classes, including diabetic, pressure, surgical, and venous ulcers. A body map is also developed to prepare the location data, which can help wound specialists tag wound locations more efficiently. Three datasets containing images and their corresponding location information are designed with the help of wound specialists. The multi-modal network is developed by concatenating the image-based and location-based classifiers' outputs with some other modifications. The maximum accuracy on mixed-class classifications (containing background and normal skin) varies from 77.33% to 100% on different experiments. The maximum accuracy on wound-class classifications (containing only diabetic, pressure, surgical, and venous) varies from 72.95% to 98.08% on different experiments. The proposed multi-modal network also shows a significant improvement in results over previous works in the literature.

[25]  arXiv:2109.06974 [pdf, ps, other]
Title: Algorithmic Auditing and Social Justice: Lessons from the History of Audit Studies
Subjects: Computers and Society (cs.CY)

Algorithmic audits have been embraced as tools to investigate the functioning and consequences of sociotechnical systems. Though the term is used somewhat loosely in the algorithmic context and encompasses a variety of methods, it maintains a close connection to audit studies in the social sciences--which have, for decades, used experimental methods to measure the prevalence of discrimination across domains like housing and employment. In the social sciences, audit studies originated in a strong tradition of social justice and participatory action, often involving collaboration between researchers and communities; but scholars have argued that, over time, social science audits have become somewhat distanced from these original goals and priorities. We draw from this history in order to highlight difficult tensions that have shaped the development of social science audits, and to assess their implications in the context of algorithmic auditing. In doing so, we put forth considerations to assist in the development of robust and engaged assessments of sociotechnical systems that draw from auditing's roots in racial equity and social justice.

[26]  arXiv:2109.06976 [pdf, other]
Title: GRiD: GPU-Accelerated Rigid Body Dynamics with Analytical Gradients
Comments: 8 pages, 5 figures, 1 data table
Subjects: Robotics (cs.RO)

We introduce GRiD: a GPU-accelerated library for computing rigid body dynamics with analytical gradients. GRiD was designed to accelerate the nonlinear trajectory optimization subproblem used in state-of-the-art robotic planning, control, and machine learning. Each iteration of nonlinear trajectory optimization requires tens to hundreds of naturally parallel computations of rigid body dynamics and their gradients. GRiD leverages URDF parsing and code generation to deliver optimized dynamics kernels that not only expose GPU-friendly computational patterns, but also take advantage of both fine-grained parallelism within each computation and coarse-grained parallelism between computations. Through this approach, when performing multiple computations of rigid body dynamics algorithms, GRiD provides as much as a 7.6x speedup over a state-of-the-art, multi-threaded CPU implementation, and maintains as much as a 2.6x speedup when accounting for I/O overhead. We release GRiD as an open-source library, so that it can be leveraged by the robotics community to easily and efficiently accelerate rigid body dynamics on the GPU.

[27]  arXiv:2109.06977 [pdf, other]
Title: Design Guidelines for Prompt Engineering Text-to-Image Generative Models
Subjects: Human-Computer Interaction (cs.HC)

Text-to-image generative models are a new and powerful way to generate visual artwork. The free-form nature of text as interaction is double-edged; while users have access to an infinite range of generations, they also must engage in brute-force trial and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt components and model parameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style and investigate success and failure modes within these dimensions. Our evaluation of 5493 generations over the course of five experiments spans 49 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people find better outcomes from text-to-image generative models.

[28]  arXiv:2109.06978 [pdf, ps, other]
Title: Event-Triggered Distributed Stabilization of Interconnected Multiagent Systems with Abnormal Agent and Control Layers: Theoretical Analysis
Authors: Vahid Rezaei
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)

A graph theoretic framework has recently been proposed to stabilize interconnected multiagent systems in a distributed fashion, while systematically capturing the architectural aspect of cyber-physical systems with separate agent or physical layer and control or cyber layer. Based on that development, in addition to the modeling uncertainties over the agent layer, we consider a scenario where the control layer is subject to denial-of-service attacks. We propose a step-by-step procedure to design a control layer that, in the presence of the aforementioned abnormalities, guarantees a level of robustness and resiliency for the final two-layer interconnected multiagent system. The incorporation of an event-triggered strategy further ensures an effective use of the limited energy and communication resources over the control layer. We theoretically prove the resilient, robust, and Zeno-free convergence of all state trajectories to the origin and, via a simulation study, discuss the feasibility of the proposed ideas.

[29]  arXiv:2109.06979 [pdf, other]
Title: CORNET 2.0: A Co-Simulation Middleware for Robot Networks
Subjects: Robotics (cs.RO)

We present a networked co-simulation framework for multi-robot systems applications. We require a simulation framework that captures both physical interactions and communications aspects to effectively design such complex systems. This is necessary to co-design the multi-robots' autonomy logic and the communication protocols. The proposed framework extends existing tools to simulate the robot's autonomy and network-related aspects. We have used Gazebo with ROS/ROS2 to develop the autonomy logic for robots and mininet-WiFi as the network simulator to capture the cyber-physical systems properties of the multi-robot system. This framework addresses the need to seamlessly integrate the two simulation environments by synchronizing mobility and time, allowing for easy migration of the algorithms to real platforms.

[30]  arXiv:2109.06980 [pdf, other]
Title: Explainable Identification of Dementia from Transcripts using Transformer Networks
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)

Alzheimer's disease (AD) is the main cause of dementia, which is accompanied by loss of memory and may lead to severe consequences in people's everyday lives if not diagnosed in time. Very few works have exploited transformer-based networks and, despite the high accuracy achieved, little work has been done in terms of model interpretability. In addition, although Mini-Mental State Exam (MMSE) scores are inextricably linked with the identification of dementia, research works treat dementia identification and the prediction of MMSE scores as two separate tasks. In order to address these limitations, we employ several transformer-based models, with BERT achieving the highest accuracy of 85.56%. Concurrently, we propose an interpretable method to detect AD patients based on siamese networks reaching accuracy up to 81.18%. Next, we introduce two multi-task learning models, where the main task refers to the identification of dementia (binary classification), while the auxiliary one corresponds to the identification of the severity of dementia (multiclass classification). Our model obtains accuracy equal to 84.99% on the detection of AD patients in the multi-task learning setting. Finally, we present some new methods to identify the linguistic patterns used by AD patients and non-AD ones, including text statistics, vocabulary uniqueness, word usage, correlations via a detailed linguistic analysis, and explainability techniques (LIME). Findings indicate significant differences in language between AD and non-AD patients.

[31]  arXiv:2109.06982 [pdf, other]
Title: Generalized Multivariable Grid-Forming Control Design for Power Converters
Subjects: Systems and Control (eess.SY)

The grid-forming converter is an important unit in the future power system with more inverter-interfaced generators. However, improving its performance is still a key challenge. This paper proposes a generalized architecture of the grid-forming converter from the view of multivariable feedback control. As a result, many of the existing popular control strategies, i.e., droop control, power synchronization control, virtual synchronous generator control, matching control, dispatchable virtual oscillator control, and their improved forms are unified into a multivariable feedback control transfer matrix working on several linear and nonlinear error signals. Meanwhile, unlike the traditional assumptions of decoupling between AC and DC control and between active power and reactive power control, the proposed configuration simultaneously takes all of them into consideration, which therefore can provide better performance. As an example, a new multi-input-multi-output-based grid-forming (MIMO-GFM) control is proposed based on the generalized configuration. To cope with the multivariable feedback, an optimal and structured $H_{\infty}$ synthesis is used to design the control parameters. Finally, simulation and experimental results show superior performance and robustness of the proposed configuration and control.

[32]  arXiv:2109.06987 [pdf, other]
Title: NOPE: A Corpus of Naturally-Occurring Presuppositions in English
Comments: CoNLL 2021. Data and code available at this https URL
Subjects: Computation and Language (cs.CL)

Understanding language requires grasping not only the overtly stated content, but also making inferences about things that were left unsaid. These inferences include presuppositions, a phenomenon by which a listener learns about new information through reasoning about what a speaker takes as given. Presuppositions require complex understanding of the lexical and syntactic properties that trigger them as well as the broader conversational context. In this work, we introduce the Naturally-Occurring Presuppositions in English (NOPE) Corpus to investigate the context-sensitivity of 10 different types of presupposition triggers and to evaluate machine learning models' ability to predict human inferences. We find that most of the triggers we investigate exhibit moderate variability. We further find that transformer-based models draw correct inferences in simple cases involving presuppositions, but they fail to capture the minority of exceptional cases in which human judgments reveal complex interactions between context and triggers.

[33]  arXiv:2109.06990 [pdf, other]
Title: Personalization, Privacy, and Me
Comments: ACM CCS Concepts: Information systems~Recommender systems, Information systems~Personalization, Security and privacy~Human and societal aspects of security and privacy, General and reference~Surveys and overviews. Keywords: Personalization, Privacy, Survey
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

News recommendation and personalization is not a solved problem. People are growing concerned about their data being collected in excess in the name of personalization and about its use for purposes other than the ones they would think reasonable. Our experience in building personalization products for publishers while safeguarding user privacy led us to investigate further the user perspective on privacy and personalization. We conducted a survey to explore people's experience with personalization and privacy and the viewpoints of different age groups. In this paper, we share our major findings with publishers and the community that can inform algorithmic design and implementation of the next generation of news recommender systems, which must put the human at their core and reach a balance between personalization experiences and privacy to reap the benefits of both.

[34]  arXiv:2109.06992 [pdf, other]
Title: ML-aided power allocation for Tactical MIMO
Comments: Under review at MILCOM 2021
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We study the problem of optimal power allocation in single-hop multi-antenna ad-hoc wireless networks. A standard technique to solve this problem involves optimizing a tri-convex function under power constraints using a block-coordinate-descent (BCD) based iterative algorithm. This approach, termed WMMSE, tends to be computationally complex and time consuming. Several learning-based approaches have been proposed to speed up the power allocation process. A recent work, UWMMSE, learns an affine transformation of a WMMSE parameter in an unfolded structure to accelerate convergence. In spite of achieving promising results, its application is limited to single-antenna wireless networks. In this work, we present a UWMMSE framework for power allocation in multiple-input multiple-output (MIMO) interference networks. Through an empirical study, we illustrate the superiority of our approach in comparison to WMMSE and also analyze its robustness to changes in channel conditions and network size.

[35]  arXiv:2109.06998 [pdf, other]
Title: $\mathcal{L}_1$ Adaptive Augmentation for Geometric Tracking Control of Quadrotors
Subjects: Systems and Control (eess.SY)

This paper introduces an $\mathcal{L}_1$ adaptive control augmentation for geometric tracking control of quadrotors. In the proposed design, the $\mathcal{L}_1$ augmentation handles nonlinear (time- and state-dependent) uncertainties in the quadrotor dynamics without assuming/enforcing parametric structures, while the baseline geometric controller achieves stabilization of the known nonlinear model of the system dynamics. The $\mathcal{L}_1$ augmentation applies to both the rotational and the translational dynamics. Experimental results demonstrate that the augmented geometric controller shows consistent and (on average five times) smaller trajectory tracking errors compared with the geometric controller alone when tested for different trajectories and under various types of uncertainties/disturbances.

[36]  arXiv:2109.06999 [pdf, ps, other]
Title: Behavior of k-NN as an Instance-Based Explanation Method
Subjects: Machine Learning (cs.LG)

Adoption of DL models in critical areas has led to an escalating demand for sound explanation methods. Instance-based explanation methods are a popular type that return selective instances from the training set to explain the predictions for a test sample. One way to connect these explanations with prediction is to ask the following counterfactual question - how do the loss and prediction for a test sample change when explanations are removed from the training set? Our paper answers this question for k-NNs, which are natural contenders for an instance-based explanation method. We first demonstrate empirically that the representation space induced by the last layer of a neural network is the best to perform k-NN in. Using this layer, we conduct our experiments and compare them to influence functions (IFs) (Koh and Liang, 2017), which try to answer a similar question. Our evaluations do indicate change in loss and predictions when explanations are removed but we do not find a trend between $k$ and loss or prediction change. We find significant stability in the predictions and loss of MNIST vs. CIFAR-10. Surprisingly, we do not observe much difference in the behavior of k-NNs vs. IFs on this question. We attribute this to training set subsampling for IFs.
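
The counterfactual question in this abstract can be made concrete with a toy sketch. It is an illustration only and makes several assumptions: raw coordinates stand in for the last-layer representation, the "explanations" are simply the k nearest training points, and the names `knn_predict` and `predict_without_explanations` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def knn_predict(
    train_x: List[Sequence[float]],
    train_y: List[Hashable],
    query: Sequence[float],
    k: int,
) -> Tuple[Hashable, List[int]]:
    """Return the majority label of the k nearest points and their indices."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbours = order[:k]
    votes = Counter(train_y[i] for i in neighbours)
    return votes.most_common(1)[0][0], neighbours


def predict_without_explanations(
    train_x: List[Sequence[float]],
    train_y: List[Hashable],
    query: Sequence[float],
    k: int,
) -> Hashable:
    """Remove the k explaining neighbours from the training set, then re-predict."""
    _, neighbours = knn_predict(train_x, train_y, query, k)
    removed = set(neighbours)
    kept_x = [x for i, x in enumerate(train_x) if i not in removed]
    kept_y = [y for i, y in enumerate(train_y) if i not in removed]
    return knn_predict(kept_x, kept_y, query, k)[0]
```

Comparing the two predictions for the same query shows whether the explaining instances were actually decisive, which is the change the abstract measures.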

[37]  arXiv:2109.07000 [pdf, other]
Title: Koopman Linearization for Data-Driven Batch State Estimation of Control-Affine Systems
Comments: 9 pages, 5 figures, 1 table. Submitted to IEEE RA-L. Note: version submitted to IEEE RA-L did not include the Appendix section present in this arXiv version
Subjects: Robotics (cs.RO)

We present the Koopman State Estimator (KoopSE), a framework for model-free batch state estimation of control-affine systems that makes no linearization assumptions, requires no problem-specific feature selections, and has an inference computational cost that is independent of the number of training points. We lift the original nonlinear system into a higher-dimensional Reproducing Kernel Hilbert Space (RKHS), where the system becomes bilinear. The time-invariant model matrices can be learned by solving a least-squares problem on training trajectories. At test time, the system is algebraically manipulated into a linear time-varying system, where standard batch linear state estimation techniques can be used to efficiently compute state means and covariances. Random Fourier Features (RFF) are used to combine the computational efficiency of Koopman-based methods and the generality of kernel-embedding methods. KoopSE is validated experimentally on a localization task involving a mobile robot equipped with ultra-wideband receivers and wheel odometry. KoopSE estimates are more accurate and consistent than the standard model-based extended Rauch-Tung-Striebel (RTS) smoother, despite KoopSE having no prior knowledge of the system's motion or measurement models.
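
To show why Random Fourier Features keep the inference cost independent of the number of training points, here is a minimal stdlib-only sketch of RFF for the Gaussian (RBF) kernel. It is not the KoopSE implementation; the choice of kernel, the names `make_rff` and `rbf_kernel`, and the `lengthscale` parameter are assumptions made for illustration. The inner product of two feature vectors approximates the kernel value, so each point is represented by a fixed number of features.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff(
    dim: int, num_features: int, lengthscale: float, rng: random.Random
) -> Callable[[Sequence[float]], List[float]]:
    """Build a random feature map z(x) with z(x) . z(y) ~= exp(-|x-y|^2 / (2 l^2))."""
    # Frequencies are drawn from the kernel's spectral density, N(0, 1 / l^2).
    weights = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(num_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x: Sequence[float]) -> List[float]:
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return features


def rbf_kernel(x: Sequence[float], y: Sequence[float], lengthscale: float) -> float:
    """Exact Gaussian kernel, used only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))
```

The approximation error shrinks roughly as one over the square root of the number of features, so accuracy is traded against feature count rather than against dataset size.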

[38]  arXiv:2109.07001 [pdf, other]
Title: ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors
Comments: Accepted at ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image-based virtual try-on involves synthesizing perceptually convincing images of a model wearing a particular garment and has garnered significant research interest due to its immense practical applicability. Recent methods involve a two-stage process: i) warping of the garment to align with the model ii) texture fusion of the warped garment and target model to generate the try-on output. Issues arise due to the non-rigid nature of garments and the lack of geometric information about the model or the garment. This often results in improper rendering of granular details. We propose ZFlow, an end-to-end framework, which seeks to alleviate these concerns regarding geometric and textural integrity (such as pose, depth-ordering, skin and neckline reproduction) through a combination of gated aggregation of hierarchical flow estimates termed Gated Appearance Flow, and dense structural priors at various stages of the network. ZFlow achieves state-of-the-art results as observed qualitatively, and on quantitative benchmarks of image quality (PSNR, SSIM, and FID). The paper presents extensive comparisons with other existing solutions including a detailed user study and ablation studies to gauge the effect of each of our contributions on multiple datasets.

[39]  arXiv:2109.07006 [pdf, other]
Title: A Three Step Training Approach with Data Augmentation for Morphological Inflection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present the BME submission for the SIGMORPHON 2021 Task 0 Part 1, Generalization Across Typologically Diverse Languages shared task. We use an LSTM encoder-decoder model with three-step training that is first trained on all languages, then fine-tuned on each language family and finally fine-tuned on individual languages. We use a different type of data augmentation technique in the first two steps. Our system outperformed the only other submission. Although it remains worse than the Transformer baseline released by the organizers, our model is simpler and our data augmentation techniques are easily applicable to new languages. We perform ablation studies and show that the augmentation techniques and the three training steps often help but sometimes have a negative effect.

[40]  arXiv:2109.07008 [pdf, ps, other]
Title: HeMI: Multi-view Embedding in Heterogeneous Graphs
Subjects: Machine Learning (cs.LG)

Many real-world graphs involve different types of nodes and relations between nodes, being heterogeneous by nature. The representation learning of heterogeneous graphs (HGs) embeds the rich structure and semantics of such graphs into a low-dimensional space and facilitates various data mining tasks, such as node classification, node clustering, and link prediction. In this paper, we propose a self-supervised method that learns HG representations by relying on knowledge exchange and discovery among different HG structural semantics (meta-paths). Specifically, by maximizing the mutual information of meta-path representations, we promote meta-path information fusion and consensus, and ensure that globally shared semantics are encoded. By extensive experiments on node classification, node clustering, and link prediction tasks, we show that the proposed self-supervision both outperforms and improves competing methods by 1% and up to 10% for all tasks.

[41]  arXiv:2109.07009 [pdf, other]
Title: Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Comments: Accepted at EMNLP 2021 Main Conference (Long)
Subjects: Computation and Language (cs.CL)

In this paper we propose a novel approach towards improving the efficiency of Question Answering (QA) systems by filtering out questions that will not be answered by them. This is based on an interesting new finding: the answer confidence scores of state-of-the-art QA systems can be approximated well by models solely using the input question text. This enables preemptive filtering of questions that are not answered by the system due to their answer confidence scores being lower than the system threshold. Specifically, we learn Transformer-based question models by distilling Transformer-based answering models. Our experiments on three popular QA datasets and one industrial QA benchmark demonstrate the ability of our question models to approximate the Precision/Recall curves of the target QA system well. These question models, when used as filters, can effectively trade off lower computation cost of QA systems for lower Recall, e.g., reducing computation by ~60%, while only losing ~3-4% of Recall.

[42]  arXiv:2109.07012 [pdf, other]
Title: Searching for Representation: A sociotechnical audit of googling for members of U.S. Congress
Subjects: Computers and Society (cs.CY)

High-quality online civic infrastructure is increasingly critical for the success of democratic processes. There is a pervasive reliance on search engines to find facts and information necessary for political participation and oversight. We find that approximately 10% of the top Google search results are likely to mislead California information seekers who use search to identify their congressional representatives. 70% of the misleading results appear in featured snippets above the organic search results. We use both qualitative and quantitative methods to understand what aspects of the information ecosystem lead to this sociotechnical breakdown. Factors identified include Google's heavy reliance on Wikipedia, the lack of authoritative, machine parsable, high accuracy data about the identity of elected officials based on geographic location, and the search engine's treatment of under-specified queries. We recommend steps that Google can take to meet its stated commitment to providing high quality civic information, and steps that information providers can take to improve the legibility and quality of information about congressional representatives available to search algorithms.

[43]  arXiv:2109.07016 [pdf, other]
Title: Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization
Comments: In CIKM 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Recent years have seen a rise in the development of representational learning methods for graph data. Most of these methods, however, focus on node-level representation learning at various scales (e.g., microscopic, mesoscopic, and macroscopic node embedding). In comparison, methods for representation learning on whole graphs are currently relatively sparse. In this paper, we propose a novel unsupervised whole graph embedding method. Our method uses spectral graph wavelets to capture topological similarities on each k-hop sub-graph between nodes and uses them to learn embeddings for the whole graph. We evaluate our method against 12 well-known baselines on 4 real-world datasets and show that our method achieves the best performance across all experiments, outperforming the current state-of-the-art by a considerable margin.

[44]  arXiv:2109.07017 [pdf, other]
Title: Written Justifications are Key to Aggregate Crowdsourced Forecasts
Comments: Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL)

This paper demonstrates that aggregating crowdsourced forecasts benefits from modeling the written justifications provided by forecasters. Our experiments show that the majority and weighted vote baselines are competitive, and that the written justifications are beneficial to call a question throughout its life except in the last quarter. We also conduct an error analysis shedding light on the characteristics that make a justification unreliable.
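
The two baselines named in this abstract can be sketched in a few lines. This is a generic illustration, not the paper's setup: how the weights are obtained is not stated in the abstract, so the `weights` argument (for example, a per-forecaster reliability score) and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def majority_vote(forecasts: Sequence[Hashable]) -> Hashable:
    """Return the answer predicted by the largest number of forecasters."""
    counts: Dict[Hashable, float] = defaultdict(float)
    for answer in forecasts:
        counts[answer] += 1.0
    return max(counts, key=counts.get)


def weighted_vote(forecasts: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the answer with the largest total weight (weights are hypothetical)."""
    if len(forecasts) != len(weights):
        raise ValueError("forecasts and weights must have the same length")
    totals: Dict[Hashable, float] = defaultdict(float)
    for answer, weight in zip(forecasts, weights):
        totals[answer] += weight
    return max(totals, key=totals.get)
```

With forecasts `["yes", "no", "yes"]`, the majority vote picks "yes", while weights of 0.2, 0.9, and 0.3 make the weighted vote pick "no", since 0.9 exceeds 0.2 + 0.3.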

[45]  arXiv:2109.07020 [pdf, other]
Title: Frequency Effects on Syntactic Rule Learning in Transformers
Comments: Camera ready for EMNLP 2021
Subjects: Computation and Language (cs.CL)

Pre-trained language models perform well on a variety of linguistic tasks that require symbolic reasoning, raising the question of whether such models implicitly represent abstract symbols and rules. We investigate this question using the case study of BERT's performance on English subject-verb agreement. Unlike prior work, we train multiple instances of BERT from scratch, allowing us to perform a series of controlled interventions at pre-training time. We show that BERT often generalizes well to subject-verb pairs that never occurred in training, suggesting a degree of rule-governed behavior. We also find, however, that performance is heavily influenced by word frequency, with experiments showing that both the absolute frequency of a verb form and its frequency relative to the alternate inflection are causally implicated in the predictions BERT makes at inference time. Closer analysis of these frequency effects reveals that BERT's behavior is consistent with a system that correctly applies the SVA rule in general but struggles to overcome strong training priors and to estimate agreement features (singular vs. plural) on infrequent lexical items.

[46]  arXiv:2109.07022 [pdf, other]
Title: How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs?
Comments: Preprint of a paper accepted to EMNLP 2021
Subjects: Computers and Society (cs.CY)

As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, it is crucial to ensure that these models are robust. One way of improving model robustness is to generate counterfactually augmented data (CAD) for training models that can better learn to distinguish between core features and data artifacts. While models trained on this type of data have shown promising out-of-domain generalizability, it is still unclear what the sources of such improvements are. We investigate the benefits of CAD for social NLP models by focusing on three social computing constructs -- sentiment, sexism, and hate speech. Assessing the performance of models trained with and without CAD across different types of datasets, we find that while models trained on CAD show lower in-domain performance, they generalize better out-of-domain. We unpack this apparent discrepancy using machine explanations and find that CAD reduces model reliance on spurious features. Leveraging a novel typology of CAD to analyze their relationship with model performance, we find that CAD which acts on the construct directly or a diverse set of CAD leads to higher performance.

[47]  arXiv:2109.07023 [pdf, other]
Title: Embedding Node Structural Role Identity Using Stress Majorization
Comments: In CIKM 2021
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Nodes in networks may have one or more functions that determine their role in the system. As opposed to local proximity, which captures the local context of nodes, the role identity captures the functional "role" that nodes play in a network, such as being the center of a group, or the bridge between two groups. This means that nodes far apart in a network can have similar structural role identities. Several recent works have explored methods for embedding the roles of nodes in networks. However, these methods all rely on either approximating or indirect modeling of structural equivalence. In this paper, we present a novel and flexible framework using stress majorization, to transform the high-dimensional role identities in networks directly (without approximation or indirect modeling) to a low-dimensional embedding space. Our method is also flexible, in that it does not rely on specific structural similarity definitions. We evaluated our method on the tasks of node classification, clustering, and visualization, using three real-world and five synthetic networks. Our experiments show that our framework achieves better results than existing methods in learning node role representations.

[48]  arXiv:2109.07024 [pdf, other]
Title: DPMPC-Planner: A real-time UAV trajectory planning framework for complex static environments with dynamic obstacles
Comments: 7 pages, 8 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Safe UAV navigation is challenging due to the complex environment structures, dynamic obstacles, and uncertainties from measurement noises and unpredictable moving obstacle behaviors. Although plenty of recent works achieve safe navigation in complex static environments with sophisticated mapping algorithms, such as occupancy map and ESDF map, these methods cannot reliably handle dynamic environments due to the mapping limitation from moving obstacles. To address the limitation, this paper proposes a trajectory planning framework to achieve safe navigation considering complex static environments with dynamic obstacles. To reliably handle dynamic obstacles, we divide the environment representation into static mapping and dynamic object representation, which can be obtained from computer vision methods. Our framework first generates a static trajectory based on the proposed iterative corridor shrinking algorithm. Then, reactive chance-constrained model predictive control with temporal goal tracking is applied to avoid dynamic obstacles with uncertainties. The simulation results in various environments demonstrate the ability of our algorithm to navigate safely in complex static environments with dynamic obstacles.

[49]  arXiv:2109.07025 [pdf, other]
Title: Globally-Attractive Logarithmic Geometric Control of a Quadrotor for Aggressive Trajectory Tracking
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

We present a new quadrotor geometric control scheme that is capable of tracking highly aggressive trajectories. Unlike previous works, our geometric controller uses the logarithmic map of SO(3) to express rotational error in the Lie algebra, allowing us to treat the manifold in a more effective and natural manner, and can be shown to be globally attractive. We show the performance of our control scheme against highly aggressive trajectories in simulation experiments. Additionally, we present an adaptation of this controller that allows us to interface effectively with the angular rate controllers on an onboard flight control unit and show the ability of this adapted control scheme to track aggressive trajectories on a quadrotor hardware platform.
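
For readers unfamiliar with the logarithmic map of SO(3), the following minimal sketch computes a rotation-vector error between a current and a desired attitude. It uses the standard closed-form log map; the error convention `R_des.T @ R`, the function names, and the small-angle threshold are assumptions, and the abstract does not give the authors' control law.

```python
import numpy as np

def so3_log(R):
    """Log map of SO(3): rotation matrix -> rotation vector (axis times angle)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Twice the vee of the skew-symmetric part of R.
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * w  # first-order approximation near the identity
    # This formula is ill-conditioned near theta = pi; not handled in this sketch.
    return theta / (2.0 * np.sin(theta)) * w

def rotation_error(R, R_des):
    """Attitude error expressed in the Lie algebra (assumed convention)."""
    return so3_log(R_des.T @ R)

def rot_z(angle):
    """Rotation about the z-axis, used only to build an example."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

For example, `rotation_error(rot_z(0.3), np.eye(3))` returns a vector along the z-axis whose length is the 0.3 rad rotation angle.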

[50]  arXiv:2109.07028 [pdf, other]
Title: Avengers Ensemble! Improving Transferability of Authorship Obfuscation
Comments: Submitted to PETS 2021
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Stylometric approaches have been shown to be quite effective for real-world authorship attribution. To mitigate the privacy threat posed by authorship attribution, researchers have proposed automated authorship obfuscation approaches that aim to conceal the stylometric artefacts that give away the identity of an anonymous document's author. Recent work has focused on authorship obfuscation approaches that rely on black-box access to an attribution classifier to evade attribution while preserving semantics. However, to be useful under a realistic threat model, it is important that these obfuscation approaches work well even when the adversary's attribution classifier is different from the one used internally by the obfuscator. Unfortunately, existing authorship obfuscation approaches do not transfer well to unseen attribution classifiers. In this paper, we propose an ensemble-based approach for transferable authorship obfuscation. Our experiments show that if an obfuscator can evade an ensemble attribution classifier, which is based on multiple base attribution classifiers, it is more likely to transfer to different attribution classifiers. Our analysis shows that ensemble-based authorship obfuscation achieves better transferability because it combines the knowledge from each of the base attribution classifiers by essentially averaging their decision boundaries.
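
The ensemble idea in this abstract can be pictured with a small sketch that averages the class probabilities of several base attribution classifiers and checks whether the ensemble still names the true author. Averaging probabilities (soft voting) is an assumption here, as are the function names; the abstract says only that the ensemble essentially averages decision boundaries.

```python
import numpy as np

def ensemble_probabilities(base_probs):
    """Average per-author probability vectors from several base classifiers."""
    stacked = np.stack(base_probs, axis=0)  # shape: (n_classifiers, n_authors)
    return stacked.mean(axis=0)

def is_evaded(base_probs, true_author):
    """True if the averaged ensemble no longer predicts the true author."""
    predicted = int(np.argmax(ensemble_probabilities(base_probs)))
    return predicted != true_author
```

An obfuscator that uses such an ensemble internally must move the document past the averaged boundary, not just past one classifier's boundary.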

[51]  arXiv:2109.07033 [pdf, other]
Title: An energy-based discontinuous Galerkin method for dynamic Euler-Bernoulli beam equations
Authors: Lu Zhang
Comments: 22
Subjects: Numerical Analysis (math.NA)

In this paper, an energy-based discontinuous Galerkin method for dynamic Euler-Bernoulli beam equations is developed. The resulting method is energy-dissipating or energy-conserving depending on the simple, mesh-independent choice of numerical fluxes. By introducing a velocity field, the original problem is transformed into a first-order in time system. In our formulation, the discontinuous Galerkin approximations for the original displacement field and the auxiliary velocity field are not restricted to be in the same space. In particular, a given accuracy can be achieved with the fewest degrees of freedom when the degree for the approximation space of the velocity field is two orders lower than the degree of approximation space for the displacement field. In addition, we establish the error estimates in an energy norm and demonstrate the corresponding optimal convergence in numerical experiments.

[52]  arXiv:2109.07034 [pdf, other]
Title: A discontinuous Galerkin method for nonlinear biharmonic Schrödinger equations
Authors: Lu Zhang
Comments: 24 pages
Subjects: Numerical Analysis (math.NA)

This paper proposes and analyzes an ultra-weak local discontinuous Galerkin scheme for one-dimensional nonlinear biharmonic Schrödinger equations. We develop the paradigm of the local discontinuous Galerkin method by introducing the second-order spatial derivative as an auxiliary variable instead of the conventional first-order derivative. The proposed semi-discrete scheme preserves a few physically relevant properties such as the conservation of mass and the conservation of Hamiltonian accompanied by its stability for the targeted nonlinear biharmonic Schrödinger equations. We also derive optimal $L^2$-error estimates of the scheme that measure both the solution and the auxiliary variable. Several numerical studies demonstrate and support our theoretical findings.

[53]  arXiv:2109.07035 [pdf, other]
Title: Data Hunches: Incorporating Personal Knowledge into Visualizations
Subjects: Human-Computer Interaction (cs.HC)

The trouble with data is that often it provides only an imperfect representation of the phenomenon of interest. When reading and interpreting data, personal knowledge about the data plays an important role. Data visualization, however, has neither a concept defining personal knowledge about datasets, nor the methods or tools to robustly integrate them into an analysis process, thus hampering analysts' ability to express their personal knowledge about datasets, and others to learn from such knowledge. In this work, we define such personal knowledge about datasets as data hunches and elevate this knowledge to another form of data that can be externalized, visualized, and used for collaboration. We establish the implications of data hunches and provide a design space for externalizing and communicating data hunches through visualization techniques. We envision such a design space will empower users to externalize their personal knowledge and support the ability to learn from others' data hunches.

[54]  arXiv:2109.07036 [pdf, other]
Title: PnP-DETR: Towards Efficient Visual Analysis with Transformers
Comments: accepted by ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, DETR (Carion et al., 2020) pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result. Though effective, translating the full feature map can be costly due to redundant computation on some areas such as the background. In this work, we encapsulate the idea of reducing spatial redundancy into a novel poll and pool (PnP) sampling module, with which we build an end-to-end PnP-DETR architecture that adaptively allocates its computation spatially to be more efficient. Concretely, the PnP module abstracts the image feature map into fine foreground object feature vectors and a small number of coarse background contextual feature vectors. The transformer models information interaction within the fine-coarse feature space and translates the features into the detection result. Moreover, the PnP-augmented model can instantly achieve various desired trade-offs between performance and computation with a single model by varying the sampled feature length, without requiring the training of multiple models as existing methods do. Thus it offers greater flexibility for deployment in diverse scenarios with varying computation constraints. We further validate the generalizability of the PnP module on panoptic segmentation and the recent transformer-based image recognition model ViT (Dosovitskiy et al., 2020) and show consistent efficiency gains. We believe our method makes a step towards efficient visual analysis with transformers, wherein spatial redundancy is commonly observed. Code will be available at https://github.com/twangnh/pnp-detr.

[55]  arXiv:2109.07041 [pdf, ps, other]
Title: Coalition Game based User Association for mmWave Mobile Relay Systems in Rail Traffic Scenarios
Comments: 11 pages, 11 figures
Journal-ref: IEEE Transactions on Vehicular Technology, 2021
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Rail transportation, especially high-speed rail (HSR), is an important infrastructure for the development of the national economy and the improvement of passenger experience. Due to its large bandwidth, millimeter wave (mmWave) communication is regarded as a promising technology to meet the demand for high data rates. However, since mmWave communication has the characteristic of high attenuation, mobile relay (MR) is considered in this paper. Also, full-duplex (FD) communications have been proposed to improve the spectral efficiency. However, because of the high speed, as well as the problem of penetration loss, passengers on the train have a poor quality of service. Consequently, an effective user association scheme for HSR in the mmWave band is necessary. In this paper, we investigate the user association optimization problem in mmWave mobile relay systems where the MRs operate in the FD mode. To maximize the system capacity, we propose a cooperative user association approach based on a coalition formation game, and develop a coalition formation algorithm to solve the challenging NP-hard problem. We also prove the convergence and Nash-stable property of the proposed algorithm. Extensive simulations are done to show the system performance of the proposed scheme under various network settings. It is demonstrated that the proposed distributed low complexity scheme achieves a near-optimal performance and outperforms two baseline schemes in terms of average system throughput.

[56]  arXiv:2109.07043 [pdf, other]
Title: Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG
Comments: Accepted to INLG 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Ever since neural models were adopted in data-to-text language generation, they have invariably been reliant on extrinsic components to improve their semantic accuracy, because the models normally do not exhibit the ability to generate text that reliably mentions all of the information provided in the input. In this paper, we propose a novel decoding method that extracts interpretable information from encoder-decoder models' cross-attention, and uses it to infer which attributes are mentioned in the generated text, which is subsequently used to rescore beam hypotheses. Using this decoding method with T5 and BART, we show on three datasets its ability to dramatically reduce semantic errors in the generated outputs, while maintaining their state-of-the-art quality.

[57]  arXiv:2109.07046 [pdf, other]
Title: A Conditional Generative Matching Model for Multi-lingual Reply Suggestion
Subjects: Computation and Language (cs.CL)

We study the problem of a multilingual automated reply suggestion (RS) model serving many languages simultaneously. Multilingual models are often challenged by model capacity and severe data distribution skew across languages. While prior works largely focus on monolingual models, we propose Conditional Generative Matching models (CGM), optimized within a Variational Autoencoder framework to address challenges arising from multi-lingual RS. CGM does so with expressive message conditional priors, mixture densities to enhance multi-lingual data representation, latent alignment for language discrimination, and effective variational optimization techniques for training multi-lingual RS. The enhancements result in performance that exceeds competitive baselines in relevance (ROUGE score) by more than 10% on average, and 16% for low resource languages. CGM also shows remarkable improvements in diversity (80%), illustrating its expressiveness in representation of multi-lingual data.

[58]  arXiv:2109.07047 [pdf, other]
Title: The Promise of Dataflow Architectures in the Design of Processing Systems for Autonomous Machines
Comments: Please note that this may be a special case in that Professor Gao sadly passed away on September 12th, just as we had put the finishing touches on this submission
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Robotics (cs.RO)

The commercialization of autonomous machines is a thriving sector, and likely to be the next major computing demand driver, after PC, cloud computing, and mobile computing. Nevertheless, a suitable computer architecture for autonomous machines is missing, and many companies are forced to develop ad hoc computing solutions that are neither scalable nor extensible. In this article, we analyze the demands of autonomous machine computing, and argue for the promise of dataflow architectures in autonomous machines.

[59]  arXiv:2109.07048 [pdf, other]
Title: ARCH: Efficient Adversarial Regularized Training with Caching
Subjects: Computation and Language (cs.CL)

Adversarial regularization can improve model generalization in many natural language processing tasks. However, conventional approaches are computationally expensive since they need to generate a perturbation for each sample in each epoch. We propose a new adversarial regularization method ARCH (adversarial regularization with caching), where perturbations are generated and cached once every several epochs. As caching all the perturbations imposes memory usage concerns, we adopt a K-nearest neighbors-based strategy to tackle this issue. The strategy only requires caching a small number of perturbations, without introducing additional training time. We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks. We observe that ARCH significantly eases the computational burden (saves up to 70% of computational time in comparison with conventional approaches). More surprisingly, by reducing the variance of stochastic gradients, ARCH produces a notably better (in most of the tasks) or comparable model generalization. Our code is publicly available.

[60]  arXiv:2109.07049 [pdf, other]
Title: Self-Training with Differentiable Teacher
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Self-training achieves enormous success in various semi-supervised and weakly-supervised learning tasks. The method can be interpreted as a teacher-student framework, where the teacher generates pseudo-labels, and the student makes predictions. The two models are updated alternately. However, such a straightforward alternating update rule leads to training instability. This is because a small change in the teacher may result in a significant change in the student. To address this issue, we propose differentiable self-training, which treats teacher-student as a Stackelberg game. In this game, a leader is always in a more advantageous position than a follower. In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels. Therefore, we treat the student as the leader and the teacher as the follower. The leader procures its advantage by acknowledging the follower's strategy, which involves differentiable pseudo-labels and differentiable sample weights. Consequently, the leader-follower interaction can be effectively captured via the Stackelberg gradient, obtained by differentiating the follower's strategy. Experimental results on semi- and weakly-supervised classification and named entity recognition tasks show that our model outperforms existing approaches by large margins.

[61]  arXiv:2109.07050 [pdf, ps, other]
Title: The Elliptic Net Algorithm Revisited
Subjects: Cryptography and Security (cs.CR); Algebraic Geometry (math.AG); Number Theory (math.NT)

Pairings have been widely used since their introduction to cryptography. They can be applied to identity-based encryption, tripartite Diffie-Hellman key agreement, blockchain and other cryptographic schemes. The acceleration of pairing computations is crucial for these cryptographic schemes or protocols. In this paper, we focus on the Elliptic Net algorithm, which can compute pairings in polynomial time but requires more storage than Miller's algorithm. We use several methods to speed up the Elliptic Net algorithm. Firstly, we eliminate the inverse operation in the improved Elliptic Net algorithm. In some circumstances, this finding can achieve further improvements. Secondly, we apply the lazy reduction technique to the Elliptic Net algorithm, which helps us achieve a faster implementation. Finally, we propose a new derivation of the formulas for the computation of the Optimal Ate pairing on the twisted curve. Results show that the Elliptic Net algorithm can be significantly accelerated, especially on the twisted curve. The algorithm can be 80% faster than the previous ones on the twisted 381-bit BLS12 curve and 71.5% faster on the twisted 676-bit KSS18 curve.

[62]  arXiv:2109.07053 [pdf, other]
Title: Image Synthesis via Semantic Composition
Comments: Project page is at this https URL. Accepted to ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a novel approach to synthesize realistic images based on their semantic layouts. We hypothesize that objects with similar appearance share similar representations. Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations. Conditioning on these features, we propose a dynamic weighted network constructed by spatially conditional computation (with both convolution and normalization). Beyond preserving semantic distinctions, the given dynamic network strengthens semantic relevance, benefiting global structure and detail synthesis. We demonstrate through extensive experiments on benchmarks that our method achieves compelling generation performance, both qualitatively and quantitatively.

[63]  arXiv:2109.07054 [pdf, other]
Title: Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback
Comments: Accepted into ICML 2021 workshops Human-AI Collaboration in Sequential Decision-Making and Human in the Loop Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Human-Computer Interaction (cs.HC)

Fluid human-agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to various feedback schemes human trainers are likely to provide. This work analyzes the COnvergent Actor-Critic by Humans (COACH) algorithm under three different types of feedback: policy feedback, reward feedback, and advantage feedback. For these three feedback types, we find that COACH can behave sub-optimally. We propose a variant of COACH, episodic COACH (E-COACH), which we prove converges for all three types. We compare our COACH variant with two other reinforcement-learning algorithms: Q-learning and TAMER.

[64]  arXiv:2109.07055 [pdf, other]
Title: ISPY: Automatic Issue-Solution Pair Extraction from Community Live Chats
Comments: 13 pages, 5 figures
Subjects: Software Engineering (cs.SE)

Collaborative live chats are gaining popularity as a development communication tool. In community live chatting, developers are likely to post issues they encountered (e.g., setup issues and compile issues), and other developers respond with possible solutions. Therefore, community live chats contain rich sets of information for reported issues and their corresponding solutions, which can be quite useful for knowledge sharing and future reuse if extracted and restored in time. However, it remains challenging to accurately mine such knowledge due to the noisy nature of interleaved dialogs in live chat data. In this paper, we first formulate the problem of issue-solution pair extraction from developer live chat data, and propose an automated approach, named ISPY, based on natural language processing and deep learning techniques with customized enhancements, to address the problem. Specifically, ISPY automates three tasks: 1) Disentangle live chat logs, employing a feedforward neural network to disentangle a conversation history into separate dialogs automatically; 2) Detect dialogs discussing issues, using a novel convolutional neural network (CNN), which consists of a BERT-based utterance embedding layer, a context-aware dialog embedding layer, and an output layer; 3) Extract appropriate utterances and combine them as corresponding solutions, based on the same CNN structure but with different feeding inputs. To evaluate ISPY, we compare it with six baselines on a dataset of 750 dialogs, including 171 issue-solution pairs, from eight open source communities. The results show that, for issue-detection, our approach achieves an F1 of 76% and outperforms all baselines by 30%. Our approach achieves an F1 of 63% for solution-extraction and outperforms the baselines by 20%.

[65]  arXiv:2109.07060 [pdf, other]
Title: Analyzing Multiagent Interactions in Traffic Scenes via Topological Braids
Subjects: Robotics (cs.RO)

We focus on the problem of analyzing multiagent interactions in traffic domains. Understanding the space of behavior of real-world traffic may offer significant advantages for algorithmic design, data-driven methodologies, and benchmarking. However, the high dimensionality of the space and the stochasticity of human behavior may hinder the identification of important interaction patterns. Our key insight is that traffic environments feature significant geometric and temporal structure, leading to highly organized collective behaviors, often drawn from a small set of dominant modes. In this work, we propose a representation based on the formalism of topological braids that can summarize arbitrarily complex multiagent behavior into a compact object of dual geometric and symbolic nature, capturing critical events of interaction. This representation allows us to formally enumerate the space of outcomes in a traffic scene and characterize their complexity. We illustrate the value of the proposed representation in summarizing critical aspects of real-world traffic behavior through a case study on recent driving datasets. We show that despite the density of real-world traffic, observed behavior tends to follow highly organized patterns of low interaction. Our framework may be a valuable tool for evaluating the richness of driving datasets, but also for synthetically designing balanced training datasets or benchmarks.

[66]  arXiv:2109.07061 [pdf, ps, other]
Title: Scalable Cell-Free Massive MIMO Systems with Finite Resolution ADCs/DACs over Spatially Correlated Rician Fading Channels
Comments: 14 pages, 9 figures, regular paper
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

In this paper, an analytical framework for evaluating the performance of scalable cell-free massive MIMO (SCF-mMIMO) systems in which all user equipments (UEs) and access points (APs) employ finite resolution digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) and operate under correlated Rician fading, is presented. By using maximal-ratio combining (MRC) detection, generic expressions for the uplink (UL) spectral efficiency (SE) for both distributed and centralized schemes are derived. In order to further reduce the computational complexity (CC) of the original local partial MMSE (LP-MMSE) and partial MMSE (P-MMSE) detectors, two novel scalable low complexity MMSE detectors are proposed for distributed and centralized schemes respectively, which achieve very similar SE performance. Furthermore, for the distributed scheme a novel partial large-scale fading decoding (P-LSFD) weighting vector is introduced and its analytical SE performance is very similar to the performance of an equivalent unscalable LSFD vector. Finally, a scalable algorithm jointly consisting of AP cluster formation, pilot assignment, and power control is proposed, which outperforms the conventional random pilot assignment and user-group based pilot assignment policies and, contrary to an equal power transmit strategy, guarantees quality of service (QoS) fairness for all accessing UEs.

[67]  arXiv:2109.07067 [pdf, other]
Title: Improving Text Auto-Completion with Next Phrase Prediction
Comments: 4 pages, 2 figures, 4 tables, Accepted in EMNLP 2021-Findings
Subjects: Computation and Language (cs.CL)

Language models such as GPT-2 have performed well on constructing syntactically sound sentences for the text auto-completion task. However, such models often require considerable training effort to adapt to specific writing domains (e.g., medical). In this paper, we propose an intermediate training strategy to enhance pre-trained language models' performance in the text auto-completion task and quickly adapt them to specific domains. Our strategy includes a novel self-supervised training objective called Next Phrase Prediction (NPP), which encourages a language model to complete the partial query with enriched phrases and eventually improve the model's text auto-completion performance. Preliminary experiments have shown that our approach is able to outperform the baselines in auto-completion for email and academic writing domains.

[68]  arXiv:2109.07069 [pdf, other]
Title: F-CAM: Full Resolution CAM via Guided Parametric Upscaling
Comments: 23 pages, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks, allowing for CNN visualization and interpretation without training on fully annotated image datasets. CAM methods are typically integrated within off-the-shelf CNN backbones, such as ResNet50. Due to convolution and downsampling/pooling operations, these backbones yield low resolution CAMs with a down-scaling factor of up to 32, making accurate localization more difficult. Interpolation is required to restore full-size CAMs, but it does not consider the statistical properties of the objects, leading to activations with inconsistent boundaries and inaccurate localizations. As an alternative, we introduce a generic method for parametric upscaling of CAMs that allows constructing accurate full resolution CAMs (F-CAMs). In particular, we propose a trainable decoding architecture that can be connected to any CNN classifier to produce more accurate CAMs. Given an original (low resolution) CAM, foreground and background pixels are randomly sampled for fine-tuning the decoder. Additional priors such as image statistics and size constraints are also considered to expand and refine object boundaries. Extensive experiments using three CNN backbones and six WSOL baselines on the CUB-200-2011 and OpenImages datasets indicate that our F-CAM method yields a significant improvement in CAM localization accuracy. F-CAM performance is competitive with state-of-the-art WSOL methods, yet it requires fewer computational resources during inference.

[69]  arXiv:2109.07073 [pdf, other]
Title: Globally Consistent 3D LiDAR Mapping with GPU-accelerated GICP Matching Cost Factors
Comments: IEEE Robotics and Automation Letters, Video: this https URL
Subjects: Robotics (cs.RO)

This paper presents a real-time 3D LiDAR mapping framework based on global matching cost minimization. The proposed method constructs a factor graph that directly minimizes matching costs between frames over the entire map, unlike pose graph-based approaches that minimize errors in the pose space. For real-time global matching cost minimization, we use a voxel data association-based GICP matching cost factor that is able to fully leverage GPU parallel processing. The combination of the matching cost factor and GPU computation enables constraint of the relative pose between frames with a small overlap and creation of a densely connected factor graph. The mapping process is managed based on a voxel-based overlap metric that can quickly be evaluated on a GPU. We incorporate the proposed method with an external loop detection method in order to help the voxel-based matching cost factors to avoid convergence in a local solution. The experimental result on the KITTI dataset shows that the proposed approach improves the estimation accuracy of long trajectories.

[70]  arXiv:2109.07074 [pdf, other]
Title: Anti-Tamper Protection for Internet of Things System Using Hyperledger Fabric Blockchain Technology
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

Automated and industrial Internet of Things (IoT) devices are increasing daily. As the number of IoT devices grows, the volume of data generated by them will also grow. Managing these rapidly expanding IoT devices and enormous data efficiently to be available to all authorized users without compromising its integrity will become essential in the near future. On the other side, many information security incidents have been recorded, increasing the requirement for countermeasures. While safeguards against hostile third parties have been commonplace until now, operators and parties have seen an increase in demand for data falsification detection and blocking. Blockchain technology is well-known for its privacy, immutability, and decentralized nature. Single-board computers are becoming more powerful while also becoming more affordable as IoT platforms. These single-board computers are gaining traction in the automation industry. This study focuses on a paradigm of IoT-Blockchain integration where the blockchain node runs autonomously on the IoT platform itself. It enables the system to conduct machine-to-machine transactions without the intervention of a person and to exert direct access control over IoT devices. This paper assumes that readers are familiar with basic Hyperledger Fabric operations and focuses on the practical approach of integration. A basic introduction is provided for newcomers to blockchain.

[71]  arXiv:2109.07076 [pdf, other]
Title: Real-Time Multi-Contact Model Predictive Control via ADMM
Subjects: Robotics (cs.RO)

We propose a general hybrid model predictive control algorithm, consensus complementarity control (C3), for systems that make and break contact with their environment. Many state-of-the-art controllers for tasks which require initiating contact with the environment, such as locomotion and manipulation, require a priori mode schedules or are so computationally complex that they cannot run at real-time rates. We present a method, based on the alternating direction method of multipliers (ADMM), capable of high-speed reasoning over potential contact events. Via a consensus formulation, our approach enables parallelization of the contact scheduling problem. We validate our results on three numerical examples, including two frictional contact problems, and physical experimentation on an underactuated multi-contact system.

[72]  arXiv:2109.07078 [pdf, other]
Title: DSOR: A Scalable Statistical Filter for Removing Falling Snow from LiDAR Point Clouds in Severe Winter Weather
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

For autonomous vehicles to viably replace human drivers they must contend with inclement weather. Falling rain and snow introduce noise in LiDAR returns resulting in both false positive and false negative object detections. In this article we introduce the Winter Adverse Driving dataSet (WADS) collected in the snow belt region of Michigan's Upper Peninsula. WADS is the first multi-modal dataset featuring dense point-wise labeled sequential LiDAR scans collected in severe winter weather; weather that would cause an experienced driver to alter their driving behavior. We have labelled and will make available over 7 GB or 3.6 billion labelled LiDAR points out of over 26 TB of total LiDAR and camera data collected. We also present the Dynamic Statistical Outlier Removal (DSOR) filter, a statistical PCL-based filter capable of removing snow with a higher recall than the state-of-the-art snow de-noising filter while being 28% faster. Further, the DSOR filter is shown to have a lower time complexity than the state of the art, resulting in improved scalability.
Our labeled dataset and DSOR filter will be made available at https://bitbucket.org/autonomymtu/dsor_filter

[73]  arXiv:2109.07079 [pdf]
Title: Image-Based Multi-UAV Tracking System in a Cluttered Environment
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

A tracking controller for unmanned aerial vehicles (UAVs) is developed to track moving targets undergoing unknown translational and rotational motions. The main challenges are to control both the relative positions and angles between the target and the UAVs to within desired values, and to guarantee that the generated control inputs to the UAVs are feasible (i.e., within their motion capabilities). Moreover, the UAVs are controlled to ensure that the target always remains within the fields of view of their onboard cameras. To the best of our knowledge, this is the first work to apply multiple UAVs to cooperatively track a dynamic target while ensuring that the UAVs remain connected and that both occlusion and collisions are avoided. To achieve these control objectives, a designed controller solved based on the aforementioned tracking controller using quadratic programming can generate minimally invasive control actions to achieve occlusion avoidance and collision avoidance. Furthermore, control barrier functions (CBFs) with a distributed design are developed in order to reduce the amount of inter-UAV communication. Simulations were performed to assess the efficacy and performance of the developed CBF-based controller for the multi-UAV system in tracking a target.

[74]  arXiv:2109.07080 [pdf, other]
Title: Transformer-based Lexically Constrained Headline Generation
Comments: EMNLP 2021
Subjects: Computation and Language (cs.CL)

This paper explores a variant of automatic headline generation methods, where a generated headline is required to include a given phrase such as a company or a product name. Previous methods using Transformer-based models generate a headline including a given phrase by providing the encoder with additional information corresponding to the given phrase. However, these methods cannot always include the phrase in the generated headline. Inspired by previous RNN-based methods generating token sequences in backward and forward directions from the given phrase, we propose a simple Transformer-based method that guarantees to include the given phrase in the high-quality generated headline. We also consider a new headline generation strategy that takes advantage of the controllable generation order of Transformer. Our experiments with the Japanese News Corpus demonstrate that our methods, which are guaranteed to include the phrase in the generated headline, achieve ROUGE scores comparable to previous Transformer-based methods. We also show that our generation strategy performs better than previous strategies.

[75]  arXiv:2109.07082 [pdf, other]
Title: Efficient and Probabilistic Adaptive Voxel Mapping for Accurate Online 3D SLAM
Comments: 7 pages, 9 figures, submitted to ICRA
Subjects: Robotics (cs.RO)

This paper proposes an efficient and probabilistic adaptive voxel mapping method for 3D SLAM. An accurate uncertainty model of point and plane is proposed for probabilistic plane representation. We analyze the need for coarse-to-fine voxel mapping and then use a novel voxel map organized by a hash table and octrees to build and update the map efficiently. We apply the voxel map to the iterated Kalman filter and construct the maximum posterior probability problem for pose estimation. The experiments on the open KITTI dataset show the high accuracy and efficiency of our method compared with other state-of-the-art methods. Outdoor experiments on unstructured environments with non-repetitive scanning LiDAR further verify the adaptability of our mapping method to different environments and LiDAR scanning patterns.

[76]  arXiv:2109.07084 [pdf, other]
Title: Fast Extraction of Word Embedding from Q-contexts
Comments: Accepted by CIKM 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The notion of word embedding plays a fundamental role in natural language processing (NLP). However, pre-training word embedding for very large-scale vocabulary is computationally challenging for most existing methods. In this work, we show that with merely a small fraction of contexts (Q-contexts) which are typical in the whole corpus (and their mutual information with words), one can construct high-quality word embedding with negligible errors. Mutual information between contexts and words can be encoded canonically as a sampling state, so Q-contexts can be constructed quickly. Furthermore, we present an efficient and effective WEQ method, which is capable of extracting word embedding directly from these typical contexts. In practical scenarios, our algorithm runs 11$\sim$13 times faster than well-established methods. By comparing with well-known methods such as matrix factorization, word2vec, GloVe and fasttext, we demonstrate that our method achieves comparable performance on a variety of downstream NLP tasks, while maintaining run-time and resource advantages over all these baselines.

[77]  arXiv:2109.07087 [pdf, other]
Title: Soft-Jig: A Flexible Sensing Jig for Simultaneously Fixing and Estimating Orientation of Assembly Parts
Comments: 6 pages, 14 figures
Subjects: Robotics (cs.RO)

For assembly tasks, it is essential to firmly fix target parts and to accurately estimate their poses. Several rigid jigs for individual parts are frequently used in assembly factories to achieve precise and time-efficient product assembly. However, providing customized jigs is time-consuming. In this study, to address the lack of versatility in the shapes the jigs can be used for, we developed a flexible jig with a soft membrane including transparent beads and oil with a tuned refractive index. The bead-based jamming transition was accomplished by discharging only oil enabling a part to be firmly fixed. Because the two cameras under the jig are able to capture membrane shape changes, we proposed a sensing method to estimate the orientation of the part based on the behaviors of markers created on the jig's inner surface. Through estimation experiments, the proposed system could estimate the orientation of a cylindrical object with a diameter larger than 50 mm and an RMSE of less than 3 degrees.

[78]  arXiv:2109.07095 [pdf, other]
Title: Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering
Comments: Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Paraphrase generation is an important task in natural language processing. Previous works focus on sentence-level paraphrase generation, while ignoring document-level paraphrase generation, which is a more challenging and valuable task. In this paper, we explore the task of document-level paraphrase generation for the first time and focus on the inter-sentence diversity by considering sentence rewriting and reordering. We propose CoRPG (Coherence Relationship guided Paraphrase Generation), which leverages graph GRU to encode the coherence relationship graph and get the coherence-aware representation for each sentence, which can be used for re-arranging the multiple (possibly modified) input sentences. We create a pseudo document-level paraphrase dataset for training CoRPG. Automatic evaluation results show CoRPG outperforms several strong baseline models on the BERTScore and diversity scores. Human evaluation also shows our model can generate document paraphrase with more diversity and semantic preservation.

[79]  arXiv:2109.07100 [pdf, other]
Title: Hybrid Local-Global Transformer for Image Dehazing
Comments: 19 pages, 17 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, the Vision Transformer (ViT) has shown impressive performance on high-level and low-level vision tasks. In this paper, we propose a new ViT architecture, named Hybrid Local-Global Vision Transformer (HyLoG-ViT), for single image dehazing. The HyLoG-ViT block consists of two paths, the local ViT path and the global ViT path, which are used to capture local and global dependencies. The hybrid features are fused via convolution layers. As a result, the HyLoG-ViT reduces the computational complexity and introduces locality in the networks. Then, the HyLoG-ViT blocks are incorporated within our dehazing networks, which jointly learn the intrinsic image decomposition and image dehazing. Specifically, the network consists of one shared encoder and three decoders for reflectance prediction, shading prediction, and haze-free image generation. The tasks of reflectance and shading prediction can produce meaningful intermediate features that can serve as complementary features for haze-free image generation. To effectively aggregate the complementary features, we propose a complementary features selection module (CFSM) to select the useful ones for image dehazing. Extensive experiments on homogeneous, non-homogeneous, and nighttime dehazing tasks reveal that our proposed Transformer-based dehazing network can achieve comparable or even better performance than CNN-based dehazing models.

[80]  arXiv:2109.07101 [pdf, other]
Title: Delay-aware Robust Control for Safe Autonomous Driving
Comments: Under review at ICRA 2022
Subjects: Robotics (cs.RO)

With the advancement of affordable self-driving vehicles using complicated nonlinear optimization but limited computation resources, computation time becomes a matter of concern. Other factors such as actuator dynamics and actuator command processing cost also unavoidably cause delays. In high-speed scenarios, these delays are critical to the safety of a vehicle. Recent works consider these delays individually, but none unifies them all in the context of autonomous driving. Moreover, recent works inappropriately consider computation time as a constant or a large upper bound, which makes the control either less responsive or over-conservative. To deal with all these delays, we present a unified framework by 1) modeling actuation dynamics, 2) using robust tube model predictive control, 3) using a novel adaptive Kalman filter without assuming a known process model and noise covariance, which makes the controller safe while minimizing conservativeness. On one hand, our approach can serve as a standalone controller; on the other hand, our approach provides a safety guard for a high-level controller, which assumes no delay. This can be used for compensating the sim-to-real gap when deploying a black-box learning-enabled controller trained in a simplistic environment without considering delays for practical vehicle systems.

[81]  arXiv:2109.07102 [pdf, other]
Title: Can Edge Probing Tasks Reveal Linguistic Knowledge in QA Models?
Subjects: Computation and Language (cs.CL)

There have been many efforts to try to understand what grammatical knowledge (e.g., ability to understand the part of speech of a token) is encoded in large pre-trained language models (LM). This is done through `Edge Probing' (EP) tests: simple ML models that predict the grammatical properties of a span (whether it has a particular part of speech) using \textit{only} the LM's token representations. However, most NLP applications use fine-tuned LMs. Here, we ask: if an LM is fine-tuned, does the encoding of linguistic information in it change, as measured by EP tests? Conducting experiments on multiple question-answering (QA) datasets, we answer that question negatively: the EP test results do not change significantly when the fine-tuned QA model performs well or in adversarial situations where the model is forced to learn wrong correlations. However, a critical analysis of the EP task datasets reveals that EP models may rely on spurious correlations to make predictions. This indicates that even if fine-tuning changes the encoding of such knowledge, the EP tests might fail to measure it.

[82]  arXiv:2109.07103 [pdf, other]
Title: Automatic Symmetry Discovery with Lie Algebra Convolutional Network
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Group Theory (math.GR)

Existing equivariant neural networks for continuous groups require discretization or group representations. All these approaches require detailed knowledge of the group parametrization and cannot learn entirely new symmetries. We propose to work with the Lie algebra (infinitesimal generators) instead of the Lie group. Our model, the Lie algebra convolutional network (L-conv), can learn potential symmetries and does not require discretization of the group. We show that L-conv can serve as a building block to construct any group equivariant architecture. We discuss how CNNs and Graph Convolutional Networks are related to and can be expressed as L-conv with appropriate groups. We also derive the MSE loss for a single L-conv layer and find a deep relation with Lagrangians used in physics, with some of the physics aiding in defining generalization and symmetries in the loss landscape. Conversely, L-conv could be used to propose more general equivariant ans\"atze for scientific machine learning.

[83]  arXiv:2109.07105 [pdf, other]
Title: Local NMPC on Global Optimised Path for Autonomous Racing
Comments: ICRA workshop on Opportunities and Challenges with Autonomous Racing, 31 May, 2021 (accepted)
Subjects: Robotics (cs.RO)

The paper presents a strategy for the control of an autonomous racing car on a pre-mapped track. Using a dynamic model of the vehicle, the optimal racing line is computed, taking track boundaries into account. With the optimal racing line as a reference, a local nonlinear model predictive controller (NMPC) is proposed, which takes into account multiple local objectives like making more progress along the race line, avoiding collision with opponent vehicles, and use of drafting to achieve more progress.

[84]  arXiv:2109.07106 [pdf]
Title: WIP: Medical Incident Prediction Through Analysis of Electronic Medical Records Using Machine Learning: Fall Prediction
Comments: ICIEV/IVPR2021. Work-In-Progress paper. 6 pages. 5 tables. Emerging Researcher Award
Journal-ref: Proceedings of ICIEV/IVPR2021
Subjects: Machine Learning (cs.LG)

This paper reports our preliminary work on medical incident prediction in general, and fall risk prediction in particular, using machine learning. Data for the machine learning are generated only from a particular subset of the electronic medical records (EMR) at Osaka Medical and Pharmaceutical University Hospital. After conducting three experiments, (1) machine learning algorithm comparison, (2) handling imbalance, and (3) investigation of explanatory variable contribution to the fall incident prediction, we find the investigation of explanatory variables the most effective.

[85]  arXiv:2109.07107 [pdf, other]
Title: Anchor DETR: Query Design for Transformer-Based Detector
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a novel query design for transformer-based detectors. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding does not have an explicit physical meaning and we cannot explain where it will focus. It is difficult to optimize as the prediction slot of each object query does not have a specific mode. In other words, each object query will not focus on a specific region. To solve these problems, in our query design, object queries are based on anchor points, which are widely used in CNN-based detectors. So each object query focuses on the objects near the anchor point. Moreover, our query design can predict multiple objects at one position to solve the difficulty: "one region, multiple objects". In addition, we design an attention variant, which can reduce the memory cost while achieving similar or better performance than the standard attention in DETR. Thanks to the query design and the attention variant, the proposed detector, which we call Anchor DETR, can achieve better performance and run faster than the DETR with 10$\times$ fewer training epochs. For example, it achieves 44.2 AP with 16 FPS on the MSCOCO dataset when using the ResNet50-DC5 feature for training 50 epochs. Extensive experiments on the MSCOCO benchmark prove the effectiveness of the proposed methods. Code is available at https://github.com/megvii-model/AnchorDETR.

[86]  arXiv:2109.07111 [pdf, other]
Title: Elastic Tracker: A Spatio-temporal Trajectory Planner Flexible Aerial Tracking
Comments: video: TODO submitted to: ICRA2022
Subjects: Robotics (cs.RO)

This paper proposes Elastic Tracker, a flexible trajectory planning framework that can deal with challenging tracking tasks with guaranteed safety and visibility. Firstly, an object detection and intention-free motion prediction method is designed. Then an occlusion-aware path finding method is proposed to provide a proper topology. A smart safe flight corridor generation strategy is designed with the guiding path. An analytical occlusion cost is evaluated. Finally, an effective trajectory optimization approach generates a spatio-temporal optimal trajectory within the resultant flight corridor. Particular formulations are designed to guarantee both safety and visibility, with all the above requirements optimized jointly. The experimental results show that our method works more robustly but with less computation than the existing methods, even in some challenging tracking tasks.

[87]  arXiv:2109.07112 [pdf, other]
Title: Learning Friction Model for Magnet-actuated Tethered Capsule Robot
Comments: icra2022. arXiv admin note: text overlap with arXiv:2108.07151
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The diagnostic application of magnetic control capsules in medical treatment is progressively increasing. At present, the accurate dynamic control of the capsule robot is becoming more and more important. There is a significant aim to establish a friction model of the tethered capsule robot, and the friction model can be applied to simulate the drag force of the tethered and all friction between the capsule and the environment. We confirmed the fifth-order linear fitting relationship between the friction factor of the friction model and the speed of the external permanent magnet. With the learned friction model, effective speed ranges of three mediums are obtained on PVC, paper, and polyester-cloth. In the article, a tethered capsule robot system driven by a robot manipulator is built, and a plane motion method to learn the friction model is proposed.

[88]  arXiv:2109.07114 [pdf, ps, other]
Title: Backward diffusion-wave problem: stability, regularization and approximation
Comments: 29 pages
Subjects: Numerical Analysis (math.NA)

We aim at the development and analysis of the numerical schemes for approximately solving the backward diffusion-wave problem, which involves a fractional derivative in time with order $\alpha\in(1,2)$. From terminal observations at two time levels, i.e., $u(T_1)$ and $u(T_2)$, we simultaneously recover two initial data $u(0)$ and $u_t(0)$ and hence the solution $u(t)$ for all $t > 0$. First of all, existence, uniqueness and Lipschitz stability of the backward diffusion-wave problem were established under some conditions about $T_1$ and $T_2$. Moreover, for noisy data, we propose a quasi-boundary value scheme to regularize the "mildly" ill-posed problem, and show the convergence of the regularized solution. Next, to numerically solve the regularized problem, a fully discrete scheme is proposed by applying finite element method in space and convolution quadrature in time. We establish error bounds of the discrete solution in both cases of smooth and nonsmooth data. The error estimate is very useful in practice since it indicates the way to choose discretization parameters and regularization parameter, according to the noise level. The theoretical results are supported by numerical experiments.

[89]  arXiv:2109.07117 [pdf, other]
Title: Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data
Authors: Antoine Godichon-Baggioni (LPSM (UMR_8001)), Nicklas Werge (LPSM (UMR_8001)), Olivier Wintenberger (LPSM (UMR_8001))
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Motivated by the high-frequency data streams continuously generated, real-time learning is becoming increasingly important. These data streams should be processed sequentially with the property that the stream may change over time. In this streaming setting, we propose techniques for minimizing a convex objective through unbiased estimates of its gradients, commonly referred to as stochastic approximation problems. Our methods rely on stochastic approximation algorithms due to their computational advantage as they only use the previous iterate as a parameter estimate. The reasoning includes iterate averaging that guarantees optimal statistical efficiency under classical conditions. Our non-asymptotic analysis shows accelerated convergence by selecting the learning rate according to the expected data streams. We show that the average estimate converges optimally and robustly to any data stream rate. In addition, noise reduction can be achieved by processing the data in a specific pattern, which is advantageous for large-scale machine learning. These theoretical results are illustrated for various data streams, showing the effectiveness of the proposed algorithms.

[90]  arXiv:2109.07118 [pdf, other]
Title: Low-Resource Named Entity Recognition Based on Multi-hop Dependency Trigger
Authors: Jiangxu Wu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper presents a simple and effective approach to low-resource named entity recognition (NER) based on multi-hop dependency triggers. Dependency triggers refer to salient nodes relative to an entity in the dependency graph of a context sentence. Our main observation is that there often exist triggers that play an important role in recognizing the location and type of an entity in a sentence. Previous research has used manual labelling of triggers. Our main contribution is to propose using a syntactic parser to automatically annotate triggers. Experiments on two English datasets (CONLL 2003 and BC5CDR) show that the proposed method is comparable to the previous trigger-based NER model.

[91]  arXiv:2109.07120 [pdf, other]
Title: Infusing model predictive control into meta-reinforcement learning for mobile robots in dynamic environments
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The successful operation of mobile robots requires them to rapidly adapt to environmental changes. Toward developing an adaptive decision-making tool for mobile robots, we propose combining meta-reinforcement learning (meta-RL) with model predictive control (MPC). The key idea of our method is to switch between a meta-learned policy and an MPC controller in an event-triggered fashion. Our method uses an off-policy meta-RL algorithm as a baseline to train a policy using transition samples generated by MPC. The MPC module of our algorithm is carefully designed to infer the movements of obstacles via Gaussian process regression (GPR) and to avoid collisions via conditional value-at-risk (CVaR) constraints. Due to its design, our method benefits from the two complementary tools. First, high-performance action samples generated by the MPC controller enhance the learning performance and stability of the meta-RL algorithm. Second, through the use of the meta-learned policy, the MPC controller is infrequently activated, thereby significantly reducing computation time. The results of our simulations on a restaurant service robot show that our algorithm outperforms both of the baseline methods.
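The abstract above names conditional value-at-risk (CVaR) constraints without stating their form. As a rough illustration only, and not the authors' formulation, the empirical CVaR of a set of sampled losses at level alpha can be taken as the mean of the worst (1 - alpha) fraction of the samples. The function name `empirical_cvar` and the convention that larger losses are worse are assumptions made for this sketch.

```python
import math


def empirical_cvar(losses, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of the losses.

    Hypothetical helper for illustration; larger loss values are treated as worse.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # worst losses first
    # Round before taking the ceiling to guard against floating-point noise.
    tail = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail]) / tail


# Ten sampled collision-risk losses (made-up numbers for the example).
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cvar_80 = empirical_cvar(samples, 0.8)  # mean of the two worst samples
```

A constraint of the form `empirical_cvar(samples, alpha) <= budget` then limits the average severity of the tail, rather than only the probability of exceeding a threshold.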

[92]  arXiv:2109.07121 [pdf, other]
Title: Enhancing Data-Driven Reachability Analysis using Temporal Logic Side Information
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents algorithms for performing data-driven reachability analysis under temporal logic side information. In certain scenarios, the data-driven reachable sets of a robot can be prohibitively conservative due to the inherent noise in the robot's historical measurement data. In the same scenarios, we often have side information about the robot's expected motion (e.g., limits on how much a robot can move in one time step) that could be useful for further specifying the reachability analysis. In this work, we show that if we can model this side information using a signal temporal logic (STL) fragment, we can constrain the data-driven reachability analysis and safely limit the conservatism of the computed reachable sets. Moreover, we provide formal guarantees that, even after incorporating side information, the computed reachable sets still properly over-approximate the robot's future states. Lastly, we empirically validate the practicality of the over-approximation by computing constrained, data-driven reachable sets for the Small-Vehicles-for-Autonomy (SVEA) hardware platform in two driving scenarios.

[93]  arXiv:2109.07125 [pdf, ps, other]
Title: Diagnosability of labeled max-plus automata
Comments: 13 pages, 6 figures
Subjects: Formal Languages and Automata Theory (cs.FL)

In this paper, \emph{diagnosability} is characterized for a labeled max-plus automaton $\mathcal{A}^{\mathcal{D}}$ over a dioid $\mathcal{D}$ as a real-time system. In order to represent time elapsing, a special class of dioids called \emph{progressive} are considered, in which there is a total canonical order, there is at least one element greater than $\textbf{1}$, the product of sufficiently many elements greater than $\textbf{1}$ is arbitrarily large, and the cancellative law is satisfied. Then a notion of diagnosability is formulated for $\mathcal{A}^{\mathcal{D}}$ over a progressive dioid $\mathcal{D}$. By developing a notion of \emph{concurrent composition}, a sufficient and necessary condition is given for diagnosability of automaton $\mathcal{A}^{\mathcal{D}}$. It is also proven that the problem of verifying diagnosability of $\mathcal{A}^{\underline{\mathbb{Q}}}$ is coNP-complete, where coNP-hardness even holds for deterministic, deadlock-free, and divergence-free $\mathcal{A}^{\underline{\mathbb{N}}}$, where $\underline{\mathbb{Q}}$ and $\underline{\mathbb{N}}$ are the max-plus dioids having elements in $\mathbb{Q}\cup\{-\infty\}$ and $\mathbb{N}\cup\{-\infty\}$, respectively.

[94]  arXiv:2109.07127 [pdf, ps, other]
Title: A Survey on Data Cleaning Methods for Improved Machine Learning Model Performance
Subjects: Databases (cs.DB)

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done manually with data wrangling tools, or it can be completed automatically with a computer program. Data cleaning entails a slew of procedures that, once done, make the data ready for analysis. Given its significance in numerous fields, there is a growing interest in the development of efficient and effective data cleaning frameworks. In this survey, some of the most recent advancements of data cleaning approaches are examined for their effectiveness and the future research directions are suggested to close the gap in each of the methods.

[95]  arXiv:2109.07128 [pdf, ps, other]
Title: The interplay of different metrics for the construction of constant dimension codes
Authors: Sascha Kurz
Comments: 17 pages
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

A basic problem for constant dimension codes is to determine the maximum possible size $A_q(n,d;k)$ of a set of $k$-dimensional subspaces in $\mathbb{F}_q^n$, called codewords, such that the subspace distance satisfies $d_S(U,W):=2k-2\dim(U\cap W)\ge d$ for all pairs of different codewords $U$, $W$. Constant dimension codes have applications in e.g. random linear network coding, cryptography, and distributed storage. Bounds for $A_q(n,d;k)$ are the topic of many recent research papers. Providing a general framework, we survey many of the latest constructions and show the potential for further improvements. As examples we give improved constructions for the cases $A_q(10,4;5)$, $A_q(11,4;4)$, $A_q(12,6;6)$, and $A_q(15,4;4)$. We also derive general upper bounds for subcodes arising in those constructions.
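The subspace distance $d_S(U,W)=2k-2\dim(U\cap W)$ defined in the abstract above can be evaluated from ranks alone, since $\dim(U\cap W)=\dim U+\dim W-\dim(U+W)$. The sketch below does this for the special case $q=2$, with vectors encoded as integer bitmasks. The helper names `gf2_rank` and `subspace_distance` are made up for this example and do not come from the paper.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of vectors given as integer bitmasks."""
    pivots = {}  # leading-bit position -> stored vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


def subspace_distance(gens_u, gens_w):
    """d_S(U, W) = 2k - 2 dim(U ∩ W) for two k-dimensional subspaces of F_2^n."""
    k = gf2_rank(gens_u)
    if gf2_rank(gens_w) != k:
        raise ValueError("both subspaces must have the same dimension k")
    dim_sum = gf2_rank(list(gens_u) + list(gens_w))  # dim(U + W)
    dim_intersection = 2 * k - dim_sum               # dim(U ∩ W)
    return 2 * k - 2 * dim_intersection


# Three 2-dimensional subspaces of F_2^4, each given by two generators.
U = [0b1000, 0b0100]
W = [0b0100, 0b0010]  # shares the vector 0100 with U
V = [0b0010, 0b0001]  # meets U only in the zero vector
```

With these generators, `subspace_distance(U, W)` is 2 and `subspace_distance(U, V)` is 4, the largest value possible for $k=2$.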

[96]  arXiv:2109.07129 [pdf, other]
Title: What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

The dialogue management component of a task-oriented dialogue system is typically optimised via reinforcement learning (RL). Optimisation via RL is highly susceptible to sample inefficiency and instability. The hierarchical approach called Feudal Dialogue Management takes a step towards more efficient learning by decomposing the action space. However, it still suffers from instability due to the reward only being provided at the end of the dialogue. We propose the usage of an intrinsic reward based on information gain to address this issue. Our proposed reward favours actions that resolve uncertainty or query the user whenever necessary. It enables the policy to learn how to retrieve the users' needs efficiently, which is an integral aspect in every task-oriented conversation. Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework, outperforming much more complex approaches. We confirm the sample efficiency and stability of our algorithm through experiments in simulation and a human trial.
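The abstract above describes an intrinsic reward based on information gain but does not give its definition. One common reading, assumed here purely for illustration, is the reduction in Shannon entropy of the system's belief over user goals after a turn. The names `entropy_bits` and `information_gain_reward` are hypothetical, and FeudalGain's actual reward may differ.

```python
import math


def entropy_bits(belief):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)


def information_gain_reward(belief_before, belief_after):
    """Assumed intrinsic reward: entropy of the belief before minus after the turn."""
    return entropy_bits(belief_before) - entropy_bits(belief_after)


# Belief over four candidate user goals before and after a clarifying question.
before = [0.25, 0.25, 0.25, 0.25]  # 2 bits of uncertainty
after = [0.5, 0.5, 0.0, 0.0]       # 1 bit of uncertainty
reward = information_gain_reward(before, after)
```

Under this reading, a question that halves the set of plausible goals earns one bit of reward on the turn it is asked, instead of only contributing to the reward given at the end of the dialogue.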

[97]  arXiv:2109.07131 [pdf, other]
Title: HM-DDP: A Hybrid Multiple-shooting Differential Dynamic Programming Method for Constrained Trajectory Optimization
Comments: 7 pages, 2 figures
Subjects: Robotics (cs.RO)

Trajectory optimization has been used extensively in robotic systems. In particular, Differential Dynamic Programming (DDP) has performed well as an off-line planner or an online nonlinear model predictive control solver, with a lower computational cost compared with other general-purpose nonlinear programming solvers. However, standard DDP cannot handle any constraints or perform reasonable initialization of a state trajectory. In this paper, we propose a hybrid constrained DDP variant with a multiple-shooting framework. The main technical contributions are twofold: 1) In addition to inheriting the simplicity of the initialization in multiple shooting, a two-stage framework is developed to deal with state and control inequality constraints robustly without loss of the linear feedback term of DDP. Such a hybrid strategy offers a fast convergence of constraint satisfaction. 2) An improved globalization strategy is proposed to exploit the coupled effects between line-searching and regularization, which is able to enhance the numerical robustness of DDP-like approaches. Our approach is tested on three constrained trajectory optimization problems with nonlinear inequality constraints and outperforms the commonly-used collocation and shooting methods in terms of runtime and constraint satisfaction.

[98]  arXiv:2109.07132 [pdf, ps, other]
Title: Parallel Constraint-Driven Inductive Logic Programming
Comments: Paper under review
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Multi-core machines are ubiquitous. However, most inductive logic programming (ILP) approaches use only a single core, which severely limits their scalability. To address this limitation, we introduce parallel techniques based on constraint-driven ILP where the goal is to accumulate constraints to restrict the hypothesis space. Our experiments on two domains (program synthesis and inductive general game playing) show that (i) parallelisation can substantially reduce learning times, and (ii) worker communication (i.e. sharing constraints) is important for good performance.

[99]  arXiv:2109.07133 [pdf, other]
Title: Combining Context Awareness and Planning to Learn Behavior Trees from Demonstration
Comments: Submitted to ICRA 2022
Subjects: Robotics (cs.RO)

Fast-changing tasks in unpredictable, collaborative environments are typical for small and medium-sized companies, where robotised applications are increasing. Thus, robot programs should be generated quickly and with little effort, and the robot should be able to react dynamically to the environment. To address this, we propose a method that combines context awareness and planning to learn Behavior Trees (BTs), a reactive policy representation that is becoming more popular in robotics and has been used successfully in many collaborative scenarios. Context awareness makes it possible to infer from the demonstration the frames in which actions are executed and to capture relevant aspects of the task, while a planner is used to automatically generate the BT from the sequence of actions in the demonstration. The learned BT is shown to solve non-trivial manipulation tasks where learning the context is fundamental to achieving the goal. Moreover, we collected non-expert demonstrations to study the performance of the algorithm in industrial scenarios.

[100]  arXiv:2109.07134 [pdf, other]
Title: ROW-SLAM: Under-Canopy Cornfield Semantic SLAM
Comments: 7 pages, 6 figures
Subjects: Robotics (cs.RO)

We study a semantic SLAM problem faced by a robot tasked with autonomous weeding under the corn canopy. The goal is to detect corn stalks and localize them in a global coordinate frame. This is a challenging setup for existing algorithms because there is very little space between the camera and the plants, and the camera motion is primarily restricted to be along the row. To overcome these challenges, we present a multi-camera system where a side camera (facing the plants) is used for detection whereas front and back cameras are used for motion estimation. Next, we show how semantic features in the environment (corn stalks, ground, and crop planes) can be used to develop a robust semantic SLAM solution and present results from field trials performed throughout the growing season across various cornfields.

[101]  arXiv:2109.07135 [pdf, other]
Title: Co-Embedding: Discovering Communities on Bipartite Graphs through Projection
Comments: Submitted and accepted to FICC 2022 (Future of Information and Communication Conference)
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Many datasets take the form of a bipartite graph where two types of nodes are connected by relationships, like the movies watched by a user or the tags associated with a file. Partitioning the bipartite graph could be used to speed up recommender systems, or to reduce an information retrieval system's index size, by identifying groups of items with similar properties. This type of graph is often processed by algorithms using the Vector Space Model representation, where each item is represented by a binary vector of 0s and 1s. The main problem with this representation is that relatedness between dimensions, such as synonymy between words, is not considered. This article proposes a co-clustering algorithm using item projection, allowing the measurement of feature similarity. We evaluated our algorithm on a cluster retrieval task. Over various datasets, our algorithm produced well-balanced clusters of coherent items, leading to high retrieval scores on this task.
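The dimension-relatedness problem described in this abstract can be illustrated with a minimal sketch. The vocabulary, the tag sets, and the helper names `cosine` and `to_binary_vector` are hypothetical and are not taken from the paper; the sketch shows only the limitation of the binary Vector Space Model, not the proposed co-clustering algorithm.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical tag vocabulary: "car" and "automobile" are synonyms,
# yet each one occupies its own dimension.
VOCAB = ["car", "automobile", "recipe"]

def to_binary_vector(tags):
    """Encode a set of tags as a 0/1 vector over VOCAB."""
    return [1 if term in tags else 0 for term in VOCAB]

file_a = to_binary_vector({"car"})
file_b = to_binary_vector({"automobile"})
file_c = to_binary_vector({"car", "recipe"})

# Synonymous tags share no dimension, so the two files look unrelated.
sim_ab = cosine(file_a, file_b)
# A file sharing the literal tag "car" scores higher, despite the extra topic.
sim_ac = cosine(file_a, file_c)
```

Under these assumptions, the file tagged "automobile" appears less similar to the file tagged "car" than an unrelated file that happens to reuse the same literal tag, which is the gap a similarity-aware projection is meant to close.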

[102]  arXiv:2109.07137 [pdf, other]
Title: Optimal Cycling of a Heterogenous Battery Bank via Reinforcement Learning
Comments: Appeared on IEEE SmartGridComm 2021 conference
Subjects: Machine Learning (cs.LG)

We consider the problem of optimal charging/discharging of a bank of heterogenous battery units, driven by stochastic electricity generation and demand processes. The batteries in the battery bank may differ with respect to their capacities, ramp constraints, losses, as well as cycling costs. The goal is to minimize the degradation costs associated with battery cycling in the long run; this is posed formally as a Markov decision process. We propose a linear function approximation based Q-learning algorithm for learning the optimal solution, using a specially designed class of kernel functions that approximate the structure of the value functions associated with the MDP. The proposed algorithm is validated via an extensive case study.
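As a rough illustration of Q-learning with linear function approximation for a cost-minimizing MDP, the sketch below performs a single semi-gradient update. The feature vectors, step size, and discount factor are hypothetical; the paper's specially designed kernel functions are not reproduced here.

```python
def q_value(weights, features):
    """Linear action-value estimate: Q(s, a) = w . phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))

def q_learning_update(weights, phi_sa, cost, next_phis, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step for a cost-minimizing MDP.

    phi_sa    -- feature vector of the visited (state, action) pair
    cost      -- immediate cost incurred (e.g. a degradation cost)
    next_phis -- feature vectors of every action available in the next state
    """
    # Costs are minimized, so the bootstrap target uses min instead of max.
    best_next = min((q_value(weights, phi) for phi in next_phis), default=0.0)
    td_error = cost + gamma * best_next - q_value(weights, phi_sa)
    # Gradient of a linear Q with respect to the weights is the feature vector.
    return [w + alpha * td_error * f for w, f in zip(weights, phi_sa)]

# Hypothetical two-feature example starting from zero weights.
new_weights = q_learning_update(
    weights=[0.0, 0.0],
    phi_sa=[1.0, 0.0],
    cost=2.0,
    next_phis=[[0.0, 1.0], [1.0, 1.0]],
)
```

Only the weights attached to active features move, which is what lets a well-chosen feature class encode structural knowledge about the value function.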

[103]  arXiv:2109.07138 [pdf, other]
Title: Patch-based medical image segmentation using Quantum Tensor Networks
Comments: Possible journal extension of our preliminary conference work "Segmenting two-dimensional structures with strided tensor networks", Selvan et al. 2021, available at arXiv:2102.06900. 22 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Tensor networks are efficient factorisations of high dimensional tensors into a network of lower order tensors. They have been most commonly used to model entanglement in quantum many-body systems and more recently are witnessing increased applications in supervised machine learning. In this work, we formulate image segmentation in a supervised setting with tensor networks. The key idea is to first lift the pixels in image patches to exponentially high dimensional feature spaces and then use a linear decision hyper-plane to classify the input pixels into foreground and background classes. The high dimensional linear model itself is approximated using the matrix product state (MPS) tensor network. The MPS is weight-shared between the non-overlapping image patches, resulting in our strided tensor network model. The performance of the proposed model is evaluated on three 2D and one 3D biomedical imaging datasets. The performance of the proposed tensor network segmentation model is compared with relevant baseline methods. In the 2D experiments, the tensor network model yields competitive performance compared to the baseline methods while being more resource efficient.

[104]  arXiv:2109.07140 [pdf, ps, other]
Title: On the Universality of Deep COntextual Language Models
Subjects: Computation and Language (cs.CL)

Deep Contextual Language Models (LMs) like ELMO, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning. Furthermore, multilingual versions of such models like XLM-R and mBERT have given promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced languages. Due to this initial success, pre-trained models are being used as `Universal Language Models' as the starting point across diverse tasks, domains, and languages. This work explores the notion of `Universality' by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings. We outline the current theoretical and empirical results that support model performance across these dimensions, along with extensions that may help address some of their current limitations. Through this survey, we lay the foundation for understanding the capabilities and limitations of massive contextual language models and help discern research gaps and directions for future work to make these LMs inclusive and fair to diverse applications, users, and linguistic phenomena.

[105]  arXiv:2109.07141 [pdf, other]
Title: Beyond Glass-Box Features: Uncertainty Quantification Enhanced Quality Estimation for Neural Machine Translation
Comments: Accepted by Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Quality Estimation (QE) plays an essential role in applications of Machine Translation (MT). Traditionally, a QE system accepts the original source text and translation from a black-box MT system as input. Recently, a few studies have indicated that QE benefits from information about the model and training data of the MT system that produced the translations, obtained as a by-product of translation; this is called "glass-box QE". In this paper, we extend the definition of "glass-box QE" generally to uncertainty quantification with both "black-box" and "glass-box" approaches and design several features deduced from them to blaze a new trail in improving QE's performance. We propose a framework to fuse the feature engineering of uncertainty quantification into a pre-trained cross-lingual language model to predict the translation quality. Experimental results show that our method achieves state-of-the-art performance on the datasets of the WMT 2020 QE shared task.

[106]  arXiv:2109.07142 [pdf]
Title: Universal Adversarial Attack on Deep Learning Based Prognostics
Comments: 7 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Deep learning-based time series models are being extensively utilized in engineering and manufacturing industries for process control and optimization, asset monitoring, diagnostic and predictive maintenance. These models have shown great improvement in the prediction of the remaining useful life (RUL) of industrial equipment but suffer from inherent vulnerability to adversarial attacks. These attacks can be easily exploited and can lead to catastrophic failure of critical industrial equipment. In general, different adversarial perturbations are computed for each instance of the input data. This is, however, difficult for the attacker to achieve in real time due to the higher computational requirements and the lack of uninterrupted access to the input data. Hence, we present the concept of universal adversarial perturbation, a special imperceptible noise to fool regression-based RUL prediction models. Attackers can easily utilize universal adversarial perturbations for real-time attacks, since continuous access to input data and repeated computation of adversarial perturbations are not prerequisites. We evaluate the effect of universal adversarial attacks using the NASA turbofan engine dataset. We show that the addition of a universal adversarial perturbation to any instance of the input data increases the error in the output predicted by the model. To the best of our knowledge, we are the first to study the effect of the universal adversarial perturbation on time series regression models. We further demonstrate the effect of varying the strength of perturbations on RUL prediction models and find that model accuracy decreases as the perturbation strength of the universal adversarial attack increases. We also show that universal adversarial perturbations can be transferred across different models.

[107]  arXiv:2109.07143 [pdf, other]
Title: Spline-PINN: Approaching PDEs without Data using Fast, Physics-Informed Hermite-Spline CNNs
Comments: Submitted to AAAI 2022
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Fluid Dynamics (physics.flu-dyn)

Partial Differential Equations (PDEs) are notoriously difficult to solve. In general, closed-form solutions are not available and numerical approximation schemes are computationally expensive. In this paper, we propose to approach the solution of PDEs based on a novel technique that combines the advantages of two recently emerging machine learning based approaches. First, physics-informed neural networks (PINNs) learn continuous solutions of PDEs and can be trained with little to no ground truth data. However, PINNs do not generalize well to unseen domains. Second, convolutional neural networks provide fast inference and generalize but either require large amounts of training data or a physics-constrained loss based on finite differences that can lead to inaccuracies and discretization artifacts. We leverage the advantages of both of these approaches by using Hermite spline kernels in order to continuously interpolate a grid-based state representation that can be handled by a CNN. This allows for training without any precomputed training data using a physics-informed loss function only and provides fast, continuous solutions that generalize to unseen domains. We demonstrate the potential of our method at the examples of the incompressible Navier-Stokes equation and the damped wave equation. Our models are able to learn several intriguing phenomena such as Karman vortex streets, the Magnus effect, Doppler effect, interference patterns and wave reflections. Our quantitative assessment and an interactive real-time demo show that we are narrowing the gap in accuracy of unsupervised ML based methods to industrial CFD solvers while being orders of magnitude faster.
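To make the idea of continuous interpolation from grid values concrete, here is a minimal one-dimensional sketch of cubic Hermite interpolation between two grid nodes, using the standard cubic Hermite basis polynomials. The function name `hermite_interpolate` is illustrative, and the order and dimensionality of the spline kernels used in the paper may differ.

```python
def hermite_interpolate(p0, m0, p1, m1, t):
    """Cubic Hermite interpolation on the unit interval.

    p0, p1 -- values stored at the two neighbouring grid nodes
    m0, m1 -- derivatives stored at the same nodes
    t      -- position between the nodes, 0 <= t <= 1
    """
    t2 = t * t
    t3 = t2 * t
    h00 = 2 * t3 - 3 * t2 + 1   # weight of the value at node 0
    h10 = t3 - 2 * t2 + t       # weight of the derivative at node 0
    h01 = -2 * t3 + 3 * t2      # weight of the value at node 1
    h11 = t3 - t2               # weight of the derivative at node 1
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

# The interpolant reproduces the node values exactly at t = 0 and t = 1,
# and varies smoothly in between.
midpoint = hermite_interpolate(p0=0.0, m0=1.0, p1=1.0, m1=1.0, t=0.5)
```

Because the interpolant is a polynomial in `t`, its derivatives are available in closed form at any point, which is the property a physics-informed loss needs when it evaluates a PDE residual between grid nodes.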

[108]  arXiv:2109.07148 [pdf, other]
Title: Semantics of European poetry is shaped by conservative forces: The relationship between poetic meter and meaning in accentual-syllabic verse
Subjects: Computation and Language (cs.CL)

Recent advances in cultural analytics and large-scale computational studies of art, literature and film often show that long-term change in the features of artistic works happens gradually. These findings suggest that conservative forces that shape creative domains might be underestimated. To this end, we provide the first large-scale formal evidence of the persistent association between poetic meter and semantics in 18th- and 19th-century European literatures, using Czech, German and Russian collections with additional data from English poetry and early modern Dutch songs. Our study traces this association through a series of clustering experiments using the abstracted semantic features of 150,000 poems. With the aid of topic modeling we infer semantic features for individual poems. Texts were also lexically simplified across collections to increase generalizability and decrease the sparseness of word frequency distributions. Topics alone enable recognition of the meters in each observed language, as may be seen from highly robust clustering of same-meter samples (median Adjusted Rand Index between 0.48 and 1). In addition, this study shows that the strength of the association between form and meaning tends to decrease over time. This may reflect a shift in aesthetic conventions between the 18th and 19th centuries as individual innovation was increasingly favored in literature. Despite this decline, it remains possible to recognize the semantics of meters from the past or future, which suggests the continuity of semantic traditions while also revealing the historical variability of conditions across languages. This paper argues that distinct metrical forms, which are often copied in a language over centuries, also maintain long-term semantic inertia in poetry. Our findings thus highlight the role of the formal features of cultural items in influencing the pace and shape of cultural evolution.
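The Adjusted Rand Index quoted in this abstract measures agreement between two partitions, corrected for chance. A minimal sketch of the standard pair-counting formula follows; the toy labels are hypothetical and do not come from the paper's data.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("labelings must have the same length")
    # Contingency counts: items sharing both a true and a predicted label.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)
    # Agreement expected if labels were assigned at random.
    expected = sum_true * sum_pred / total if total else 0.0
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_pairs - expected) / (max_index - expected)

# Hypothetical example: meters as true labels, cluster ids as predictions.
meters = ["iamb", "iamb", "trochee", "trochee"]
perfect = adjusted_rand_index(meters, [1, 1, 0, 0])
crossed = adjusted_rand_index(meters, [0, 1, 0, 1])
```

A value of 1 means the clustering recovers the meters exactly regardless of how cluster ids are named, values near 0 correspond to chance-level agreement, and negative values indicate agreement below chance.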

[109]  arXiv:2109.07149 [pdf, other]
Title: Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)

Automatic emotion recognition (AER) based on enriched multimodal inputs, including text, speech, and visual clues, is crucial in the development of emotionally intelligent machines. Although complex modality relationships have been proven effective for AER, they are still largely underexplored because previous works predominantly relied on various fusion mechanisms with simply concatenated features to learn multimodal representations for emotion classification. This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations by considering the modality dependencies during the feature fusion procedure. Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation. We verified the interpretable capabilities of the proposed method by projecting the emotional states to a 2D valence-arousal (VA) subspace. Extensive experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets, IEMOCAP and MELD.

[110]  arXiv:2109.07150 [pdf, other]
Title: Solving Occlusion in Terrain Mapping with Neural Networks
Comments: 11 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Accurate and complete terrain maps enhance the awareness of autonomous robots and enable safe and optimal path planning. Rocks and topography often create occlusions and lead to missing elevation information in the Digital Elevation Map (DEM). Currently, mostly traditional inpainting techniques based on diffusion or patch-matching are used by autonomous mobile robots to fill in incomplete DEMs. These methods cannot leverage the high-level terrain characteristics and the geometric constraints of line of sight we humans use intuitively to predict occluded areas. We propose to use neural networks to reconstruct the occluded areas in DEMs. We introduce a self-supervised learning approach capable of training on real-world data without a need for ground-truth information. We accomplish this by adding artificial occlusion to the incomplete elevation maps constructed on a real robot by performing ray casting. We first evaluate a supervised learning approach on synthetic data for which we have the full ground-truth available and subsequently move to several real-world datasets. These real-world datasets were recorded during autonomous exploration of both structured and unstructured terrain with a legged robot, and additionally in a planetary scenario on Lunar analogue terrain. We report a significant improvement compared to the Telea and Navier-Stokes baseline methods both on synthetic terrain and for the real-world datasets. Our neural network is able to run in real-time on both CPU and GPU with suitable sampling rates for autonomous ground robots.

[111]  arXiv:2109.07152 [pdf, other]
Title: Incorporating Residual and Normalization Layers into Analysis of Masked Language Models
Comments: 22 pages, accepted to EMNLP 2021 main conference
Subjects: Computation and Language (cs.CL)

Transformer architecture has become ubiquitous in the natural language processing field. To interpret the Transformer-based models, their attention patterns have been extensively analyzed. However, the Transformer architecture is not only composed of the multi-head attention; other components can also contribute to Transformers' progressive performance. In this study, we extended the scope of the analysis of Transformers from solely the attention patterns to the whole attention block, i.e., multi-head attention, residual connection, and layer normalization. Our analysis of Transformer-based masked language models shows that the token-to-token interaction performed via attention has less impact on the intermediate representations than previously assumed. These results provide new intuitive explanations of existing reports; for example, discarding the learned attention patterns tends not to adversely affect the performance. The codes of our experiments are publicly available.
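The attention block named in this abstract can be sketched for a single token vector, assuming the post-layer-normalization arrangement and omitting the learned gain and bias of layer normalization. The numbers are hypothetical and the sketch is not the paper's analysis method; it only shows how the attention output is combined with the residual input before normalization.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def attention_block_output(x, attn_out):
    """Residual connection followed by layer normalization.

    x        -- the token's input representation (residual path)
    attn_out -- output of multi-head attention for the same token
    """
    summed = [a + b for a, b in zip(x, attn_out)]
    return layer_norm(summed)

# Hypothetical values: the attention output is small relative to the input,
# so the block output is dominated by the residual path.
token_in = [1.0, 2.0, 3.0]
attn_mix = [0.5, 0.5, 0.5]
block_out = attention_block_output(token_in, attn_mix)
```

Looking only at attention weights ignores both the residual term and the rescaling by layer normalization, which is why an analysis of the whole block can reach different conclusions about how much tokens are actually mixed.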

[112]  arXiv:2109.07154 [pdf, other]
Title: Can Language Models be Biomedical Knowledge Bases?
Comments: EMNLP 2021. Code available at this https URL
Subjects: Computation and Language (cs.CL)

Pre-trained language models (LMs) have become ubiquitous in solving various natural language processing (NLP) tasks. There has been increasing interest in what knowledge these LMs contain and how we can extract that knowledge, treating LMs as knowledge bases (KBs). While there has been much work on probing LMs in the general domain, there has been little attention to whether these powerful LMs can be used as domain-specific KBs. To this end, we create the BioLAMA benchmark, which is comprised of 49K biomedical factual knowledge triples for probing biomedical LMs. We find that biomedical LMs with recently proposed probing methods can achieve up to 18.51% Acc@5 on retrieving biomedical knowledge. Although this seems promising given the task difficulty, our detailed analyses reveal that most predictions are highly correlated with prompt templates without any subjects, hence producing similar results on each relation and hindering their capabilities to be used as domain-specific KBs. We hope that BioLAMA can serve as a challenging benchmark for biomedical factual probing.
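The abstract reports Acc@5 without defining it. A minimal sketch of top-k accuracy as it is commonly computed follows: a query counts as correct if any gold answer appears among the k highest-ranked predictions. The function name, the data layout, and the placeholder entities are assumptions, not the benchmark's actual evaluation code.

```python
def acc_at_k(ranked_predictions, gold_answers, k=5):
    """Fraction of queries whose top-k predictions contain a gold answer.

    ranked_predictions -- one list per query, best prediction first
    gold_answers       -- one set of acceptable answers per query
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("need one gold answer set per query")
    if not ranked_predictions:
        return 0.0
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if any(p in gold for p in preds[:k])
    )
    return hits / len(ranked_predictions)

# Placeholder entities standing in for object predictions of a probed LM.
predictions = [
    ["entity_a", "entity_b", "entity_c"],
    ["entity_d", "entity_e", "entity_f"],
]
gold = [{"entity_b"}, {"entity_z"}]
score_top5 = acc_at_k(predictions, gold, k=5)
score_top1 = acc_at_k(predictions, gold, k=1)
```

If a model returns nearly the same ranked list for every subject of a relation, this metric can still be non-trivial whenever that list contains frequent objects, which is the prompt-driven behaviour the abstract warns about.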

[113]  arXiv:2109.07157 [pdf, ps, other]
Title: Learning to Match Job Candidates Using Multilingual Bi-Encoder BERT
Authors: Dor Lavi
Comments: 2 pages, To be presented as a main talk at RecSys '21: Fifteenth ACM Conference on Recommender Systems. arXiv admin note: substantial text overlap with arXiv:2109.06501
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this talk, we will show how we used Randstad's history of candidate placements to generate a labeled dataset of CV-vacancy pairs. Afterwards, we fine-tune a multilingual BERT with a bi-encoder structure over this dataset by adding a cosine similarity log loss layer. We will explain how using the mentioned structure helps us overcome most of the challenges described above, and how it enables us to build a maintainable and scalable pipeline to match CVs and vacancies. In addition, we show how we gain a better semantic understanding and learn to bridge the vocabulary gap. Finally, we highlight how multilingual transformers help us handle cross-language barriers and might reduce discrimination.

[114]  arXiv:2109.07161 [pdf, other]
Title: Resolution-robust Large Mask Inpainting with Fourier Convolutions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Modern image inpainting systems, despite significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions, which have an image-wide receptive field; ii) a high receptive field perceptual loss; and iii) large training masks, which unlock the potential of the first two components. Our inpainting network improves the state-of-the-art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions that are higher than those seen at train time, and achieves this at lower parameter and compute costs than the competitive baselines. The code is available at https://github.com/saic-mdal/lama.

[115]  arXiv:2109.07162 [pdf, other]
Title: MISSFormer: An Effective Medical Image Segmentation Transformer
Comments: 9 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

CNN-based methods have achieved impressive results in medical image segmentation, but they fail to capture long-range dependencies due to the inherent locality of the convolution operation. Transformer-based methods have recently become popular in vision tasks because of their capacity to model long-range dependencies, and they achieve promising performance. However, they are weak at modeling local context. Some works have attempted to embed convolutional layers to overcome this problem and achieved some improvement, but this makes the features inconsistent and fails to leverage the natural multi-scale features of hierarchical transformers, which limits the performance of the models. In this paper, taking medical image segmentation as an example, we present MISSFormer, an effective and powerful Medical Image Segmentation tranSFormer. MISSFormer is a hierarchical encoder-decoder network with two appealing designs: 1) The feed-forward network is redesigned in the proposed Enhanced Transformer Block, which aligns features adaptively and enhances both long-range dependencies and local context. 2) We propose the Enhanced Transformer Context Bridge, a context bridge built on the Enhanced Transformer Block, to model the long-range dependencies and local context of the multi-scale features generated by our hierarchical transformer encoder. Driven by these two designs, MISSFormer shows a strong capacity to capture more valuable dependencies and context in medical image segmentation. Experiments on multi-organ and cardiac segmentation tasks demonstrate the superiority, effectiveness and robustness of MISSFormer. The experimental results of MISSFormer trained from scratch even outperform state-of-the-art methods pretrained on ImageNet, and the core designs can be generalized to other visual segmentation tasks. The code will be released on GitHub.

[116]  arXiv:2109.07165 [pdf, other]
Title: 3D Annotation Of Arbitrary Objects In The Wild
Comments: 6 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent years have produced a variety of learning-based methods in the context of computer vision and robotics. Most of the recently proposed methods are based on deep learning, which requires very large amounts of data compared to traditional methods. The performance of deep learning methods is largely dependent on the data distribution they were trained on, and it is important to use data from the robot's actual operating domain during training. Therefore, it is not possible to rely on pre-built, generic datasets when deploying robots in real environments, creating a need for efficient data collection and annotation in the specific conditions the robots will operate in. The challenge is then: how do we reduce the cost of obtaining such datasets to a point where we can easily deploy our robots in new conditions and environments and support new sensors? As an answer to this question, we propose a data annotation pipeline based on SLAM, 3D reconstruction, and 3D-to-2D geometry. The pipeline allows creating 3D and 2D bounding boxes, along with per-pixel annotations of arbitrary objects, without needing accurate 3D models of the objects prior to data collection and annotation. Our results showcase almost 90% Intersection-over-Union (IoU) agreement on both semantic segmentation and 2D bounding box detection across a variety of objects and scenes, while speeding up the annotation process by several orders of magnitude compared to traditional manual annotation.
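The Intersection-over-Union agreement cited in this abstract can be sketched for axis-aligned 2D bounding boxes. The `(x_min, y_min, x_max, y_max)` box layout and the example coordinates are assumptions chosen for illustration, not the paper's data format.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical projected annotation compared with a manually drawn box.
projected = (0.0, 0.0, 2.0, 2.0)
manual = (1.0, 1.0, 3.0, 3.0)
agreement = box_iou(projected, manual)
```

For semantic segmentation the same ratio is computed over sets of pixels rather than rectangle areas.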

[117]  arXiv:2109.07169 [pdf, other]
Title: Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders
Comments: Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL)

The ability of learning disentangled representations represents a major step for interpretable NLP systems as it allows latent linguistic features to be controlled. Most approaches to disentanglement rely on continuous variables, both for images and text. We argue that despite being suitable for image datasets, continuous variables may not be ideal to model features of textual data, due to the fact that most generative factors in text are discrete. We propose a Variational Autoencoder based method which models language features as discrete variables and encourages independence between variables for learning disentangled representations. The proposed model outperforms continuous and discrete baselines on several qualitative and quantitative benchmarks for disentanglement as well as on a text style transfer downstream application.

[118]  arXiv:2109.07170 [pdf, other]
Title: Powered Hawkes-Dirichlet Process: Challenging Textual Clustering using a Flexible Temporal Prior
Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Information Retrieval (cs.IR)

The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when textual information conveys little information or when temporal dynamics are hard to unveil. Furthermore, the textual content of a document is not always linked to its temporal dynamics. We develop a flexible method to create clusters of textual documents according to both their content and publication time, the Powered Dirichlet-Hawkes process (PDHP). We show PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. The PDHP also alleviates the hypothesis that textual content and temporal dynamics are always perfectly correlated. PDHP allows retrieving textual clusters, temporal clusters, or a mixture of both with high accuracy when they are not. We demonstrate that PDHP generalizes previous work --such as the Dirichlet-Hawkes process (DHP) and Uniform process (UP). Finally, we illustrate the changes induced by PDHP over DHP and UP in a real-world application using Reddit data.

[119]  arXiv:2109.07171 [pdf, other]
Title: Balancing detectability and performance of attacks on the control channel of Markov Decision Processes
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs). This research is motivated by the recent interest of the research community for adversarial and poisoning attacks applied to MDPs, and reinforcement learning (RL) methods. The policies resulting from these methods have been shown to be vulnerable to attacks perturbing the observations of the decision-maker. In such an attack, drawing inspiration from adversarial examples used in supervised learning, the amplitude of the adversarial perturbation is limited according to some norm, with the hope that this constraint will make the attack imperceptible. However, such constraints do not grant any level of undetectability and do not take into account the dynamic nature of the underlying Markov process. In this paper, we propose a new attack formulation, based on information-theoretical quantities, that considers the objective of minimizing the detectability of the attack as well as the performance of the controlled process. We analyze the trade-off between the efficiency of the attack and its detectability. We conclude with examples and numerical simulations illustrating this trade-off.

[120]  arXiv:2109.07173 [pdf, other]
Title: A Comparison of Code Embeddings and Beyond
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)

Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been proposed to understand the semantic properties of programs and applied to different software engineering tasks. However, no previous study has comprehensively assessed the generalizability of these deep models on different tasks, so the pros and cons of the models remain unclear. In this experience paper, we try to bridge this gap by systematically evaluating the performance of eight program representation learning models on three common tasks, where six models are based on abstract syntax trees and two models are based on plain text of source code. We explain the criteria for selecting the models and tasks, as well as the method for enabling end-to-end learning in each task. The results of the performance evaluation show that the models perform differently in each task and that the performance of the AST-based models is generally unstable across tasks. In order to further explain the results, we apply a prediction attribution technique to find what elements are captured by the models and are responsible for the predictions in each task. Based on the findings, we discuss some general principles for better capturing the information in the source code, and hope to inspire researchers to improve program representation learning methods for software engineering tasks.

[121]  arXiv:2109.07177 [pdf, other]
Title: Adversarial Mixing Policy for Relaxing Locally Linear Constraints in Mixup
Comments: This paper is accepted to appear in the main conference of EMNLP2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Mixup is a recent regularizer for current deep classification networks. Through training a neural network on convex combinations of pairs of examples and their labels, it imposes locally linear constraints on the model's input space. However, such strict linear constraints often lead to under-fitting, which degrades the effects of regularization. Notably, this issue becomes more serious when resources are extremely limited. To address these issues, we propose the Adversarial Mixing Policy (AMP), organized in a min-max-rand formulation, to relax the Locally Linear Constraints in Mixup. Specifically, AMP adds a small adversarial perturbation to the mixing coefficients rather than the examples. Thus, slight non-linearity is injected in-between the synthetic examples and synthetic labels. By training on these data, the deep networks are further regularized, and thus achieve a lower predictive error rate. Experiments on five text classification benchmarks and five backbone models have empirically shown that our methods reduce the error rate over Mixup variants by a significant margin (up to 31.3%), especially in low-resource conditions (up to 17.5%).
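The convex combination that the abstract refers to can be sketched as follows. This is a minimal illustration of plain Mixup as it is commonly described, not the authors' AMP method; the function names, the toy vectors, and the Beta(alpha, alpha) sampling with alpha = 0.4 are assumptions made for the example.

```python
import random


def mixup(x1, y1, x2, y2, lam):
    """Return the convex combination of two examples and of their labels.

    x1, x2: feature vectors (lists of floats) of equal length.
    y1, y2: one-hot (or soft) label vectors of equal length.
    lam:    mixing coefficient in [0, 1].
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha=0.4, rng=random):
    # Assumption: the coefficient is drawn from Beta(alpha, alpha),
    # a common choice for Mixup; alpha = 0.4 is only an illustrative value.
    return rng.betavariate(alpha, alpha)


# Two toy examples with one-hot labels (hypothetical data).
x_a, y_a = [1.0, 0.0, 2.0], [1.0, 0.0]
x_b, y_b = [0.0, 4.0, 2.0], [0.0, 1.0]

# The same coefficient is applied to inputs and labels, which is what
# makes the constraint between them linear.
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, lam=0.25)
```

Because inputs and labels share one coefficient, every synthetic pair lies on the straight line between the two originals. The abstract states that AMP relaxes this by perturbing the mixing coefficients; that step is not reproduced here.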

[122]  arXiv:2109.07180 [pdf, other]
Title: Back to Basics: Deep Reinforcement Learning in Traffic Signal Control
Comments: 9 pages, 4 figures; code for this paper is available at this https URL
Subjects: Machine Learning (cs.LG)

In this paper we revisit some of the fundamental premises for a reinforcement learning (RL) approach to self-learning traffic lights. We propose RLight, a combination of choices that offers robust performance and good generalization to unseen traffic flows. In particular, our main contributions are threefold: our lightweight and cluster-aware state representation leads to improved performance; we reformulate the MDP such that it skips redundant timesteps of yellow light, speeding up learning by 30%; and we investigate the action space and provide insight into the difference in performance between acyclic and cyclic phase transitions. Additionally, we provide insights into the generalisation of the methods to unseen traffic. Evaluations using the real-world Hangzhou traffic dataset show that RLight outperforms state-of-the-art rule-based and deep reinforcement learning algorithms, demonstrating the potential of RL-based methods to improve urban traffic flows.

[123]  arXiv:2109.07183 [pdf, other]
Title: Residual viscosity stabilized RBF-FD methods for solving nonlinear conservation laws
Subjects: Numerical Analysis (math.NA)

We formulate an oversampled radial basis function generated finite difference (RBF-FD) method to solve time-dependent nonlinear conservation laws. The analytic solutions of these problems are known to be discontinuous, which leads to the occurrence of non-physical oscillations (Gibbs phenomenon) that pollute the numerical solutions and can make them unstable. We address these difficulties using a residual-based artificial viscosity stabilization, where the residual of the conservation law indicates the approximate location of the shocks. The location is then used to locally apply an upwind viscosity term, which stabilizes the Gibbs phenomenon and does not smear the solution away from the shocks. The proposed method is numerically tested and proves to be robust and accurate when solving scalar conservation laws and systems of conservation laws, such as the compressible Euler equations.

[124]  arXiv:2109.07185 [pdf, other]
Title: Transformer-based Language Models for Factoid Question Answering at BioASQ9b
Comments: 12 pages, 3 figures, 4 tables. Accepted at BioASQ Workshop, CLEF Working Notes
Subjects: Computation and Language (cs.CL)

In this work, we describe our experiments and participating systems in the BioASQ Task 9b Phase B challenge of biomedical question answering. We have focused on finding the ideal answers and investigated multi-task fine-tuning and gradual unfreezing techniques on transformer-based language models. For factoid questions, our ALBERT-based systems ranked first in test batch 1 and fourth in test batch 2. Our DistilBERT systems outperformed the ALBERT variants in test batches 4 and 5 despite having 81% fewer parameters than ALBERT. However, we observed that gradual unfreezing had no significant impact on the model's accuracy compared to standard fine-tuning.

[125]  arXiv:2109.07189 [pdf, ps, other]
Title: On Characterization of Finite Geometric Distributive Lattices
Authors: Pranab Basu
Comments: 11 pages, 2 figures, submitted to Journal of Combinatorial Theory, Series A
Subjects: Discrete Mathematics (cs.DM); Information Theory (cs.IT); Combinatorics (math.CO)

A lattice is a partially ordered set where both the least upper bound and the greatest lower bound of any pair of elements are unique and exist within the set. K\"{o}tter and Kschischang proved that codes in the linear lattice can be used for error and erasure-correction in random networks. Codes in the linear lattice have previously been shown to be special cases of codes in modular lattices. Two well-known classifications of modular lattices are geometric and distributive lattices. We have identified the unique criterion which makes a geometric lattice distributive, thus characterizing all finite geometric distributive lattices. Our characterization helps to prove a conjecture regarding the maximum size of a distributive sublattice of a finite geometric lattice and identify the maximal case. The Whitney numbers of the class of geometric distributive lattices are also calculated. We present a few other applications of this unique characterization to derive certain results regarding linearity and complements in the linear lattice.
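To make the lattice and distributivity notions concrete, here is a small sketch that checks the distributive law by brute force on two finite lattices. It does not implement the paper's characterization; the two example lattices and all function names are chosen here for illustration. The first lattice is the set of divisors of 12 under divisibility (meet = gcd, join = lcm); the second is the five-element "diamond" lattice, which has the same shape as the lattice of subspaces of a two-dimensional vector space over the two-element field.

```python
from itertools import product
from math import gcd


def divisors(n):
    """All positive divisors of n, ordered by divisibility as a lattice."""
    return [d for d in range(1, n + 1) if n % d == 0]


def lcm(a, b):
    return a * b // gcd(a, b)


def is_distributive(elements, meet, join):
    """Brute-force check of a ∧ (b ∨ c) == (a ∧ b) ∨ (a ∧ c) on all triples."""
    return all(
        meet(a, join(b, c)) == join(meet(a, b), meet(a, c))
        for a, b, c in product(elements, repeat=3)
    )


# Example 1: divisors of 12; gcd is the greatest lower bound, lcm the least upper bound.
D12 = divisors(12)
closed = all(
    gcd(a, b) in D12 and lcm(a, b) in D12 for a, b in product(D12, repeat=2)
)
d12_distributive = is_distributive(D12, gcd, lcm)

# Example 2: the diamond lattice with bottom "0", top "1" and three
# pairwise incomparable elements "a", "b", "c".
M3 = ["0", "a", "b", "c", "1"]


def m3_leq(x, y):
    return x == y or x == "0" or y == "1"


def m3_meet(x, y):
    if m3_leq(x, y):
        return x
    if m3_leq(y, x):
        return y
    return "0"  # two distinct middle elements meet at the bottom


def m3_join(x, y):
    if m3_leq(x, y):
        return y
    if m3_leq(y, x):
        return x
    return "1"  # two distinct middle elements join at the top


m3_distributive = is_distributive(M3, m3_meet, m3_join)
```

In this sketch the divisor lattice passes the check while the diamond fails it, for instance because a ∧ (b ∨ c) = a but (a ∧ b) ∨ (a ∧ c) = 0.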

[126]  arXiv:2109.07193 [pdf, other]
Title: FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack
Comments: 9 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Physical adversarial attacks in object detection have attracted increasing attention. However, most previous works focus on hiding the objects from the detector by generating an individual adversarial patch, which only covers the planar part of the vehicle's surface and fails to attack the detector in physical scenarios for multi-view, long-distance and partially occluded objects. To bridge the gap between digital attacks and physical attacks, we exploit the full 3D vehicle surface to propose a robust Full-coverage Camouflage Attack (FCA) to fool detectors. Specifically, we first try rendering the non-planar camouflage texture over the full vehicle surface. To mimic the real-world environment conditions, we then introduce a transformation function to transfer the rendered camouflaged vehicle into a photo-realistic scenario. Finally, we design an efficient loss function to optimize the camouflage texture. Experiments show that the full-coverage camouflage attack can not only outperform state-of-the-art methods under various test cases but also generalize to different environments, vehicles, and object detectors.

[127]  arXiv:2109.07194 [pdf, other]
Title: Multiagent Multimodal Categorization for Symbol Emergence: Emergent Communication via Interpersonal Cross-modal Inference
Comments: 27 pages, 5 figures, 12 tables
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This paper describes a computational model of multiagent multimodal categorization that realizes emergent communication. We clarify whether the computational model can reproduce the following functions in a symbol emergence system, comprising two agents with different sensory modalities playing a naming game. (1) Function for forming a shared lexical system that comprises perceptual categories and corresponding signs, formed by agents through individual learning and semiotic communication between agents. (2) Function to improve the categorization accuracy in an agent via semiotic communication with another agent, even when some sensory modalities of each agent are missing. (3) Function that an agent infers unobserved sensory information based on a sign sampled from another agent in the same manner as cross-modal inference. We propose an interpersonal multimodal Dirichlet mixture (Inter-MDM), which is derived by dividing an integrative probabilistic generative model, which is obtained by integrating two Dirichlet mixtures (DMs). The Markov chain Monte Carlo algorithm realizes emergent communication. The experimental results demonstrated that Inter-MDM enables agents to form multimodal categories and appropriately share signs between agents. It is shown that emergent communication improves categorization accuracy, even when some sensory modalities are missing. Inter-MDM enables an agent to predict unobserved information based on a shared sign.

[128]  arXiv:2109.07195 [pdf, other]
Title: Target Languages (vs. Inductive Biases) for Learning to Act and Plan
Authors: Hector Geffner
Subjects: Artificial Intelligence (cs.AI)

Recent breakthroughs in AI have shown the remarkable power of deep learning and deep reinforcement learning. These developments, however, have been tied to specific tasks, and progress in out-of-distribution generalization has been limited. While it is assumed that these limitations can be overcome by incorporating suitable inductive biases, the notion of inductive biases itself is often left vague and does not provide meaningful guidance. In the paper, I articulate a different learning approach where representations do not emerge from biases in a neural architecture but are learned over a given target language with a known semantics. The basic ideas are implicit in mainstream AI where representations have been encoded in languages ranging from fragments of first-order logic to probabilistic structural causal models. The challenge is to learn from data the representations that have traditionally been crafted by hand. Generalization is then a result of the semantics of the language. The goals of the paper and talk are to make these ideas explicit, to place them in a broader context where the design of the target language is crucial, and to illustrate them in the context of learning to act and plan. For this, after a general discussion, I consider learning representations of actions, general policies, and general decompositions. In these cases, learning is formulated as a combinatorial optimization problem but nothing prevents the use of deep learning techniques instead. Indeed, learning representations over languages with a known semantics provides an account of what is to be learned, while learning representations with neural nets provides a complementary account of how representations can be learned. The challenge and the opportunity is to bring the two together.

[129]  arXiv:2109.07196 [pdf, ps, other]
Title: Whole-Body Control with Motion/Force Transmissibility for Parallel-Legged Robot
Comments: 6 pages, 7 figures, submitted to ICRA 2022
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Whole-body control (WBC) has been applied to the locomotion of legged robots. However, current WBC methods have not considered the intrinsic features of parallel mechanisms, especially motion/force transmissibility (MFT). In this work, we propose an MFT-enhanced WBC scheme. Introducing MFT into a WBC is challenging due to the nonlinear relationship between MFT indices and the robot configuration. To overcome this challenge, we establish the MFT preferable space of the robot and formulate it as a polyhedron in the joint space at the acceleration level. Then, the WBC employs the polyhedron as a soft constraint. As a result, the robot possesses high-speed and high-acceleration capabilities by satisfying this constraint as well as staying away from its singularity. In contrast with the WBC without considering MFT, our proposed scheme is more robust to external disturbances, e.g., push recovery and uneven terrain locomotion. Simulations and experiments on a parallel-legged bipedal robot are provided to demonstrate the performance and robustness of the proposed method.

[130]  arXiv:2109.07201 [pdf, other]
Title: Expectable Motion Unit: Avoiding Hazards From Human Involuntary Motions in Human-Robot Interaction
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO)

In robotics, many control and planning schemes have been developed that ensure the human physical safety in human-robot interaction. The human psychological state and expectation towards the robot, however, are typically neglected. Even if the robot behaviour is regarded as biomechanically safe, humans may still react with rapid involuntary motion (IM) caused by startle or surprise. Obviously, such sudden, uncontrolled motions can jeopardize safety and should be prevented by any means. In this paper, we propose the Expectable Motion Unit (EMU) concept which ensures that a certain probability of IM occurrence is not exceeded in a typical HRI setting. Based on a model of IM occurrence that we generate through an experiment with 29 participants, the mapping between robot velocity, robot-human distance, and the relative frequency of IM occurrence is established. This mapping is processed towards a real-time capable robot motion generator, which limits the robot velocity during task execution if necessary. The EMU is combined with the well-established Safe Motion Unit in order to integrate both physical and psychological safety knowledge and data into a holistic safety framework. In a validation experiment, it was shown that the EMU successfully avoids human IM in five out of six cases.

[131]  arXiv:2109.07202 [pdf, other]
Title: Deep 3D Mesh Watermarking with Self-Adaptive Robustness
Subjects: Graphics (cs.GR)

Robust 3D mesh watermarking is a traditional research topic in computer graphics, which provides an efficient solution to the copyright protection for 3D meshes. Traditionally, researchers need to manually design watermarking algorithms to achieve sufficient robustness for the actual application scenarios. In this paper, we propose the first deep learning-based 3D mesh watermarking framework, which can solve this problem once and for all. In detail, we propose an end-to-end network, consisting of a watermark embedding sub-network, a watermark extracting sub-network and attack layers. We adopt the topology-agnostic graph convolutional network (GCN) as the basic convolution operation for 3D meshes, so our network is not limited to registered meshes (which share a fixed topology). For the specific application scenario, we can integrate the corresponding attack layers to guarantee adaptive robustness against possible attacks. To ensure the visual quality of watermarked 3D meshes, we design a curvature-based loss function to constrain the local geometry smoothness of watermarked meshes. Experimental results show that the proposed method can achieve more universal robustness and faster watermark embedding than baseline methods while guaranteeing comparable visual quality.

[132]  arXiv:2109.07203 [pdf]
Title: Sentiment Analysis in Poems in Misurata Sub-dialect -- A Sentiment Detection in an Arabic Sub-dialect
Authors: Azza Abugharsa
Subjects: Computation and Language (cs.CL)

Over recent decades, there has been a significant increase and development of resources for Arabic natural language processing. This includes the task of exploring Arabic Language Sentiment Analysis (ALSA) from Arabic utterances in both Modern Standard Arabic (MSA) and different Arabic dialects. This study focuses on detecting sentiment in poems written in the Misurata Arabic sub-dialect spoken in Misurata, Libya. The tools used to detect sentiment from the dataset are Sklearn as well as the Mazajak sentiment tool. Logistic Regression, Random Forest, Naive Bayes (NB), and Support Vector Machines (SVM) classifiers are used with Sklearn, while the Convolutional Neural Network (CNN) is implemented with Mazajak. The results show that the traditional classifiers score a higher level of accuracy as compared to Mazajak, which is built on an algorithm that includes deep learning techniques. More research is suggested to analyze Arabic sub-dialect poetry in order to investigate the aspects that contribute to sentiments in these multi-line texts; for example, the use of figurative language such as metaphors.

[133]  arXiv:2109.07204 [pdf, other]
Title: Realization of Neural Network-based Optical Channel Equalizer in Restricted Hardware
Comments: Neural network, Nonlinear equalizer, Pruning, Quantization, Raspberry pi, Coherent detection
Subjects: Systems and Control (eess.SY)

We quantify the achievable reduction of the processing complexity of artificial neural network-based equalizers in a coherent optical channel using the pruning and quantization techniques. First, we explain how to correctly compute the complexity of the compressed equalizer in the DSP sense. Then, considering a basic neural network architecture, a multilayer perceptron, we, for the first time, assess the complexity reduction attainable without noticeable performance degradation, considering 30GBd 1000km transmission over a standard single-mode fiber. We demonstrate a possibility of reducing the equalizer's memory by up to 95.7%, and the complexity up to 91.5%, without noticeable performance degradation. Finally, the compressed equalizer's functioning is demonstrated experimentally using popular resource-constrained hardware: the Raspberry Pi, which would not have been possible without model compression.

[134]  arXiv:2109.07205 [pdf, other]
Title: A Relation-Oriented Clustering Method for Open Relation Extraction
Comments: 12 pages, 6 figures, EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The clustering-based unsupervised relation discovery method has gradually become one of the important methods of open relation extraction (OpenRE). However, high-dimensional vectors can encode complex linguistic information, which leads to the problem that the derived clusters cannot explicitly align with the relational semantic classes. In this work, we propose a relation-oriented clustering model and use it to identify the novel relations in the unlabeled data. Specifically, to enable the model to learn to cluster relational data, our method leverages the readily available labeled data of pre-defined relations to learn a relation-oriented representation. We minimize the distance between instances with the same relation by gathering the instances towards their corresponding relation centroids to form a cluster structure, so that the learned representation is cluster-friendly. To reduce the clustering bias on predefined classes, we optimize the model by minimizing a joint objective on both labeled and unlabeled data. Experimental results show that our method reduces the error rate by 29.2% and 15.7% on two datasets, respectively, compared with current SOTA methods.

[135]  arXiv:2109.07206 [pdf, other]
Title: Signaling Design for Cooperative Resource Allocation and its Impact to Reliability
Subjects: Networking and Internet Architecture (cs.NI)

Decentralized cooperative resource allocation schemes for robotic swarms are essential to enable high reliability in high throughput data exchanges. These cooperative schemes require control signaling with the aim to avoid half-duplex problems at the receiver and mitigate interference. We propose two cooperative resource allocation schemes, device sequential and group scheduling, and introduce a control signaling design. We observe that failure in the reception of these control signals leads to non-cooperative behavior and to significant performance degradation. The causes of these failures are identified and specific countermeasures are proposed and evaluated. We compare the proposed resource allocation schemes against the NR sidelink mode 2 resource allocation and show that even though signaling has an important impact on the resource allocation performance, our proposed device sequential and group scheduling resource allocation schemes improve reliability by an order of magnitude compared to sidelink mode 2.

[136]  arXiv:2109.07207 [pdf, other]
Title: Fusing Visuo-Tactile Perception into Kernelized Synergies for Robust Grasping and Fine Manipulation of Non-rigid Objects
Comments: IEEE ICRA 2022 (under review)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Handling non-rigid objects using robot hands necessitates a framework that incorporates not only human-level dexterity and cognition but also multi-sensory information and system dynamics for robust and fine interactions. In this research, our previously developed kernelized synergies framework, inspired by human behaviour in reusing the same subspace for grasping and manipulation, is augmented with visuo-tactile perception for autonomous and flexible adaptation to unknown objects. To detect objects and estimate their poses, a simplified visual pipeline using the RANSAC algorithm with Euclidean clustering and an SVM classifier is exploited. To modulate interaction efforts while grasping and manipulating non-rigid objects, tactile feedback from a T40S shokac chip sensor, generating 3D force information, is incorporated. Moreover, different kernel functions are examined in the kernelized synergies framework to evaluate its performance and potential against task reproducibility, execution, generalization and synergistic re-usability. Experiments performed with a robot arm-hand system validate the capability and usability of the upgraded framework in stably grasping and dexterously manipulating non-rigid objects.

[137]  arXiv:2109.07210 [pdf]
Title: Life-Long Multi-Task Learning of Adaptive Path Tracking Policy for Autonomous Vehicle
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper proposes a life-long adaptive path tracking policy learning method for autonomous vehicles that can self-evolve and self-adapt with multi-task knowledge. Firstly, the proposed method can learn a model-free control policy for path tracking directly from the historical driving experience, where the property of vehicle dynamics and corresponding control strategy can be learned simultaneously. Secondly, by utilizing the life-long learning method, the proposed method can learn the policy with task-incremental knowledge without encountering catastrophic forgetting. Thus, with continual multi-task knowledge learned, the policy can iteratively adapt to new tasks and improve its performance with knowledge from new tasks. Thirdly, a memory evaluation and updating method is applied to optimize memory structure for life-long learning which enables the policy to learn toward selected directions. Experiments are conducted using a high-fidelity vehicle dynamic model in a complex curvy road to evaluate the performance of the proposed method. Results show that the proposed method can effectively evolve with continual multi-task knowledge and adapt to the new environment, where the performance of the proposed method can also surpass two commonly used baseline methods after evolving.

[138]  arXiv:2109.07212 [pdf, other]
Title: Optimising Rolling Stock Planning including Maintenance with Constraint Programming and Quantum Annealing
Subjects: Artificial Intelligence (cs.AI); Statistical Finance (q-fin.ST)

We developed and compared Constraint Programming (CP) and Quantum Annealing (QA) approaches for rolling stock optimisation considering necessary maintenance tasks. To deal with such problems in CP we investigated specialised pruning rules and implemented them in a global constraint. For the QA approach, we developed quadratic unconstrained binary optimisation (QUBO) models. For testing, we use data sets based on real data from Deutsche Bahn and run the QA approach on real quantum computers from D-Wave. Classical computers are used to run the CP approach as well as tabu search for the QUBO models. We find that both approaches tend at the current development stage of the physical quantum annealers to produce comparable results, with the caveat that QUBO does not always guarantee that the maintenance constraints hold, which we fix by adjusting the QUBO model in preprocessing, based on how close the trains are to a maintenance threshold distance.

[139]  arXiv:2109.07217 [pdf, other]
Title: Progressive Hard-case Mining across Pyramid Levels in Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In object detection, multi-level prediction (e.g., FPN, YOLO) and resampling skills (e.g., focal loss, ATSS) have drastically improved one-stage detector performance. However, how to improve the performance by optimizing the feature pyramid level-by-level remains unexplored. We find that, during training, the ratio of positive over negative samples varies across pyramid levels (\emph{level imbalance}), which is not addressed by current one-stage detectors. To mediate the influence of level imbalance, we propose a Unified Multi-level Optimization Paradigm (UMOP) consisting of two components: 1) an independent classification loss supervising each pyramid level with individual resampling considerations; 2) a progressive hard-case mining loss defining all losses across the pyramid levels without extra level-wise settings. With UMOP as a plug-and-play scheme, modern one-stage detectors can attain a ~1.5 AP improvement with fewer training iterations and no additional computation overhead. Our best model achieves 55.1 AP on COCO test-dev. Code is available at https://github.com/zimoqingfeng/UMOP.

[140]  arXiv:2109.07218 [pdf, other]
Title: Limitations of the Invertible-Map Equivalences
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

This note draws conclusions that arise by combining two recent papers, by Anuj Dawar, Erich Gr\"adel, and Wied Pakusa, published at ICALP 2019 and by Moritz Lichter, published at LICS 2021. In both papers, the main technical results rely on the combinatorial and algebraic analysis of the invertible-map equivalences $\equiv^\text{IM}_{k,Q}$ on certain variants of Cai-F\"urer-Immerman (CFI) structures. These $\equiv^\text{IM}_{k,Q}$-equivalences, for a natural number $k$ and a set of primes $Q$, refine the well-known Weisfeiler-Leman equivalences used in algorithms for graph isomorphism. The intuition is that two graphs $G \equiv^\text{IM}_{k,Q} H$ cannot be distinguished by iterative refinements of equivalences on $k$-tuples defined via linear operators on vector spaces over fields of characteristic $p \in Q$.
In the first paper it has been shown that for a prime $q \notin Q$, the $\equiv^\text{IM}_{k,Q}$ equivalences are not strong enough to distinguish between non-isomorphic CFI-structures over the field $\mathbb{F}_q$. In the second paper, a similar but not identical construction for CFI-structures over the rings $\mathbb{Z}_{2^i}$ has been shown to be indistinguishable with respect to $\equiv^\text{IM}_{k,\{2\}}$. Together with earlier work on rank logic, this second result suffices to separate rank logic from polynomial time. We show here that the two approaches can be unified to prove that CFI-structures over the rings $\mathbb{Z}_{2^i}$ are indistinguishable with respect to $\equiv^\text{IM}_{k,\mathbb{P}}$, for the set $\mathbb{P}$ of all primes. This implies the following two results.
1. There is no fixed $k$ such that the invertible-map equivalence $\equiv^\text{IM}_{k,\mathbb{P}}$ coincides with isomorphism on all finite graphs.
2. No extension of fixed-point logic by linear-algebraic operators over fields can capture polynomial time.

[141]  arXiv:2109.07222 [pdf, other]
Title: EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
Comments: Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Pre-trained language models have shown remarkable results on various NLP tasks. Nevertheless, due to their bulky size and slow inference speed, it is hard to deploy them on edge devices. In this paper, we have a critical insight that improving the feed-forward network (FFN) in BERT has a higher gain than improving the multi-head attention (MHA) since the computational cost of FFN is 2$\sim$3 times larger than MHA. Hence, to compact BERT, we are devoted to designing efficient FFN as opposed to previous works that pay attention to MHA. Since FFN comprises a multilayer perceptron (MLP) that is essential in BERT optimization, we further design a thorough search space towards an advanced MLP and perform a coarse-to-fine mechanism to search for an efficient BERT architecture. Moreover, to accelerate searching and enhance model transferability, we employ a novel warm-up knowledge distillation strategy at each search stage. Extensive experiments show our searched EfficientBERT is 6.9$\times$ smaller and 4.4$\times$ faster than BERT$\rm_{BASE}$, and has competitive performances on GLUE and SQuAD Benchmarks. Concretely, EfficientBERT attains a 77.7 average score on GLUE \emph{test}, 0.7 higher than MobileBERT$\rm_{TINY}$, and achieves an 85.3/74.5 F1 score on SQuAD v1.1/v2.0 \emph{dev}, 3.2/2.7 higher than TinyBERT$_4$ even without data augmentation. The code is released at https://github.com/cheneydon/efficient-bert.
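For a rough sense of why the FFN is the heavier block, the sketch below counts only the weight-matrix parameters of one standard Transformer encoder layer. It is a back-of-the-envelope illustration, not the authors' cost measurement: it assumes the usual BERT-base sizes (hidden size 768, FFN inner size 3072), ignores biases and layer normalization, and does not model the sequence-length-dependent attention computation. The function names are hypothetical.

```python
def mha_weight_params(d_model):
    """Weights of multi-head attention: query, key, value and output
    projections, each a d_model x d_model matrix (biases ignored)."""
    return 4 * d_model * d_model


def ffn_weight_params(d_model, d_ff):
    """Weights of the feed-forward network: an expansion matrix
    (d_model x d_ff) and a contraction matrix (d_ff x d_model)."""
    return 2 * d_model * d_ff


# Assumed BERT-base dimensions.
d_model, d_ff = 768, 3072

mha = mha_weight_params(d_model)
ffn = ffn_weight_params(d_model, d_ff)
ratio = ffn / mha  # FFN weights relative to MHA weights in one layer
```

Under these assumptions the FFN holds twice as many weights as MHA in each layer, which is at the lower end of the 2 to 3 times range quoted in the abstract for computational cost.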

[142]  arXiv:2109.07227 [pdf, other]
Title: How Much do Lyrics Matter? Analysing Lyrical Simplicity Preferences for Individuals At Risk of Depression
Comments: In Proceedings of the Speech, Music and Mind Workshop 2021, a satellite workshop of INTERSPEECH 2021
Subjects: Computation and Language (cs.CL)

Music affects and in some cases reflects one's emotional state. Key to this influence is lyrics and their meaning in conjunction with the acoustic properties of the track. Recent work has focused on analysing these acoustic properties and showing that individuals prone to depression primarily consume low valence and low energy music. However, no studies have yet explored lyrical content preferences in relation to online music consumption of such individuals. In the current study, we examine lyrical simplicity, measured as the Compressibility and Absolute Information Content of the text, associated with preferences of individuals at risk for depression. Using the six-month listening history of 541 Last.fm users, we compare lyrical simplicity trends for users grouped as being at risk (At-Risk) of depression with those of users who are not (No-Risk). Our findings reveal that At-Risk individuals prefer songs with greater information content (lower Compressibility) on average, especially for songs characterised as Sad. Furthermore, we found that At-Risk individuals also have greater variability of Absolute Information Content across their listening history. We discuss the results in light of existing socio-psychological lab-based research on music habits associated with depression and their relevance to naturally occurring online music listening behaviour.
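One simple way to approximate a compressibility score for text is to compare its size before and after general-purpose compression. The sketch below uses zlib for this; the abstract does not specify how Compressibility or Absolute Information Content are computed, so the formulas, function names, and sample strings here are assumptions for illustration only.

```python
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 text (a crude stand-in
    for information content; assumption, not the paper's definition)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compressibility(text):
    """Fraction of bytes saved by compression: higher means more
    repetitive (simpler) text. Very short texts can score below zero
    because of the fixed overhead of the compressed format."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return 1.0 - compressed_size(text) / len(raw)


# Hypothetical lyric-like strings: a highly repetitive hook versus a
# line with little repetition.
repetitive = "la la la la la la la la la la la la la la la la " * 8
varied = (
    "Quiet rivers carve strange maps beneath the frozen northern sky "
    "while owls judge everything"
)

score_repetitive = compressibility(repetitive)
score_varied = compressibility(varied)
```

With this measure the repetitive string scores higher than the varied one, matching the intuition that repetitive lyrics are simpler and carry less information per character.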

[143]  arXiv:2109.07228 [pdf, other]
Title: Dialog speech sentiment classification for imbalanced datasets
Comments: To be published in SPECOM & ICR 2021 Electronic Proceedings by the Springer Nature
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Speech is the most common way humans express their feelings, and sentiment analysis is the use of tools such as natural language processing and computational algorithms to identify the polarity of these feelings. Even though this field has seen tremendous advancements in the last two decades, effectively detecting underrepresented sentiments in different kinds of datasets remains a challenging task. In this paper, we use single and bi-modal analysis of short dialog utterances and gain insights on the main factors that aid in sentiment detection, particularly in the underrepresented classes, in datasets with and without an inherent sentiment component. Furthermore, we propose an architecture which uses a learning rate scheduler and different monitoring criteria and provides state-of-the-art results for the SWITCHBOARD imbalanced sentiment dataset.

[144]  arXiv:2109.07230 [pdf, other]
Title: Learning Mathematical Properties of Integers
Comments: BlackboxNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Embedding words in high-dimensional vector spaces has proven valuable in many natural language applications. In this work, we investigate whether similarly-trained embeddings of integers can capture concepts that are useful for mathematical applications. We probe the integer embeddings for mathematical knowledge, apply them to a set of numerical reasoning tasks, and show that by learning the representations from mathematical sequence data, we can substantially improve over number embeddings learned from English text corpora.

[145]  arXiv:2109.07231 [pdf, other]
Title: SWEAT: Scoring Polarization of Topics across Different Corpora
Comments: Published as a conference paper at EMNLP2021
Subjects: Computation and Language (cs.CL)

Understanding differences of viewpoints across corpora is a fundamental task for computational social sciences. In this paper, we propose the Sliced Word Embedding Association Test (SWEAT), a novel statistical measure to compute the relative polarization of a topical wordset across two distributional representations. To this end, SWEAT uses two additional wordsets, deemed to have opposite valence, to represent two different poles. We validate our approach and illustrate a case study to show the usefulness of the introduced measure.

[146]  arXiv:2109.07234 [pdf, ps, other]
Title: The Unreasonable Effectiveness of the Baseline: Discussing SVMs in Legal Text Classification
Subjects: Computation and Language (cs.CL)

We aim to highlight an interesting trend to contribute to the ongoing debate around advances within legal Natural Language Processing. Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT. In this paper, we show that a more traditional approach based on Support Vector Machine classifiers reaches competitive performance with deep learning models. We also highlight that error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain when compared to general language tasks. We discuss some hypotheses for these results to support future discussions.

[147]  arXiv:2109.07236 [pdf]
Title: Recursive Hierarchical Projection for Whole-Body Control with Task Priority Transition
Comments: 6 pages, 9 figures, submitted to ICRA 2022
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Redundant robots are expected to execute multiple tasks with different priorities simultaneously. Task priorities need to be transitioned for complex task scheduling in whole-body control (WBC). Many methods have focused on guaranteeing control continuity during task priority transitions, but they inevitably either increase the computational cost or sacrifice task accuracy. This work formulates the WBC problem with task priority transition as a Hierarchical Quadratic Programming (HQP) problem with Recursive Hierarchical Projection (RHP) matrices. The tasks of each level are solved recursively through HQP. We propose the RHP matrix to form the continuously changing projection of each level so that the task priority transition is achieved without increasing the computational cost. Additionally, the recursive approach solves the WBC problem without losing task accuracy. We verify the effectiveness of this scheme through comparative simulations of reactive collision avoidance with multi-task priority transitions.

[148]  arXiv:2109.07239 [pdf, other]
Title: Internet of Behavior (IoB) and Explainable AI Systems for Influencing IoT Behavior
Comments: Submitted to IEEE Network
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Pandemics and natural disasters over the years have changed the behavior of people, which has had a tremendous impact on all aspects of life. In each era, governments, organizations, and companies have used the available technologies to track, control, and influence the behavior of individuals for a benefit. Nowadays, the use of the Internet of Things (IoT), cloud computing, and artificial intelligence (AI) has made it easier to track and change the behavior of users through changing IoT behavior. This article introduces and discusses the concept of the Internet of Behavior (IoB) and its integration with Explainable AI (XAI) techniques to provide a trusted and evident experience in the process of changing IoT behavior to ultimately improve users' behavior. To this end, a system based on IoB and XAI is proposed in a use case scenario of electrical power consumption that aims to influence user consumption behavior to reduce power consumption and cost. The scenario results showed a decrease of 522.2 kW of active power when compared to the original consumption over a 200-hour period. It also showed a total power cost saving of 95.04 Euro for the same period. Moreover, decreasing the global active power will reduce the power intensity through the positive correlation.

[149]  arXiv:2109.07242 [pdf, other]
Title: Regressive Ensemble for Machine Translation Quality Evaluation
Comments: 8 pages incl. references, Proceedings of EMNLP 2021 Sixth Conference on Machine Translation (WMT 21)
Subjects: Computation and Language (cs.CL)

This work introduces a simple regressive ensemble for evaluating machine translation quality based on a set of novel and established metrics. We evaluate the ensemble using a correlation to expert-based MQM scores of the WMT 2021 Metrics workshop. In both monolingual and zero-shot cross-lingual settings, we show a significant performance improvement over single metrics. In the cross-lingual settings, we also demonstrate that an ensemble approach is well-applicable to unseen languages. Furthermore, we identify a strong reference-free baseline that consistently outperforms the commonly-used BLEU and METEOR measures and significantly improves our ensemble's performance.

[150]  arXiv:2109.07243 [pdf, other]
Title: Enhancing Clinical Information Extraction with Transferred Contextual Embeddings
Comments: 6 pages, 4 figures
Subjects: Computation and Language (cs.CL)

The Bidirectional Encoder Representations from Transformers (BERT) model has achieved state-of-the-art performance for many natural language processing (NLP) tasks. Yet, limited research has been devoted to studying its effectiveness when the target domain is shifted from the pre-training corpora, for example, for biomedical or clinical NLP applications. In this paper, we applied it to a widely studied hospital information extraction (IE) task and analyzed its performance under the transfer learning setting. Our application achieved a new state-of-the-art result by a clear margin, compared with a range of existing IE models. Specifically, on this nursing handover data set, the macro-average F1 score from our model was 0.438, whilst the previous best deep learning models had 0.416. In conclusion, we showed that BERT-based pre-training models can be transferred to health-related documents under mild conditions and with a proper fine-tuning process.
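
For readers unfamiliar with the metric reported above, the following minimal sketch computes a macro-average F1 score: the per-class F1 values are averaged with equal weight, so rare classes count as much as frequent ones. The function name and the toy labels are invented for illustration and are not taken from the paper's data or evaluation code.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example: the rare class "b" is never predicted.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a"]
print(macro_f1(y_true, y_pred))
```

In this toy case accuracy is 0.75, yet the macro-average is pulled down by the missed rare class, which is why macro-averaged scores on many-class extraction tasks can look low in absolute terms.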

[151]  arXiv:2109.07245 [pdf, other]
Title: Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images
Comments: Submitted to Robotics and Automation Letters. Supplementary video available at this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

This work tackles scene understanding for outdoor robotic navigation, solely relying on images captured by an on-board camera. Conventional visual scene understanding interprets the environment based on specific descriptive categories. However, such a representation is not directly interpretable for decision-making and constrains robot operation to a specific domain. Thus, we propose to segment egocentric images directly in terms of how a robot can navigate in them, and tailor the learning problem to an autonomous navigation task. Building around an image segmentation network, we present a generic and scalable affordance-based definition consisting of 3 driveability levels which can be applied to arbitrary scenes. By encoding these levels with soft ordinal labels, we incorporate inter-class distances during learning which improves segmentation compared to standard one-hot labelling. In addition, we propose a navigation-oriented pixel-wise loss weighting method which assigns higher importance to safety-critical areas. We evaluate our approach on large-scale public image segmentation datasets spanning off-road and urban scenes. In a zero-shot cross-dataset generalization experiment, we show that our affordance learning scheme can be applied across a diverse mix of datasets and improves driveability estimation in unseen environments compared to general-purpose, single-dataset segmentation.
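
To make the idea of soft ordinal labels concrete, here is a minimal sketch of one common formulation: each level receives a score that decays with its rank distance to the true level, and a softmax turns the scores into a target distribution. The squared-distance penalty, the `alpha` scale, and the function name are assumptions for illustration; the abstract does not specify the authors' exact encoding.

```python
import math
from typing import List


def soft_ordinal_label(true_level: int, num_levels: int = 3,
                       alpha: float = 1.0) -> List[float]:
    """Soft target over ordered levels; mass decays with distance from true_level."""
    # Penalise each level by its squared rank distance to the true level.
    scores = [-alpha * (true_level - k) ** 2 for k in range(num_levels)]
    # Numerically stable softmax so the label sums to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical driveability levels, indexed 0 to 2.
for level in range(3):
    print(level, [round(p, 3) for p in soft_ordinal_label(level)])
```

Unlike a one-hot target, this label gives a neighbouring level more mass than a distant one, so confusing adjacent levels is penalised less than confusing the two extremes.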

[152]  arXiv:2109.07246 [pdf, other]
Title: RGB-D Saliency Detection via Cascaded Mutual Information Minimization
Comments: Accepted as ICCV2021 paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data. Specifically, we first map the feature of each mode to a lower-dimensional feature vector, and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to foster the development of this field, we contribute the largest (7x larger than NJU2K) dataset, which contains 15,625 image pairs with high-quality polygon-/scribble-/object-/instance-/rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena, which can motivate future model design. Source code and dataset are available at "https://github.com/JingZhang617/cascaded_rgbd_sod".

[153]  arXiv:2109.07247 [pdf, other]
Title: Towards Precise Pruning Points Detection using Semantic-Instance-Aware Plant Models for Grapevine Winter Pruning Automation
Comments: arXiv admin note: text overlap with arXiv:2106.04208
Subjects: Robotics (cs.RO)

Grapevine winter pruning is a complex task that requires skilled workers to execute it correctly. The complexity makes it time-consuming. The operation requires about 80-120 hours per hectare annually, making an automated robotic system that helps speed up the process a crucial tool in large vineyards. We describe (a) a novel expert-annotated dataset for grapevine segmentation, (b) a state-of-the-art neural network implementation, and (c) the generation of pruning points following agronomic rules, leveraging the simplified structure of the plant. With this approach, we are able to generate a set of pruning points on the canes, paving the way towards a correct automation of grapevine winter pruning.

[154]  arXiv:2109.07249 [pdf, other]
Title: Temporal Parameter-free Deep Skinning of Animated Meshes
Comments: CGI 2021, LNCS Proceedings, to appear. For video and presentation and other info please see this http URL
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI)

In computer graphics, animation compression is essential for efficient storage, streaming and reproduction of animated meshes. Previous work has presented efficient techniques for compression by deriving skinning transformations and weights using clustering of vertices based on geometric features of vertices over time. In this work we present a novel approach that assigns vertices to bone-influenced clusters and derives weights using deep learning through a training set that consists of pairs of vertex trajectories (temporal vertex sequences) and the corresponding weights drawn from fully rigged animated characters. The approximation error of the resulting linear blend skinning scheme is significantly lower than the error of competent previous methods, while at the same time producing a minimal number of bones. Furthermore, the optimal set of transformations and vertices is derived in fewer iterations due to the better initial positioning in the multidimensional variable space. Our method requires no parameters to be determined or tuned by the user during the entire process of compressing a mesh animation sequence.

[155]  arXiv:2109.07251 [pdf]
Title: Towards a new approach of continuous process improvement based on CMMI and PMBOK
Journal-ref: International Journal of Computer Science Issues (IJCSI) 9.6 (2012): 160
Subjects: Software Engineering (cs.SE)

A process-centric approach helps an organization improve the way it works. It allows scalability and provides a way to capitalize on knowledge of best practices. It also makes better use of resources and helps to understand trends. PMBOK is a project management methodology, while CMMI is a model for process improvement. In this paper, we conduct a study of the PMBOK and CMMI frameworks to show that they can converge and are complementary. We expect this research to be useful for organizations deploying a new approach to continuous process improvement based on pooling CMMI and PMBOK.

[156]  arXiv:2109.07252 [pdf, other]
Title: Modeling Ice Friction for Vehicle Dynamics of a Bobsled with Application in Driver Evaluation and Driving Simulation
Comments: Preprint submitted to Tribology International
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)

We provide an ice friction model for vehicle dynamics of a two-man bobsled which can be used for driver evaluation and in a driver-in-the-loop simulator. Longitudinal friction is modeled by combining experimental results with finite element simulations to yield a correlation between contact pressure and friction. To model lateral friction, we collect data from 44 bobsleigh runs using special sensors. Non-linear regression is used to fit a bob-specific one-track vehicle dynamics model to the data. It is applied in driving simulation and enables a novel method for bob driver evaluation. Bob drivers with various levels of experience are investigated. The investigation shows that top drivers achieve similar performance with different driving styles.

[157]  arXiv:2109.07253 [pdf, other]
Title: Integrating Sensing and Communication in Cellular Networks via NR Sidelink
Comments: The paper is submitted to JSAC and it is still under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

RF-sensing, the analysis and interpretation of movement or environment-induced patterns in received electromagnetic signals, has been actively investigated for more than a decade. Since electromagnetic signals, through cellular communication systems, are omnipresent, RF sensing has the potential to become a universal sensing mechanism with applications in smart home, retail, localization, gesture recognition, intrusion detection, etc. Specifically, existing cellular network installations might be dual-used for both communication and sensing. Such communications and sensing convergence is envisioned for future communication networks. We propose the use of NR-sidelink direct device-to-device communication to achieve device-initiated, flexible sensing capabilities in beyond 5G cellular communication systems. In this article, we specifically investigate a common issue related to sidelink-based RF-sensing, which is its angle and rotation dependence. In particular, we discuss transformations of mmWave point-cloud data which achieve rotational invariance, as well as distributed processing based on such rotation-invariant inputs, at angle- and distance-diverse devices. To process the distributed data, we propose a graph-based encoder to capture spatio-temporal features of the data and propose four approaches for multi-angle learning. The approaches are compared on a newly recorded and openly available dataset comprising 15 subjects, performing 21 gestures which are recorded from 8 angles.

[158]  arXiv:2109.07255 [pdf, ps, other]
Title: Learning What Others Know
Comments: 33 pages
Journal-ref: in L. Kovacs and E. Albert (eds.), LPAR23 proceedings of the International Conference on Logic for Programming AI and Reasoning, EPiC Series in Computing, Volume 73, pp 90-110, 2020
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

We propose a number of powerful dynamic-epistemic logics for multi-agent information sharing and acts of publicly or privately accessing other agents' information databases. The static base of our logics is obtained by adding to standard epistemic logic comparative epistemic assertions, that can express epistemic superiority between groups or individuals, as well as a common distributed knowledge operator (that combines features of both common knowledge and distributed knowledge). On the dynamic side, we introduce actions by which epistemic superiority can be acquired: "sharing all one knows" (by e.g. giving access to one's information database to all or some of the other agents), as well as more complex informational events, such as hacking. We completely axiomatize several such logics and prove their decidability.

[159]  arXiv:2109.07258 [pdf, other]
Title: Federated Learning of Molecular Properties in a Heterogeneous Setting
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)

Conducting experiments in chemistry research incurs both high material and computational costs. Institutions thus consider chemical data to be valuable and there have been few efforts to construct large public datasets for machine learning. Another challenge is that different institutions are interested in different classes of molecules, creating heterogeneous data that cannot be easily joined by conventional distributed training. In this work, we introduce federated heterogeneous molecular learning to address these challenges. Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients. Due to the lack of related research, we first simulate a federated heterogeneous benchmark called FedChem. FedChem is constructed by jointly performing scaffold splitting and Latent Dirichlet Allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules. We then propose a method to alleviate the problem, namely Federated Learning by Instance reweighTing (FLIT). FLIT can align the local training across heterogeneous clients by improving the performance for uncertain samples. Comprehensive experiments conducted on our new benchmark FedChem validate the advantages of this method over other federated learning schemes. FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.

[160]  arXiv:2109.07260 [pdf, other]
Title: Evaluation of Distributed Databases in Hybrid Clouds and Edge Computing: Energy, Bandwidth, and Storage Consumption
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

A benchmark study of modern distributed databases is an important source of information for selecting the right technology for managing data in cloud-edge paradigms. To make the right decision, it is necessary to conduct an extensive experimental study on a variety of hardware infrastructures. While most state-of-the-art studies have investigated only the response time and scalability of distributed databases, focusing on other metrics (e.g., energy, bandwidth, and storage consumption) is essential to fully understand the resource consumption of distributed databases. Also, existing studies have explored the response time and scalability of these databases in either private or public clouds. Hence, there is a paucity of investigation into the evaluation of these databases deployed in a hybrid cloud, which is the seamless integration of public and private clouds. To address these research gaps, in this paper, we investigate the energy, bandwidth and storage consumption of the most commonly used distributed databases. For this purpose, we have evaluated four open-source databases (Cassandra, Mongo, Redis and MySQL) on a hybrid cloud spanning a local OpenStack and Microsoft Azure, and a variety of edge computing nodes including a Raspberry Pi, a cluster of Raspberry Pis, and low- and high-power servers. Our extensive experimental results reveal several helpful insights for the deployment selection of modern distributed databases in edge-cloud environments.

[161]  arXiv:2109.07262 [pdf, other]
Title: Linear-Time Contact and Friction Dynamics in Maximal Coordinates using Variational Integrators
Subjects: Robotics (cs.RO)

Simulation of contact and friction dynamics is an important basis for control- and learning-based algorithms. However, the numerical difficulties of contact interactions pose a challenge for robust and efficient simulators. A maximal-coordinate representation of the dynamics enables efficient solving algorithms, but current methods in maximal coordinates require constraint stabilization schemes. Therefore, we propose an interior-point algorithm for the numerically robust treatment of rigid-body dynamics with contact interactions in maximal coordinates. Additionally, we discretize the dynamics with a variational integrator to prevent constraint drift. Our algorithm achieves linear-time complexity both in the number of contact points and the number of bodies, which is shown theoretically and demonstrated with an implementation. Furthermore, we simulate two robotic systems to highlight the applicability of the proposed algorithm.

[162]  arXiv:2109.07263 [pdf, other]
Title: End-to-End Learning of Flowchart Grounded Task-Oriented Dialogs
Comments: D. Raghu and S. Agarwal contributed equally to this work
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We propose a novel problem within end-to-end learning of task-oriented dialogs (TOD), in which the dialog system mimics a troubleshooting agent who helps a user by diagnosing their problem (e.g., car not starting). Such dialogs are grounded in domain-specific flowcharts, which the agent is supposed to follow during the conversation. Our task exposes novel technical challenges for neural TOD, such as grounding an utterance to the flowchart without explicit annotation, referring to additional manual pages when the user asks a clarification question, and the ability to follow unseen flowcharts at test time. We release a dataset (FloDial) consisting of 2,738 dialogs grounded on 12 different troubleshooting flowcharts. We also design a neural model, FloNet, which uses a retrieval-augmented generation architecture to train the dialog agent. Our experiments find that FloNet can do zero-shot transfer to unseen flowcharts, and sets a strong baseline for future research.

[163]  arXiv:2109.07264 [pdf, other]
Title: Scope resolution of predicted negation cues: A two-step neural network-based approach
Authors: Daan de Jong
Subjects: Computation and Language (cs.CL)

Neural network-based methods are the state of the art in negation scope resolution. However, they often use the unrealistic assumption that cue information is completely accurate. Even if this assumption holds, there remains a dependency on engineered features from state-of-the-art machine learning methods. The current study adopted a two-step negation resolving approach to assess whether a Bidirectional Long Short-Term Memory-based method can be used for cue detection as well, and how inaccurate cue predictions would affect the scope resolution performance. Results suggest that this method is not suitable for negation detection. Scope resolution performance is most robust against inaccurate information for models with a recurrent layer only, compared to extensions with a Conditional Random Fields layer or a post-processing algorithm. We advocate for more research into the application of deep learning on negation detection and the effect of imperfect information on scope resolution.

[164]  arXiv:2109.07266 [pdf, other]
Title: Modelling Major Disease Outbreaks in the 21st Century: A Causal Approach
Comments: Accepted at Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD-epiDAMIK) 2021: The 4th International Workshop on Epidemiology meets Data Mining and Knowledge discovery
Subjects: Machine Learning (cs.LG); Applications (stat.AP)

Epidemiologists aiming to model the dynamics of global events face a significant challenge in identifying the factors linked with anomalies such as disease outbreaks. In this paper, we present a novel method for identifying the most important development sectors sensitive to disease outbreaks by using global development indicators as markers. We use statistical methods to assess the causative linkages between these indicators and disease outbreaks, as well as to find the most often ranked indicators. We used data imputation techniques in addition to statistical analysis to convert raw real-world data sets into meaningful data for causal inference. The application of various algorithms for the detection of causal linkages between the indicators is the subject of this research. Despite the fact that disparities in governmental policies between countries account for differences in causal linkages, several indicators emerge as important determinants sensitive to disease outbreaks over the world in the 21st Century.

[165]  arXiv:2109.07267 [pdf, other]
Title: JUBILEE: Secure Debt Relief and Forgiveness
Subjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT); General Economics (econ.GN)

JUBILEE is a securely computed mechanism for debt relief and forgiveness in a frictionless manner without involving trusted third parties, leading to more harmonious debt settlements by incentivising the parties to truthfully reveal their private information. JUBILEE improves over all previous methods:
- individually rational, incentive-compatible, truthful/strategy-proof, ex-post efficient, optimal mechanism for debt relief and forgiveness with private information
- by the novel introduction of secure computation techniques to debt relief, the "blessing of the debtor" is hereby granted for the first time: debt settlements with higher expected profits and a higher probability of success than without using secure computation
A simple and practical implementation is included for "The Secure Spreadsheet". Another implementation is realised using Raziel smart contracts on a blockchain with Pravuil consensus.

[166]  arXiv:2109.07269 [pdf, other]
Title: Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy
Subjects: Cryptography and Security (cs.CR)

With local differential privacy (LDP), users can privatize their data and thus guarantee privacy properties before transmitting it to the server (a.k.a. the aggregator). One primary objective of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of users for each possible value. In practice, when a study with rich content on a population is desired, the interest is in the multiple attributes of the population, that is to say, in multidimensional data ($d \geq 2$). However, in contrast to frequency estimation of a single attribute (the focus of most works), the multidimensional setting requires particular attention to the privacy budget, which can grow extremely quickly due to the composition theorem. To the authors' knowledge, two solutions seem to stand out for this task: 1) splitting the privacy budget across attributes, i.e., sending each value with $\frac{\epsilon}{d}$-LDP (Spl), and 2) randomly sampling a single attribute and spending all the privacy budget to send it with $\epsilon$-LDP (Smp). Although Smp adds additional sampling error, it has proven to provide higher data utility than the former Spl solution. However, we argue that aggregators (who are also seen as attackers) are aware of the sampled attribute and its LDP value, which is protected by a "less strict" $e^{\epsilon}$ probability bound (rather than $e^{\epsilon/d}$). We therefore propose a solution named Random Sampling plus Fake Data (RS+FD), which allows creating uncertainty over the sampled attribute by generating fake data for each non-sampled attribute; RS+FD further benefits from amplification by sampling. We theoretically and experimentally validate our proposed solution on both synthetic and real-world datasets to show that RS+FD achieves nearly the same or better utility than the state-of-the-art Smp solution.
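
The difference between the Spl and Smp baselines described above can be sketched with a concrete randomizer. The example below assumes generalized randomized response (GRR), which keeps the true value with probability $e^{\epsilon}/(e^{\epsilon}+k-1)$ over a domain of size $k$. The abstract does not name the LDP protocol used, so GRR, the function names, and the toy record are assumptions for illustration, and the sketch does not implement RS+FD itself.

```python
import math
import random
from typing import List, Sequence, Tuple


def grr_keep_probability(epsilon: float, k: int) -> float:
    """Probability that GRR reports the true value for a domain of size k."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def grr(value: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Generalized randomized response over the domain {0, ..., k-1}."""
    if rng.random() < grr_keep_probability(epsilon, k):
        return value
    # Otherwise report one of the k-1 other values uniformly at random.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def report_spl(record: Sequence[int], domain_sizes: Sequence[int],
               epsilon: float, rng: random.Random) -> List[int]:
    """Spl: report every attribute, each with budget epsilon / d."""
    d = len(record)
    return [grr(v, k, epsilon / d, rng) for v, k in zip(record, domain_sizes)]


def report_smp(record: Sequence[int], domain_sizes: Sequence[int],
               epsilon: float, rng: random.Random) -> Tuple[int, int]:
    """Smp: sample one attribute and report it with the full budget epsilon."""
    j = rng.randrange(len(record))
    return j, grr(record[j], domain_sizes[j], epsilon, rng)


rng = random.Random(0)
record, domain_sizes = [0, 1, 2], [4, 4, 4]  # hypothetical user with d = 3 attributes
print(report_spl(record, domain_sizes, 1.0, rng))
print(report_smp(record, domain_sizes, 1.0, rng))
```

With the full budget, the sampled attribute is reported truthfully more often than any attribute under the split budget, which is the utility advantage of Smp. The Smp report also reveals which attribute was sampled, and that disclosure is what RS+FD aims to hide.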

[167]  arXiv:2109.07270 [pdf, other]
Title: Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a novel facial expression recognition network, called Distract your Attention Network (DAN). Our method is based on two key observations. Firstly, multiple classes share inherently similar underlying facial appearance, and their differences could be subtle. Secondly, facial expressions exhibit themselves through multiple facial regions simultaneously, and the recognition requires a holistic approach by encoding high-order interactions among local features. To address these issues, we propose our DAN with three key components: Feature Clustering Network (FCN), Multi-head cross Attention Network (MAN), and Attention Fusion Network (AFN). The FCN extracts robust features by adopting a large-margin learning objective to maximize class separability. In addition, the MAN instantiates a number of attention heads to simultaneously attend to multiple facial areas and build attention maps on these regions. Further, the AFN distracts these attentions to multiple locations before fusing the attention maps to a comprehensive one. Extensive experiments on three public datasets (including AffectNet, RAF-DB, and SFEW 2.0) verified that the proposed method consistently achieves state-of-the-art facial expression recognition performance. Code will be made available at https://github.com/yaoing/DAN.

[168]  arXiv:2109.07273 [pdf, ps, other]
Title: NBcoded: network attack classifiers based on Encoder and Naive Bayes model for resource limited devices
Comments: It will be published in "Communications in Computer and Information Science" and presented in the 3rd Workshop of Machine Learning for Cybersecurity (MLCS)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

In recent years, cybersecurity has gained high relevance, making the detection of attacks or intrusions a key task. In fact, a small breach in a system, application, or network can cause huge damage to companies. However, when attack detection is addressed with Artificial Intelligence, it typically relies on high-quality classifiers that often have high resource demands in terms of computation or memory usage. This has a high impact when the attack classifiers need to be used on resource-limited devices or without overloading the devices, as happens, for example, in IoT devices or industrial systems. To overcome this issue, NBcoded, a novel lightweight attack classification tool, is proposed in this work. NBcoded works as a pipeline combining the noise-removal properties of encoders with the low resource and time consumption of the Naive Bayes classifier. This work compares three different NBcoded implementations based on three different Naive Bayes likelihood distribution assumptions (Gaussian, Complement and Bernoulli). Then, the best NBcoded is compared with state-of-the-art classifiers like Multilayer Perceptron and Random Forest. Our implementation proves to be the best model at reducing training time and disk usage, even though it is outperformed by the other two in terms of Accuracy and F1-score (~ 2%).

[169]  arXiv:2109.07275 [pdf, other]
Title: DROMO: Distributionally Robust Offline Model-based Policy Optimization
Comments: Under review of S.-T. Yau Award 2021 of Computer Science
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We consider the problem of offline reinforcement learning with model-based control, whose goal is to learn a dynamics model from the experience replay and obtain a pessimism-oriented agent under the learned model. Current model-based constraints include explicit uncertainty penalties and implicit conservative regularization that pushes Q-values of out-of-distribution state-action pairs down and those of in-distribution pairs up. While the uncertainty estimation, on which the former relies, can be loosely calibrated for complex dynamics, the latter performs slightly better. To extend the basic idea of regularization without uncertainty quantification, we propose distributionally robust offline model-based policy optimization (DROMO), which leverages the ideas in distributionally robust optimization to penalize a broader range of out-of-distribution state-action pairs beyond the standard empirical out-of-distribution Q-value minimization. We theoretically show that our method optimizes a lower bound on the ground-truth policy evaluation, and that it can be incorporated into any existing policy gradient algorithm. We also analyze the theoretical properties of DROMO's linear and non-linear instantiations.

[170]  arXiv:2109.07276 [pdf, other]
Title: Sequence Length is a Domain: Length-based Overfitting in Transformer Models
Subjects: Computation and Language (cs.CL)

Transformer-based sequence-to-sequence architectures, while achieving state-of-the-art results on a large number of NLP tasks, can still suffer from overfitting during training. In practice, this is usually countered either by applying regularization methods (e.g. dropout, L2-regularization) or by providing huge amounts of training data. Additionally, Transformer and other architectures are known to struggle when generating very long sequences. For example, in machine translation, the neural-based systems perform worse on very long sequences when compared to the preceding phrase-based translation approaches (Koehn and Knowles, 2017).
We present results which suggest that the issue might also be in the mismatch between the length distributions of the training and validation data combined with the aforementioned tendency of the neural networks to overfit to the training data. We demonstrate on a simple string editing task and a machine translation task that the Transformer model performance drops significantly when facing sequences of length diverging from the length distribution in the training data. Additionally, we show that the observed drop in performance is due to the hypothesis length corresponding to the lengths seen by the model during training rather than the length of the input sequence.

[171]  arXiv:2109.07281 [pdf]
Title: Measuring and improving information systems agility through the balanced scorecard approach
Journal-ref: International Journal of Computer Science Issues (IJCSI), 2015, vol. 12, no 5, p. 58
Subjects: Software Engineering (cs.SE)

Facing an increasingly complex, uncertain and changing environment, even one in crisis, organizations are driven to be agile in order to survive. Agility, at the core of business strategy, represents the ability to grow in a competitive environment of continuous and unpredictable changes, with information systems perceived as one of its main enablers. In other words, to be agile, organizations must be able to rely on agile enterprise information systems/information technology (IS/IT). Since agility needs are not the same among stakeholders, the objective of this research is to develop a conceptual model for the achievement and assessment of IS/IT agility from balanced perspectives to support agile organizations. Several studies have indicated that the IT balanced scorecard (BSC) approach is an appropriate technique for evaluating IT performance. This paper provides a balanced-scorecard-based framework to evaluate IS agility through four perspectives: business contribution, user orientation, operational excellence, and innovation and competitiveness. The proposed framework, called IS Agility BSC, proposes a three-layer structure for each of the four perspectives: mission, key success factors, and agility evaluation criteria. According to this conceptual model, enterprise information systems agility is measured according to 14 agility key success factors, over the four BSC perspectives, using 42 agility evaluation criteria that are identified through a literature survey. This paper explores agility in the broader context of enterprise information systems. The findings will provide, for both researchers and practitioners, a practical approach for achieving and measuring IS agility performance to support organizations in their attempt to become agile as a new condition of surviving in the new business world.

[172]  arXiv:2109.07285 [pdf, other]
Title: Take a deep breath. Benefits of neuroplasticity practices for software developers and computer workers in a family of experiments
Subjects: Software Engineering (cs.SE)

Context. Computer workers in general, and software developers specifically, are under a high amount of stress due to continuous deadlines and, often, over-commitment. Objective. This study investigates the effects of a neuroplasticity practice, a specific breathing practice, on the attention awareness, well-being, perceived productivity, and self-efficacy of computer workers. Method. We created a questionnaire mainly from existing, validated scales as entry and exit survey for data points for comparison before and after the intervention. The intervention was a 12-week program with a weekly live session that included a talk on a well-being topic and a facilitated group breathing session. During the intervention period, we solicited one daily journal note and one weekly well-being rating. We replicated the intervention in a similarly structured 8-week program. The data was analyzed using a Bayesian multi-level model for the quantitative part and thematic analysis for the qualitative part. Results. The intervention showed improvements in participants' experienced inner states despite an ongoing pandemic and intense outer circumstances for most. Over the course of the study, we found an improvement in the participants' ratings of how often they found themselves in good spirits as well as in a calm and relaxed state. We also aggregate a large number of deep inner reflections and growth processes that may not have surfaced for the participants without deliberate engagement in such a program. Conclusion. The data indicates usefulness and effectiveness of an intervention for computer workers in terms of increasing well-being and resilience. Everyone needs a way to deliberately relax, unplug, and recover. Breathing practice is a simple way to do so, and the results call for establishing a larger body of work to make this common practice.

[173]  arXiv:2109.07288 [pdf, other]
Title: Two algorithms for vehicular obstacle detection in sparse pointcloud
Subjects: Robotics (cs.RO)

One of the main components of an autonomous vehicle is the obstacle detection pipeline. Most prototypes, both from research and industry, rely on lidars for this task. Pointcloud information from lidar is usually combined with data from cameras and radars, but the backbone of the architecture is mainly based on 3D bounding boxes computed from lidar data. To retrieve an accurate representation, sensors with many planes, e.g., more than 32 planes, are usually employed. The returned pointcloud is indeed dense and well defined, but high-resolution sensors are still expensive and their data often require powerful GPUs to be processed. Lidars with fewer planes are cheaper, but the returned data are not dense enough to be processed with state-of-the-art deep learning approaches to retrieve 3D bounding boxes. In this paper, we propose two solutions based on occupancy grid and geometric refinement to retrieve a list of 3D bounding boxes employing lidar with a low number of planes (i.e., 16 and 8 planes). Our solutions have been validated on a custom acquired dataset with accurate ground truth to prove their feasibility and accuracy.

[174]  arXiv:2109.07293 [pdf, other]
Title: Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context
Comments: 10 pages, 4 figures, EMNLP 2021, code: this https URL
Subjects: Computation and Language (cs.CL)

Embedding-based methods are widely used for unsupervised keyphrase extraction (UKE) tasks. Generally, these methods simply calculate similarities between phrase embeddings and the document embedding, which is insufficient to capture different contexts for a more effective UKE model. In this paper, we propose a novel method for UKE, where local and global contexts are jointly modeled. From a global view, we calculate the similarity between a certain phrase and the whole document in the vector space as traditional embedding-based models do. In terms of the local view, we first build a graph structure based on the document where phrases are regarded as vertices and the edges are similarities between vertices. Then, we propose a new centrality computation method to capture local salient information based on the graph structure. Finally, we further combine the modeling of global and local context for ranking. We evaluate our models on three public benchmarks (Inspec, DUC 2001, SemEval 2010) and compare with existing state-of-the-art models. The results show that our model outperforms most models while generalizing better on input documents with different domains and lengths. An additional ablation study shows that both the local and global information are crucial for unsupervised keyphrase extraction tasks.
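
As an illustration of the general idea in the abstract above (a global phrase-to-document similarity combined with a local, graph-based centrality), here is a minimal sketch. It is not the authors' method: the toy embeddings, the use of plain weighted-degree centrality, and the combination of the two scores by multiplication are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_phrases(
    phrase_vecs: Dict[str, List[float]], doc_vec: List[float]
) -> List[Tuple[str, float]]:
    """Rank candidate phrases by (global similarity) * (local centrality).

    Global view: cosine similarity between a phrase and the whole document.
    Local view: weighted degree of the phrase in a graph whose vertices are
    phrases and whose edge weights are phrase-to-phrase similarities
    (a simplification chosen for this sketch).
    """
    scores = {}
    for name, vec in phrase_vecs.items():
        global_sim = cosine(vec, doc_vec)
        centrality = sum(
            cosine(vec, other_vec)
            for other, other_vec in phrase_vecs.items()
            if other != name
        )
        scores[name] = global_sim * centrality
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 2-dimensional embeddings, invented for this example.
phrases = {
    "neural network": [1.0, 0.0],
    "deep learning": [0.8, 0.6],
    "weather": [0.0, 1.0],
}
document = [1.0, 0.2]
ranking = rank_phrases(phrases, document)
print([name for name, _ in ranking])
```

In this toy setting a phrase ranks highly only if it is close to the document embedding and also well connected to the other candidates, which is the intuition behind modeling both views jointly.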

[175]  arXiv:2109.07295 [pdf, other]
Title: New Perspective on Progressive GANs Distillation for One-class Novelty Detection
Comments: 11 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2007.06963
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One-class novelty detection is conducted to identify anomalous instances, with different distributions from the expected normal instances. In this paper, the Generative Adversarial Network based on the Encoder-Decoder-Encoder scheme (EDE-GAN) achieves state-of-the-art performance. The two factors below serve the above purpose: 1) The EDE-GAN calculates the distance between two latent vectors as the anomaly score, unlike previous methods, which use the reconstruction error between images. 2) The model obtains the best results when the batch size is set to 1. To illustrate their superiority, we design a new GAN architecture and compare performance across different batch sizes. Moreover, our experimental results provide evidence of how beneficial constraints on the latent space are during model training. In an attempt to learn compact and fast models, we present a new technology, Progressive Knowledge Distillation with GANs (P-KDGAN), which connects two standard GANs through the designed distillation loss. Two-step progressive learning continuously augments the performance of student GANs, with improved results over the single-step approach. Our experimental results on the CIFAR-10, MNIST, and FMNIST datasets illustrate that P-KDGAN improves the performance of the student GAN by 2.44%, 1.77%, and 1.73% when compressing the computation at ratios of 24.45:1, 311.11:1, and 700:1, respectively.

[176]  arXiv:2109.07296 [pdf, other]
Title: Predicting Anti-Asian Hateful Users on Twitter during COVID-19
Comments: Accepted at Findings of EMNLP 2021. Please cite our EMNLP Findings 2021 paper!
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)

We investigate predictors of anti-Asian hate among Twitter users throughout COVID-19. With the rise of xenophobia and polarization that has accompanied widespread social media usage in many nations, online hate has become a major social issue, attracting many researchers. Here, we apply natural language processing techniques to characterize social media users who began to post anti-Asian hate messages during COVID-19. We compare two user groups -- those who posted anti-Asian slurs and those who did not -- with respect to a rich set of features measured with data prior to COVID-19 and show that it is possible to predict who later publicly posted anti-Asian slurs. Our analysis of predictive features underlines the potential impact of news media and information sources that report on online hate and calls for further investigation into the role of polarized communication networks and news media.

[177]  arXiv:2109.07298 [pdf, other]
Title: FFAVOD: Feature Fusion Architecture for Video Object Detection
Comments: Accepted for publication in Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A significant amount of redundancy exists between consecutive frames of a video. Object detectors typically produce detections for one image at a time, without any capabilities for taking advantage of this redundancy. Meanwhile, many applications for object detection work with videos, including intelligent transportation systems, advanced driver assistance systems and video surveillance. Our work aims at taking advantage of the similarity between video frames to produce better detections. We propose FFAVOD, standing for feature fusion architecture for video object detection. We first introduce a novel video object detection architecture that allows a network to share feature maps between nearby frames. Second, we propose a feature fusion module that learns to merge feature maps to enhance them. We show that using the proposed architecture and the fusion module can improve the performance of three base object detectors on two object detection benchmarks containing sequences of moving road users. Additionally, to further increase performance, we propose an improvement to the SpotNet attention module. Using our architecture on the improved SpotNet detector, we obtain the state-of-the-art performance on the UA-DETRAC public benchmark as well as on the UAVDT dataset. Code is available at https://github.com/hu64/FFAVOD.

[178]  arXiv:2109.07301 [pdf, other]
Title: What Vision-Language Models `See' when they See Scenes
Subjects: Computation and Language (cs.CL)

Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate. In this paper we address to what extent pretrained Vision and Language models can learn to align descriptions of both types with images. We compare 3 state-of-the-art models, VisualBERT, LXMERT and CLIP. We find that (i) V&L models are susceptible to stylistic biases acquired during pretraining; (ii) only CLIP performs consistently well on both object- and scene-level descriptions. A follow-up ablation study shows that CLIP uses object-level information in the visual modality to align with scene-level textual descriptions.

[179]  arXiv:2109.07302 [pdf, ps, other]
Title: A Characterization of Individualization-Refinement Trees
Comments: to appear at ISAAC 2021
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

Individualization-Refinement (IR) algorithms form the standard method and currently the only practical method for symmetry computations of graphs and combinatorial objects in general. Through backtracking, on each graph an IR-algorithm implicitly creates an IR-tree whose order is the determining factor of the running time of the algorithm.
We give a precise and constructive characterization of which trees are IR-trees. This characterization is applicable both when the tree is regarded as an uncolored object and when it is regarded as a colored object where vertex colors stem from a node invariant. We also provide a construction that, given a tree, produces a corresponding graph whenever possible. This provides a constructive proof that our necessary conditions are also sufficient for the characterization.

[180]  arXiv:2109.07305 [pdf, other]
Title: Distributed flexibility as a cost-effective alternative to grid reinforcement
Subjects: Systems and Control (eess.SY)

The deployment of distributed photovoltaics (PV) in low-voltage networks may cause technical issues such as voltage rises, line ampacity violations, and transformer overloading for distribution system operators (DSOs). These problems may induce high grid reinforcement costs. In this work, we assume the DSO can control each prosumer's battery and PV system. Under such assumptions, we evaluate the cost of providing flexibility and compare it with grid reinforcement costs. Our results highlight that using distributed flexibility is more profitable than reinforcing a low-voltage network until the PV generation covers 145% of the network annual energy demand.

[181]  arXiv:2109.07306 [pdf, other]
Title: Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training
Comments: EMNLP 2021
Subjects: Computation and Language (cs.CL)

Compared to monolingual models, cross-lingual models usually require a more expressive vocabulary to represent all languages adequately. We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity. To this end, we propose an algorithm VoCap to determine the desired vocabulary capacity of each language. However, increasing the vocabulary size significantly slows down the pre-training speed. In order to address the issues, we propose k-NN-based target sampling to accelerate the expensive softmax. Our experiments show that the multilingual vocabulary learned with VoCap benefits cross-lingual language model pre-training. Moreover, k-NN-based target sampling mitigates the side-effects of increasing the vocabulary size while achieving comparable performance and faster pre-training speed. The code and the pretrained multilingual vocabularies are available at https://github.com/bozheng-hit/VoCapXLM.

[182]  arXiv:2109.07307 [pdf, other]
Title: Expertise Affects Drone Racing Performance
Comments: 6 pages, 6 figures
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

First-person view drone racing has become a popular televised sport. However, very little is known about the perceptual and motor skills of professional drone racing pilots. A better understanding of these skills may inform path planning and control algorithms for autonomous multirotor flight. By using a real-world drone racing track and a large-scale position tracking system, we compare the drone racing performance of five professional and five beginner pilots. Results show that professional pilots consistently outperform beginner pilots by achieving faster lap times, higher velocity, and more efficiently executing the challenging maneuvers. Trajectory analysis shows that experienced pilots choose more optimal racing lines than beginner pilots. Our results provide strong evidence for a contribution of expertise to performances in real-world human-piloted drone racing. We discuss the implications of these results for future work on autonomous fast and agile flight. We make our data openly available.

[183]  arXiv:2109.07308 [pdf]
Title: A Self-rescue Mechanism for an In-pipe Robot for Large Obstacle Negotiation in Water Distribution Systems
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Water distribution systems (WDS) carry potable water through millions of miles of pipelines and deliver purified water to residential areas. Incidents in the WDS cause leaks and water loss, which lead to pressure gradients and public health crises. Hence, utility managers need to assess the condition of pipelines periodically and localize the leak location (in case it is reported). In our previous works, we designed and developed a size-adaptable modular in-pipe robot [1] and controlled its motion in in-service WDS. However, due to the linearization of the dynamical equations of the robot, the stabilizer controller, which is a linear quadratic regulator (LQR), cannot stabilize the large deviations of the stabilizing states caused by obstacles, which make the robot fail during operation. To this end, we design a self-rescue mechanism for the robot in which three auxiliary gear-motors retract and extend the arm modules with the designed controller towards a reliable motion in the negotiation of large obstacles and non-straight configurations. Simulation results show that the proposed mechanism along with the motion controller enables the robot to have an improved motion in pipelines.

[184]  arXiv:2109.07311 [pdf, other]
Title: MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The rapid progress in the ease of creating and spreading ultra-realistic media over social platforms calls for an urgent need to develop a generalizable deepfake detection technique. It has been observed that current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos. Inspired by this observation, in this paper, we present a novel approach, termed MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation for classifying deepfakes. MD-CSDNetwork is a novel cross-stitched network with two parallel branches carrying the spatial and frequency information, respectively. We hypothesize that these multi-domain input data streams can be considered as related supervisory signals. The supervision from both branches ensures better performance and generalization. Further, cross-stitch connections are inserted between the two branches to automatically learn an optimal combination of domain-specific and shared representations from other domains. Extensive experiments are conducted on the popular benchmark dataset FaceForensics++ for forgery classification. We report improvements over all the manipulation types in the FaceForensics++ dataset and comparable results with state-of-the-art methods for cross-database evaluation on the Celeb-DF dataset and the Deepfake Detection Dataset.

[185]  arXiv:2109.07313 [pdf, other]
Title: Approximately EFX Allocations for Indivisible Chores
Comments: 13 pages, 1 figure
Subjects: Computer Science and Game Theory (cs.GT)

In this paper we study how to fairly allocate a set of m indivisible chores to a group of n agents, each of which has a general additive cost function on the items. Since envy-free (EF) allocation is not guaranteed to exist, we consider the notion of envy-freeness up to any item (EFX). In contrast to the fruitful results regarding the (approximation of) EFX allocations for goods, very little is known for the allocation of chores. Prior to our work, for the allocation of chores, it is known that EFX allocations always exist for two agents, or general number of agents with IDO cost functions. For general instances, no non-trivial approximation result regarding EFX allocation is known. In this paper we make some progress in this direction by showing that for three agents we can always compute a 5-approximation of EFX allocation in polynomial time. For n>=4 agents, our algorithm always computes an allocation that achieves an approximation ratio of O(n^2) regarding EFX.
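
To make the notion in the abstract above concrete, the following sketch checks whether a given allocation of chores is alpha-approximately EFX under additive costs. It assumes the standard definition (the abstract itself does not spell one out): for every pair of agents i, j and every chore e in i's bundle, the cost to i of her bundle without e is at most alpha times the cost to i of j's bundle. The function name and the example instance are hypothetical, and this is only a checker, not the paper's algorithm.

```python
from typing import List, Sequence


def is_alpha_efx(
    costs: Sequence[Sequence[int]],
    allocation: Sequence[Sequence[int]],
    alpha: float = 1.0,
) -> bool:
    """Check alpha-EFX for chores with additive costs.

    costs[i][e]   : cost of chore e to agent i
    allocation[i] : list of chore indices assigned to agent i
    Assumed condition, for all i != j and all e in allocation[i]:
        c_i(X_i minus {e}) <= alpha * c_i(X_j)
    """
    n = len(allocation)
    for i in range(n):
        own_cost = sum(costs[i][e] for e in allocation[i])
        for e in allocation[i]:
            reduced = own_cost - costs[i][e]  # cost after dropping chore e
            for j in range(n):
                if j == i:
                    continue
                other_cost = sum(costs[i][x] for x in allocation[j])
                if reduced > alpha * other_cost:
                    return False
    return True


# Two agents, four identical chores, split three to one.
unit_costs = [[1, 1, 1, 1], [1, 1, 1, 1]]
lopsided = [[0, 1, 2], [3]]
print(is_alpha_efx(unit_costs, lopsided))           # exact EFX
print(is_alpha_efx(unit_costs, lopsided, alpha=2))  # 2-approximate EFX
```

With alpha = 1 the check corresponds to exact EFX; larger values of alpha correspond to the approximation ratios discussed in the abstract.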

[186]  arXiv:2109.07316 [pdf, other]
Title: Reinshard: An optimally sharded dual-blockchain for concurrency resolution
Comments: 14 pages, 9 figures, 3 tables
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Applications (stat.AP)

Decentralized control, low-complexity, flexible and efficient communications are the requirements of an architecture that aims to scale blockchains beyond the current state. Such properties are attainable by reducing ledger size and providing parallel operations in the blockchain. Sharding is one of the approaches that lower the burden of the nodes and enhance performance. However, the current solutions lack the features for resolving concurrency during cross-shard communications. With multiple participants belonging to different shards, handling concurrent operations is essential for optimal sharding. This issue becomes prominent due to the lack of architectural support and requires additional consensus for cross-shard communications. Inspired by hybrid Proof-of-Work/Proof-of-Stake (PoW/PoS), like Ethereum, hybrid consensus and 2-hop blockchain, we propose Reinshard, a new blockchain that inherits the properties of hybrid consensus for optimal sharding. Reinshard uses PoW and PoS chain-pairs with PoS sub-chains for all the valid chain-pairs where the hybrid consensus is attained through Verifiable Delay Function (VDF). Our architecture provides a secure method of arranging nodes in shards and resolves concurrency conflicts using the delay factor of VDF. The applicability of Reinshard is demonstrated through security and experimental evaluations. A practical concurrency problem is considered to show the efficacy of Reinshard in providing optimal sharding.

[187]  arXiv:2109.07319 [pdf, other]
Title: Embedding Convolutions for Short Text Extreme Classification with Millions of Labels
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Automatic annotation of short-text data to a large number of target labels, referred to as Short Text Extreme Classification, has recently found numerous applications in prediction of related searches and product recommendation tasks. The conventional usage of Convolutional Neural Network (CNN) to capture n-grams in text-classification relies heavily on uniformity in word-ordering and the presence of long input sequences to convolve over. However, this is missing in short and unstructured text sequences encountered in search and recommendation. In order to tackle this, we propose an orthogonal approach by recasting the convolution operation to capture coupled semantics along the embedding dimensions, and develop a word-order agnostic embedding enhancement module to deal with the lack of structure in such queries. Benefitting from the computational efficiency of the convolution operation, Embedding Convolutions, when applied on the enriched word embeddings, result in a light-weight and yet powerful encoder (InceptionXML) that is robust to the inherent lack of structure in short-text extreme classification.
Towards scaling our model to problems with millions of labels, we also propose InceptionXML+, which addresses the shortcomings of the dynamic hard-negative mining framework in the recently proposed LightXML by improving the alignment between the label-shortlister and extreme classifier. On popular benchmark datasets, we empirically demonstrate that the proposed method outperforms state-of-the-art deep extreme classifiers such as Astec by an average of 5% and 8% on the P@k and propensity-scored PSP@k metrics respectively.

[188]  arXiv:2109.07320 [pdf, ps, other]
Title: Error estimation and adaptivity for stochastic collocation finite elements Part I: single-level approximation
Comments: 20 pages; 9 figures
Subjects: Numerical Analysis (math.NA)

A general adaptive refinement strategy for solving linear elliptic partial differential equations with random data is proposed and analysed herein. The adaptive strategy extends the a posteriori error estimation framework introduced by Guignard and Nobile in 2018 (SIAM J. Numer. Anal., 56, 3121--3143) to cover problems with a nonaffine parametric coefficient dependence. A suboptimal, but nonetheless reliable and convenient implementation of the strategy involves approximation of the decoupled PDE problems with a common finite element approximation space. Computational results obtained using such a single-level strategy are presented in this paper (part I). Results obtained using a potentially more efficient multilevel approximation strategy, where meshes are individually tailored, will be discussed in part II of this work. The codes used to generate the numerical results are available online.

[189]  arXiv:2109.07321 [pdf, other]
Title: PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching
Comments: Technical report of the paper PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching, accepted to ACM Journal of Data and Information Quality (JDIQ), Special Issue on Deep Learning for Data Quality
Subjects: Databases (cs.DB); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Schema matching is a core task of any data integration process. Although it has been investigated in the fields of databases, AI, Semantic Web and data mining for many years, the main challenge remains the ability to generate quality matches among data concepts (e.g., database attributes). In this work, we examine a novel angle on the behavior of humans as matchers, studying match creation as a process. We analyze the dynamics of common evaluation measures (precision, recall, and f-measure) with respect to this angle and highlight the need for unbiased matching to support this analysis. Unbiased matching, a newly defined concept that describes the common assumption that human decisions represent reliable assessments of schemata correspondences, is, however, not an inherent property of human matchers. In what follows, we design PoWareMatch, which makes use of a deep learning mechanism to calibrate and filter human matching decisions adhering to the quality of a match, which are then combined with algorithmic matching to generate better match results. We provide empirical evidence, based on an experiment with more than 200 human matchers over common benchmarks, that PoWareMatch predicts well the benefit of extending the match with an additional correspondence and generates high-quality matches. In addition, PoWareMatch outperforms state-of-the-art matching algorithms.

[190]  arXiv:2109.07323 [pdf, other]
Title: FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining
Comments: Work in progress
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, is naturally a strong supervision of numerical reasoning. More importantly, large amounts of spreadsheets with expert-made formulae are available on the web and can be obtained easily. FORTAP is the first method for numerical-reasoning-aware table pretraining by leveraging a large corpus of spreadsheet formulae. We design two formula pretraining tasks to explicitly guide FORTAP to learn numerical reference and calculation in semi-structured tables. FORTAP achieves state-of-the-art results on two representative downstream tasks, cell type classification and formula prediction, showing the great potential of numerical-reasoning-aware pretraining.

[191]  arXiv:2109.07324 [pdf, other]
Title: PointManifoldCut: Point-wise Augmentation in the Manifold for Point Clouds
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Augmentation can benefit point cloud learning due to the limited availability of large-scale public datasets. This paper proposes a mix-up augmentation approach, PointManifoldCut, which replaces the neural network embedded points, rather than the Euclidean space coordinates. This approach exploits the fact that points at the higher levels of the neural network are already trained to embed their neighbors' relations, so mixing these representations will not confuse the relation between a point and its label. This makes it possible to regularize the parameter space as the other augmentation methods do, but without worrying about the proper label of the replaced points. The experiments show that our proposed approach provides competitive performance on point cloud classification and segmentation when it is combined with cutting-edge vanilla point cloud networks. The results show a consistent performance boost compared to other state-of-the-art point cloud augmentation methods, such as PointMixup and PointCutMix. The code of this paper is available at: https://github.com/fun0515/PointManifoldCut.

[192]  arXiv:2109.07334 [pdf, other]
Title: A Systematic Literature Review on Wearable Health Data Publishing under Differential Privacy
Comments: 25 pages
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Wearable devices generate different types of physiological data about individuals. These data can provide valuable insights for medical researchers and clinicians that cannot be obtained through traditional measures. Researchers have historically relied on survey responses or observed behavior. Interestingly, physiological data can provide richer information about user cognition than that obtained from any other source, including the users themselves. Therefore, inexpensive consumer-grade wearable devices have become a point of interest for health researchers. In addition, they are also used in continuous remote health monitoring and sometimes by insurance companies. However, the biggest concern for such use cases is the privacy of the individuals. A few privacy mechanisms, such as abstraction and k-anonymity, are widely used in information systems. Recently, Differential Privacy (DP) has emerged as a proficient technique to publish privacy-sensitive data, including data from wearable devices. In this paper, we have conducted a Systematic Literature Review (SLR) to identify, select and critically appraise research in DP as well as to understand different techniques and existing uses of DP in wearable data publishing. Based on our study, we have identified the limitations of proposed solutions and provided future directions.

[193]  arXiv:2109.07335 [pdf, other]
Title: Comparing decision mining approaches with regard to the meaningfulness of their results
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Decisions and the underlying rules are indispensable for driving process execution during runtime, i.e., for routing process instances at alternative branches based on the values of process data. Decision rules can comprise unary data conditions, e.g., age > 40, binary data conditions where the relation between two or more variables is relevant, e.g. temperature1 < temperature2, and more complex conditions that refer to, for example, parts of a medical image. Decision discovery aims at automatically deriving decision rules from process event logs. Existing approaches focus on the discovery of unary, or in some instances binary data conditions. The discovered decision rules are usually evaluated using accuracy, but not with regards to their semantics and meaningfulness, although this is crucial for validation and the subsequent implementation/adaptation of the decision rules. Hence, this paper compares three decision mining approaches, i.e., two existing ones and one newly described approach, with respect to the meaningfulness of their results. For comparison, we use one synthetic data set for a realistic manufacturing case and the two real-world BPIC 2017/2020 logs. The discovered rules are discussed with regards to their semantics and meaningfulness.
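
The difference between the unary and binary data conditions mentioned above can be illustrated with a minimal sketch. The two conditions (age > 40 and temperature1 < temperature2) come from the abstract; the function names, branch labels, and sample instances are hypothetical and only serve the illustration. This is not any of the decision mining approaches the paper compares.

```python
from typing import Callable, Dict, List

Instance = Dict[str, float]

def unary_rule(instance: Instance) -> bool:
    # Unary condition: one variable compared against a constant.
    return instance["age"] > 40

def binary_rule(instance: Instance) -> bool:
    # Binary condition: two variables compared against each other.
    return instance["temperature1"] < instance["temperature2"]

def route(instance: Instance, rule: Callable[[Instance], bool]) -> str:
    # Route a process instance to one of two alternative branches
    # (the branch labels are hypothetical).
    return "branch_A" if rule(instance) else "branch_B"

# Hypothetical process instances with their data values.
instances: List[Instance] = [
    {"age": 52, "temperature1": 20.5, "temperature2": 23.0},
    {"age": 31, "temperature1": 25.0, "temperature2": 21.0},
]

routes_unary = [route(i, unary_rule) for i in instances]
routes_binary = [route(i, binary_rule) for i in instances]
```

A learner that only searches over comparisons between a variable and a constant cannot express `binary_rule` directly, which is one reason the kind of condition matters for how meaningful a discovered rule is.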

[194]  arXiv:2109.07339 [pdf, other]
Title: S3LAM: Structured Scene SLAM
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

We propose a new general SLAM system that uses the semantic segmentation of objects and structures in the scene. Semantic information is relevant as it contains high-level information which may make SLAM more accurate and robust. Our contribution is threefold: i) A new SLAM system based on ORB-SLAM2 that creates a semantic map made of clusters of points corresponding to object instances and structures in the scene. ii) A modification of the classical Bundle Adjustment formulation to constrain each cluster using geometrical priors, which improves both camera localization and reconstruction and enables a better understanding of the scene. iii) A new Bundle Adjustment formulation at the level of clusters to improve the convergence of classical Bundle Adjustment. We evaluate our approach on several sequences from a public dataset and show that it improves camera pose estimation with respect to ORB-SLAM2.

[195]  arXiv:2109.07342 [pdf]
Title: Sequential Point Cloud Prediction in Interactive Scenarios: A Survey
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Point clouds have been widely used in the field of autonomous driving since they can provide a more comprehensive three-dimensional representation of the environment than 2D images. Point-wise prediction based on point cloud sequences (PCS) is an essential part of environment understanding, which can assist in the decision-making and motion-planning of autonomous vehicles. However, PCS prediction has not been deeply researched in the literature. This paper presents a brief review of sequential point cloud prediction methods, focusing on interactive scenarios. Firstly, we define the PCS prediction problem and introduce commonly used frameworks. Secondly, by reviewing non-predictive problems, we analyze and summarize the spatio-temporal feature extraction methods based on PCS. On this basis, we review two types of PCS prediction tasks, scene flow estimation (SFE) and point cloud location prediction (PCLP), highlighting their connections and differences. Finally, we discuss some open issues and point out some potential research directions.

[196]  arXiv:2109.07343 [pdf, other]
Title: Miðeind's WMT 2021 submission
Subjects: Computation and Language (cs.CL)

We present Miðeind's submission for the English→Icelandic and Icelandic→English subsets of the 2021 WMT news translation task. Transformer-base models are trained for translation on parallel data to generate backtranslations iteratively. A pretrained mBART-25 model is then adapted for translation using parallel data as well as the last backtranslation iteration. This adapted pretrained model is then used to re-generate backtranslations, and the training of the adapted model is continued.

[197]  arXiv:2109.07346 [pdf, other]
Title: Introducing an Abusive Language Classification Framework for Telegram to Investigate the German Hater Community
Subjects: Computation and Language (cs.CL)

Since traditional social media platforms ban more and more actors that distribute hate speech or other forms of abusive language (deplatforming), these actors migrate to alternative platforms that do not moderate the users' content. One known platform that is relevant for the German hater community is Telegram, for which only limited research efforts have been made so far.
The goal of this study is to develop a broad framework that consists of (i) an abusive language classification model for German Telegram messages and (ii) a classification model for the hatefulness of Telegram channels. For the first part, we employ existing abusive language datasets containing posts from other platforms to build our classification models. For the channel classification model, we develop a method that combines channel specific content information coming from a topic model with a social graph to predict the hatefulness of channels. Furthermore, we complement these two approaches for hate speech detection with insightful results on the evolution of the hater community on Telegram in Germany. Moreover, we propose methods to the hate speech research community for scalable network analyses for social media platforms. As an additional output of the study, we release an annotated abusive language dataset containing 1,149 annotated Telegram messages.

[198]  arXiv:2109.07348 [pdf, other]
Title: Cross-lingual Transfer of Monolingual Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent studies in zero-shot cross-lingual learning using multilingual models have falsified the previous hypothesis that shared vocabulary and joint pre-training are the keys to cross-lingual generalization. Inspired by this advancement, we introduce a cross-lingual transfer method for monolingual models based on domain adaptation. We study the effects of such transfer from four different languages to English. Our experimental results on GLUE show that the transferred models outperform the native English model independently of the source language. After probing the English linguistic knowledge encoded in the representations before and after transfer, we find that semantic information is retained from the source language, while syntactic information is learned during transfer. Additionally, the results of evaluating the transferred models in source language tasks reveal that their performance in the source domain deteriorates after transfer.

[199]  arXiv:2109.07351 [pdf, other]
Title: The ELITR ECA Corpus
Subjects: Computation and Language (cs.CL)

We present the ELITR ECA corpus, a multilingual corpus derived from publications of the European Court of Auditors. We use automatic translation together with Bleualign to identify parallel sentence pairs in all 506 translation directions. The result is a corpus comprising 264k document pairs and 41.9M sentence pairs.

[200]  arXiv:2109.07353 [pdf, other]
Title: Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos
Comments: Accepted by IEEE Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Graph Convolution Networks (GCNs) have been successfully used for 3D human pose estimation in videos. However, they are often built on a fixed human-joint affinity, according to the human skeleton. This may reduce the capacity of a GCN to adapt to complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on spatial distance/temporal movement similarity between human joints in this video. Hence, they can effectively understand which joints are spatially closer and/or have consistent motion, for reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, namely Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size.

[201]  arXiv:2109.07359 [pdf, other]
Title: Modular Neural Ordinary Differential Equations
Comments: 4 pages
Subjects: Machine Learning (cs.LG)

The laws of physics have been written in the language of differential equations for centuries. Neural Ordinary Differential Equations (NODEs) are a new machine learning architecture which allows these differential equations to be learned from a dataset. These have been applied to classical dynamics simulations in the form of Lagrangian Neural Networks (LNNs) and Second Order Neural Differential Equations (SONODEs). However, they either cannot represent the most general equations of motion or lack interpretability. In this paper, we propose Modular Neural ODEs, where each force component is learned with separate modules. We show how physical priors can be easily incorporated into these models. Through a number of experiments, we demonstrate these result in better performance, are more interpretable, and add flexibility due to their modularity.

[202]  arXiv:2109.07364 [pdf, other]
Title: Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU
Comments: Accepted at EMNLP 2021
Subjects: Computation and Language (cs.CL)

Incremental processing allows interactive systems to respond based on partial inputs, which is a desirable property e.g. in dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole, abstracting away the notion of time. Recent work attempts to apply Transformers incrementally via restart-incrementality by repeatedly feeding, to an unchanged model, increasingly longer input prefixes to produce partial outputs. However, this approach is computationally costly and does not scale efficiently for long sequences. In parallel, we witness efforts to make Transformers more efficient, e.g. the Linear Transformer (LT) with a recurrence mechanism. In this work, we examine the feasibility of LT for incremental NLU in English. Our results show that the recurrent LT model has better incremental performance and faster inference speed compared to the standard Transformer and LT with restart-incrementality, at the cost of part of the non-incremental (full sequence) quality. We show that the performance drop can be mitigated by training the model to wait for right context before committing to an output and that training with input prefixes is beneficial for delivering correct partial outputs.
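
The restart-incrementality scheme described above, in which increasingly longer input prefixes are fed to an unchanged model, can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_tagger` is a hypothetical stand-in for a sequence labeller, not a Transformer or the Linear Transformer studied in the paper.

```python
from typing import Callable, List, Sequence

def restart_incremental(
    model: Callable[[Sequence[str]], List[str]],
    tokens: Sequence[str],
) -> List[List[str]]:
    # Re-run the unchanged model from scratch on every prefix
    # tokens[:1], tokens[:2], ..., tokens[:n] and collect the partial outputs.
    return [model(tokens[:t]) for t in range(1, len(tokens) + 1)]

def toy_tagger(prefix: Sequence[str]) -> List[str]:
    # Hypothetical model: marks the newest token as "LAST", earlier ones as "MID".
    return ["LAST" if i == len(prefix) - 1 else "MID" for i in range(len(prefix))]

outputs = restart_incremental(toy_tagger, ["book", "a", "flight"])

# Each restart reprocesses the whole prefix, so the number of token positions
# handled grows as n(n+1)/2 for an input of length n.
tokens_processed = sum(len(partial) for partial in outputs)
```

The quadratic growth of `tokens_processed` is the scaling problem the abstract refers to; a recurrent model would instead update a running state once per new token.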

[203]  arXiv:2109.07365 [pdf, other]
Title: Maneuver-based Trajectory Prediction for Self-driving Cars Using Spatio-temporal Convolutional Networks
Comments: Accepted for IROS 2021
Subjects: Robotics (cs.RO)

The ability to predict the future movements of other vehicles is a subconscious and effortless skill for humans and key to safe autonomous driving. Therefore, trajectory prediction for autonomous cars has gained a lot of attention in recent years. It is, however, still a hard task to achieve human-level performance. Interdependencies between vehicle behaviors and the multimodal nature of future intentions in a dynamic and complex driving environment render trajectory prediction a challenging problem. In this work, we propose a new, data-driven approach for predicting the motion of vehicles in a road environment. The model allows for inferring future intentions from the past interaction among vehicles in highway driving scenarios. Using our neighborhood-based data representation, the proposed system jointly exploits correlations in the spatial and temporal domain using convolutional neural networks. Our system considers multiple possible maneuver intentions and their corresponding motion and predicts the trajectory for five seconds into the future. We implemented our approach and evaluated it on two highway datasets taken in different countries and are able to achieve a competitive prediction performance.

[204]  arXiv:2109.07368 [pdf, other]
Title: UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

This paper presents a unified end-to-end framework for both streaming and non-streaming speech translation. While the training recipes for non-streaming speech translation are mature, the recipes for streaming speech translation are yet to be built. In this work, we focus on developing a unified model (UniST) which supports streaming and non-streaming ST from the perspective of fundamental components, including training objective, attention mechanism and decoding policy. Experiments on the most popular speech-to-text translation benchmark dataset, MuST-C, show that UniST achieves significant improvement for non-streaming ST, and a better-learned trade-off for BLEU score and latency metrics for streaming ST, compared with end-to-end baselines and the cascaded models. We will make our code and evaluation tools publicly available.

[205]  arXiv:2109.07370 [pdf, other]
Title: Direct and Sparse Deformable Tracking
Comments: 8 pages, 5 figures, submitted to RAL with ICRA
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deformable Monocular SLAM algorithms recover the localization of a camera in an unknown deformable environment. Current approaches use a template-based deformable tracking to recover the camera pose and the deformation of the map. These template-based methods use an underlying global deformation model. In this paper, we introduce a novel deformable camera tracking method with a local deformation model for each point. Each map point is defined as a single textured surfel that moves independently of the other map points. Thanks to a direct photometric error cost function, we can track the position and orientation of the surfel without an explicit global deformation model. In our experiments, we validate the proposed system and observe that our local deformation model estimates more accurately and robustly the targeted deformations of the map in both laboratory-controlled experiments and in-body scenarios undergoing non-isometric deformations, with changing topology or discontinuities.

[206]  arXiv:2109.07371 [pdf, other]
Title: Self-learn to Explain Siamese Networks Robustly
Comments: Accepted to ICDM 2021
Subjects: Machine Learning (cs.LG)

Learning to compare two objects is essential in applications such as digital forensics, face recognition, and brain network analysis, especially when labeled data is scarce and imbalanced. As these applications make high-stakes decisions and involve societal values like fairness and transparency, it is critical to explain the learned models. We aim to study post-hoc explanations of Siamese networks (SN) widely used in learning to compare. We characterize the instability of gradient-based explanations due to the additional compared object in SN, in contrast to architectures with a single input instance. We propose an optimization framework that derives global invariance from unlabeled data using self-learning to promote the stability of local explanations tailored for specific query-reference pairs. The optimization problems can be solved using gradient descent-ascent (GDA) for constrained optimization, or SGD for KL-divergence regularized unconstrained optimization, with convergence proofs, especially when the objective functions are nonconvex due to the Siamese architecture. Quantitative results and case studies on tabular and graph data from neuroscience and chemical engineering show that the framework respects the self-learned invariance while robustly optimizing the faithfulness and simplicity of the explanation. We further demonstrate the convergence of GDA experimentally.

[207]  arXiv:2109.07373 [pdf, other]
Title: A Unified Framework for Biphasic Facial Age Translation with Noisy-Semantic Guided Generative Adversarial Networks
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Biphasic facial age translation aims at predicting the appearance of the input face at any age. Facial age translation has received considerable research attention in the last decade due to its practical value in cross-age face recognition and various entertainment applications. However, most existing methods model age changes between holistic images, regardless of the human face structure and the age-changing patterns of individual facial components. Consequently, the lack of semantic supervision causes a loss of fidelity in the details of generated faces. To this end, we propose a unified framework for biphasic facial age translation with noisy-semantic guided generative adversarial networks. Structurally, we project the class-aware noisy semantic layouts to soft latent maps for the subsequent injection operation on the individual facial parts. In particular, we introduce two sub-networks, ProjectionNet and ConstraintNet. ProjectionNet introduces the low-level structural semantic information with a noise map and produces soft latent maps. ConstraintNet disentangles the high-level spatial features to constrain the soft latent maps, which endows the soft latent maps with more age-related context. Specifically, an attention mechanism is employed in ConstraintNet for feature disentanglement. Meanwhile, in order to mine the strongest mapping ability of the network, we embed two types of learning strategies in the training procedure, supervised self-driven generation and unsupervised condition-driven cycle-consistent generation. Extensive experiments conducted on the MORPH and CACD datasets demonstrate the strong ability of our proposed method, which achieves state-of-the-art performance.

[208]  arXiv:2109.07377 [pdf, other]
Title: Topic Transferable Table Question Answering
Comments: To appear at EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Weakly-supervised table question-answering (TableQA) models have achieved state-of-the-art performance by using a pre-trained BERT transformer to jointly encode a question and a table to produce a structured query for the question. However, in practical settings TableQA systems are deployed over table corpora having topic and word distributions quite distinct from BERT's pretraining corpus. In this work we simulate the practical topic shift scenario by designing novel challenge benchmarks WikiSQL-TS and WikiTQ-TS, consisting of train-dev-test splits in five distinct topic groups, based on the popular WikiSQL and WikiTableQuestions datasets. We empirically show that, despite pre-training on large open-domain text, performance of models degrades significantly when they are evaluated on unseen topics. In response, we propose T3QA (Topic Transferable Table Question Answering), a pragmatic adaptation framework for TableQA comprising: (1) topic-specific vocabulary injection into BERT, (2) a novel text-to-text transformer generator (such as T5, GPT2) based natural language question generation pipeline focused on generating topic-specific training data, and (3) a logical form reranker. We show that T3QA provides a reasonably good baseline for our topic shift benchmarks. We believe our topic split benchmarks will lead to robust TableQA solutions that are better suited for practical deployment.

[209]  arXiv:2109.07380 [pdf, other]
Title: DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning
Comments: Supplementary material is available at https://tinyurl.com/teach-dcur
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Deep reinforcement learning (RL) has shown great empirical successes, but suffers from brittleness and sample inefficiency. A potential remedy is to use a previously-trained policy as a source of supervision. In this work, we refer to these policies as teachers and study how to transfer their expertise to new student policies by focusing on data usage. We propose a framework, Data CUrriculum for Reinforcement learning (DCUR), which first trains teachers using online deep RL, and stores the logged environment interaction history. Then, students learn by running either offline RL or by using teacher data in combination with a small amount of self-generated data. DCUR's central idea involves defining a class of data curricula which, as a function of training time, limits the student to sampling from a fixed subset of the full teacher data. We test teachers and students using state-of-the-art deep RL algorithms across a variety of data curricula. Results suggest that the choice of data curricula significantly impacts student learning, and that it is beneficial to limit the data during early training stages while gradually letting the data availability grow over time. We identify when the student can learn offline and match teacher performance without relying on specialized offline RL algorithms. Furthermore, we show that collecting a small fraction of online data provides complementary benefits with the data curriculum. Supplementary material is available at https://tinyurl.com/teach-dcur.
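
The idea of a data curriculum that, as a function of training time, limits the student to a subset of the teacher data can be sketched as below. This is a hedged illustration only: the linear growth schedule, the function names `available_prefix` and `sample_batch`, and the parameter `min_size` are assumptions made for this sketch and are not taken from the paper, which studies a variety of curricula.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def available_prefix(step: int, total_steps: int, n_teacher: int, min_size: int) -> int:
    # Number of logged teacher transitions (in logging order) visible at `step`.
    # Assumed schedule: grow linearly from `min_size` to the full buffer.
    step = max(0, min(step, total_steps))
    return min_size + (n_teacher - min_size) * step // total_steps

def sample_batch(
    teacher_data: Sequence[T],
    step: int,
    total_steps: int,
    batch_size: int,
    min_size: int,
    rng: random.Random,
) -> List[T]:
    # Sample uniformly, but only from the part of the teacher data
    # that the curriculum has made available so far.
    limit = available_prefix(step, total_steps, len(teacher_data), min_size)
    return [teacher_data[rng.randrange(limit)] for _ in range(batch_size)]

# Hypothetical teacher buffer: integers stand in for logged transitions.
teacher_data = list(range(1000))
early_batch = sample_batch(teacher_data, step=0, total_steps=100,
                           batch_size=32, min_size=100, rng=random.Random(0))
```

With these settings the student sees only the first 100 logged items at step 0, 550 at step 50, and the full buffer at step 100, which mirrors the abstract's observation that limiting data early and letting availability grow over time can be beneficial.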

[210]  arXiv:2109.07382 [pdf, other]
Title: Toward Modern Fortran Tooling and a Thriving Developer Community
Comments: Submitted to ACM Fortran Forum
Subjects: Programming Languages (cs.PL); Computers and Society (cs.CY)

Fortran is the oldest high-level programming language that remains in use today and is one of the dominant languages used for compute-intensive scientific and engineering applications. However, Fortran has not kept up with modern software development practices and tooling in the internet era. As a consequence, the Fortran developer experience has diminished. Specifically, the lack of a rich general-purpose library ecosystem, modern tools for building and packaging Fortran libraries and applications, and online learning resources has made it difficult for Fortran to attract and retain new users. To address this problem, an open source community formed on GitHub in 2019 and began to work on the initial set of core tools: a standard library, a build system and package manager, and a community-curated website for Fortran. In this paper we report on the progress to date and outline the next steps.

[211]  arXiv:2109.07383 [pdf, other]
Title: RankNAS: Efficient Neural Architecture Search by Pairwise Ranking
Comments: Accepted to EMNLP 2021 Long Paper
Subjects: Computation and Language (cs.CL)

This paper addresses the efficiency challenge of Neural Architecture Search (NAS) by formulating the task as a ranking problem. Previous methods require numerous training examples to estimate the accurate performance of architectures, although the actual goal is to find the distinction between "good" and "bad" candidates. Here we do not resort to performance predictors. Instead, we propose a performance ranking method (RankNAS) via pairwise ranking. It enables efficient architecture search using much fewer training examples. Moreover, we develop an architecture selection method to prune the search space and concentrate on more promising candidates. Extensive experiments on machine translation and language modeling tasks show that RankNAS can design high-performance architectures while being orders of magnitude faster than state-of-the-art NAS systems.

[212]  arXiv:2109.07395 [pdf, other]
Title: Can one hear the shape of a neural network?: Snooping the GPU via Magnetic Side Channel
Comments: 14 pages, accepted to USENIX Security 2022
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Neural network applications have become popular in both enterprise and personal settings. Network solutions are tuned meticulously for each task, and designs that can robustly resolve queries end up in high demand. As the commercial value of accurate and performant machine learning models increases, so too does the demand to protect neural architectures as confidential investments. We explore the vulnerability of neural networks deployed as black boxes across accelerated hardware through electromagnetic side channels. We examine the magnetic flux emanating from a graphics processing unit's power cable, as acquired by a cheap $3 induction sensor, and find that this signal betrays the detailed topology and hyperparameters of a black-box neural network model. The attack acquires the magnetic signal for one query with unknown input values, but known input dimensions. The network reconstruction is possible due to the modular layer sequence in which deep neural networks are evaluated. We find that each layer component's evaluation produces an identifiable magnetic signal signature, from which layer topology, width, function type, and sequence order can be inferred using a suitably trained classifier and a joint consistency optimization based on integer programming. We study the extent to which network specifications can be recovered, and consider metrics for comparing network similarity. We demonstrate the potential accuracy of this side channel attack in recovering the details for a broad range of network architectures, including random designs. We consider applications that may exploit this novel side channel exposure, such as adversarial transfer attacks. In response, we discuss countermeasures to protect against our method and other similar snooping techniques.

[213]  arXiv:2109.07396 [pdf, other]
Title: Constraint based Knowledge Base Distillation in End-to-End Task Oriented Dialogs
Comments: D. Raghu and A. Jain contributed equally to this work
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

End-to-End task-oriented dialogue systems generate responses based on dialog history and an accompanying knowledge base (KB). Inferring those KB entities that are most relevant for an utterance is crucial for response generation. The existing state of the art scales to large KBs by softly filtering over irrelevant KB information. In this paper, we propose a novel filtering technique that consists of (1) a pairwise similarity based filter that identifies relevant information by respecting the n-ary structure in a KB record, and (2) an auxiliary loss that helps in separating contextually unrelated KB information. We also propose a new metric, multiset entity F1, which fixes a correctness issue in the existing entity F1 metric. Experimental results on three publicly available task-oriented dialog datasets show that our proposed approach outperforms existing state-of-the-art models.
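
The abstract does not spell out how multiset entity F1 is defined. The sketch below shows one generic way to compute an F1 score over multisets of entities, so that repeated mentions are counted rather than collapsed. It may differ from the authors' definition; the function name and the example entities are hypothetical.

```python
from collections import Counter
from typing import Iterable

def multiset_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    # Count every occurrence, so duplicates are not collapsed as in a set.
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    # Counter intersection keeps the minimum count per entity.
    overlap = sum((pred_counts & gold_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the gold response mentions one entity twice.
gold_entities = ["pizza_hut", "pizza_hut", "5_miles"]
predicted_entities = ["pizza_hut", "5_miles"]
score = multiset_f1(predicted_entities, gold_entities)
```

In this example a set-based comparison would treat the prediction as perfect because both sets are equal, whereas the multiset version penalises the missing second mention.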

[214]  arXiv:2109.07401 [pdf, other]
Title: Matching with Transformers in MELT
Comments: accepted at the Ontology Matching Workshop at the International Semantic Web Conference (ISWC 2021)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

One of the strongest signals for automated matching of ontologies and knowledge graphs is the textual descriptions of the concepts. The methods that are typically applied (such as character- or token-based comparisons) are relatively simple, and therefore do not capture the actual meaning of the texts. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) is possible. In this paper, we model the ontology matching task as a classification problem and present approaches based on transformer models. We further provide an easy-to-use implementation in the MELT framework, which is suited for ontology and knowledge graph matching. We show that a transformer-based filter helps to choose the correct correspondences given a high-recall alignment and already achieves a good result with simple alignment post-processing methods.

[215]  arXiv:2109.07402 [pdf, other]
Title: Multi View Spatial-Temporal Model for Travel Time Estimation
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

Taxi arrival time prediction is an essential part of building intelligent transportation systems. Traditional arrival time estimation methods mainly rely on traffic map feature extraction, which cannot model complex situations and nonlinear spatial and temporal relationships. Therefore, we propose a Multi-View Spatial-Temporal Model (MVSTM) to capture spatial-temporal and trajectory dependencies. Specifically, we use graph2vec to model the spatial view, a dual-channel temporal module to model the trajectory view, and structural embedding to model the traffic semantics. Experiments on large-scale taxi trajectory data show that our approach is more effective than the novel method. The source code can be obtained from https://github.com/775269512/SIGSPATIAL-2021-GISCUP-4th-Solution.

[216]  arXiv:2109.07403 [pdf, other]
Title: BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification
Comments: 12 pages with appendix, 7 figures
Subjects: Computation and Language (cs.CL)

Deep Neural Networks have taken Natural Language Processing by storm. While this led to incredible improvements across many tasks, it also initiated a new research field, questioning the robustness of these neural networks by attacking them. In this paper, we investigate four word substitution-based attacks on BERT. We combine a human evaluation of individual word substitutions and a probabilistic analysis to show that between 96% and 99% of the analyzed attacks do not preserve semantics, indicating that their success is mainly based on feeding poor data to the model. To further confirm that, we introduce an efficient data augmentation procedure and show that many adversarial examples can be prevented by including data similar to the attacks during training. An additional post-processing step reduces the success rates of state-of-the-art attacks below 5%. Finally, by looking at more reasonable thresholds on constraints for word substitutions, we conclude that BERT is a lot more robust than research on attacks suggests.

[217]  arXiv:2109.07407 [pdf, other]
Title: Semi-supervised Contrastive Learning for Label-efficient Medical Image Segmentation
Comments: 9 pages, accepted to MICCAI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The success of deep learning methods in medical image segmentation tasks heavily depends on a large amount of labeled data to supervise the training. On the other hand, the annotation of biomedical images requires domain knowledge and can be laborious. Recently, contrastive learning has demonstrated great potential in learning latent representation of images even without any label. Existing works have explored its application to biomedical image segmentation where only a small portion of data is labeled, through a pre-training phase based on self-supervised contrastive learning without using any labels followed by a supervised fine-tuning phase on the labeled portion of data only. In this paper, we establish that by including the limited label information in the pre-training phase, it is possible to boost the performance of contrastive learning. We propose a supervised local contrastive loss that leverages limited pixel-wise annotation to force pixels with the same label to gather around in the embedding space. Such loss needs pixel-wise computation which can be expensive for large images, and we further propose two strategies, downsampling and block division, to address the issue. We evaluate our methods on two public biomedical image datasets of different modalities. With different amounts of labeled data, our methods consistently outperform the state-of-the-art contrast-based methods and other semi-supervised learning techniques.

[218]  arXiv:2109.07409 [pdf]
Title: Sporting the government: Twitter as a window into sportspersons' engagement with causes in India and USA
Comments: 22 pages, 18 images, 2 tables
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)

With the ubiquitous reach of social media, influencers are increasingly central to articulation of political agendas on a range of topics. We curate a sample of tweets from the 200 most followed sportspersons in India and the United States respectively since 2019, map their connections with politicians, and visualize their engagements with key topics online. We find significant differences between the ways in which Indian and US sportspersons engage with politics online: while leading Indian sportspersons tend to align closely with the ruling party and engage minimally in dissent, American sportspersons engage with a range of political issues and are willing to publicly criticize politicians or policy. Our findings suggest that the ownership and governmental control of sports impact public stances on issues that professional sportspersons are willing to engage in online. It might also be inferred, depending upon the government of the day, that speaking up against the state and the government in power carries different socio-economic costs in the US and India.

[219]  arXiv:2109.07410 [pdf, other]
Title: Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document
Comments: detecting previously fact-checked claims, fact-checking, disinformation, fake news, social media, political debates
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Given the recent proliferation of false claims online, there has been a lot of manual fact-checking effort. As this is very time-consuming, human fact-checkers can benefit from tools that can support them and make them more efficient. Here, we focus on building a system that could provide such support. Given an input document, it aims to detect all sentences that contain a claim that can be verified by some previously fact-checked claims (from a given database). The output is a re-ranked list of the document sentences, so that those that can be verified are ranked as high as possible, together with corresponding evidence. Unlike previous work, which has looked into claim retrieval, here we take a document-level perspective. We create a new manually annotated dataset for the task, and we propose suitable evaluation measures. We further experiment with a learning-to-rank approach, achieving sizable performance gains over several strong baselines. Our analysis demonstrates the importance of modeling text similarity and stance, while also taking into account the veracity of the retrieved previously fact-checked claims. We believe that this research would be of interest to fact-checkers, journalists, media, and regulatory authorities.

[220]  arXiv:2109.07411 [pdf, other]
Title: AliMe MKG: A Multi-modal Knowledge Graph for Live-streaming E-commerce
Comments: CIKM2021
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Live streaming is becoming an increasingly popular trend of sales in E-commerce. The core of live-streaming sales is to encourage customers to purchase in an online broadcasting room. To enable customers to better understand a product without jumping out, we propose AliMe MKG, a multi-modal knowledge graph that aims at providing a cognitive profile for products, through which customers are able to seek information about and understand a product. Based on the MKG, we build an online live assistant that highlights product search, product exhibition and question answering, allowing customers to skim over item list, view item details, and ask item-related questions. Our system has been launched online in the Taobao app, and currently serves hundreds of thousands of customers per day.

[221]  arXiv:2109.07419 [pdf, other]
Title: Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
Comments: This paper is accepted to PACT 2021
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

To meet the extreme compute demands for deep learning across commercial and scientific applications, dataflow accelerators are becoming increasingly popular. While these "domain-specific" accelerators are not fully programmable like CPUs and GPUs, they retain varying levels of flexibility with respect to data orchestration, i.e., dataflow and tiling optimizations to enhance efficiency. There are several challenges when designing new algorithms and mapping approaches to execute the algorithms for a target problem on new hardware. Previous works have addressed these challenges individually. To address these challenges as a whole, in this work, we present a HW-SW co-design ecosystem for spatial accelerators called Union within the popular MLIR compiler infrastructure. Our framework allows exploring different algorithms and their mappings on several accelerator cost models. Union also includes a plug-and-play library of accelerator cost models and mappers which can easily be extended. The algorithms and accelerator cost models are connected via a novel mapping abstraction that captures the map space of spatial accelerators which can be systematically pruned based on constraints from the hardware, workload, and mapper. We demonstrate the value of Union for the community with several case studies which examine offloading different tensor operations (CONV/GEMM/Tensor Contraction) on diverse accelerator architectures using different mapping schemes.

[222]  arXiv:2109.07424 [pdf, other]
Title: SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations
Comments: short paper, EMNLP 2021, Findings
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

While contrastive learning is proven to be an effective training strategy in computer vision, Natural Language Processing (NLP) is only recently adopting it as a self-supervised alternative to Masked Language Modeling (MLM) for improving sequence representations. This paper introduces SupCL-Seq, which extends the supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures, for every representation (anchor), we generate augmented altered views. A supervised contrastive loss is then utilized to maximize the system's capability of pulling together similar samples (e.g., anchors and their altered views) and pushing apart the samples belonging to the other classes. Despite its simplicity, SupCL-Seq leads to large gains in many sequence classification tasks on the GLUE benchmark compared to a standard BERTbase, including 6% absolute improvement on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STSB. We also show consistent gains over self-supervised contrastively learned representations, especially in non-semantic tasks. Finally, we show that these gains are not solely due to augmentation, but rather to a downstream optimized sequence representation. Code: https://github.com/hooman650/SupCL-Seq
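The pull-together/push-apart objective described in this abstract can be illustrated with a small sketch of a supervised contrastive loss. This is not the authors' implementation (see the linked repository for that): the function names, the temperature value, and the toy 2-D embeddings are assumptions made for illustration, and the dropout-based generation of altered views is not reproduced here.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    For each anchor, samples sharing its label are positives and every other
    sample appears in the softmax denominator. The temperature is a
    hypothetical default, not a value taken from the paper.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled cosine similarity between the anchor and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy data: two tight clusters in 2-D.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(points, [0, 0, 1, 1])
misaligned = supervised_contrastive_loss(points, [0, 1, 0, 1])
```

When the labels agree with the clusters, the loss is close to zero; when the labels cut across the clusters, it is much larger, which is the behaviour the objective is meant to reward.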

[223]  arXiv:2109.07428 [pdf, other]
Title: A Wide-area, Low-latency, and Power-efficient 6-DoF Pose Tracking System for Rigid Objects
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP); Systems and Control (eess.SY)

Position sensitive detectors (PSDs) offer the possibility of tracking a single active marker's two (or three) degrees of freedom (DoF) position with high accuracy, while having a fast response time with high update frequency and low latency, all using a very simple signal processing circuit. However, they are not particularly suitable for 6-DoF object pose tracking systems due to lack of orientation measurement, limited tracking range, and sensitivity to environmental variation. We propose a novel 6-DoF pose tracking system for rigid object tracking requiring a single active marker. The proposed system uses a stereo-based PSD pair and multiple Inertial Measurement Units (IMUs). This is done based on a practical approach to identify and control the power of Infrared-Light Emitting Diode (IR-LED) active markers, with an aim to increase the tracking work space and reduce the power consumption. Our proposed tracking system is validated with three different work space sizes and for static and dynamic positional accuracy using a robotic arm manipulator with three different dynamic motion patterns. The results show that the static position root-mean-square (RMS) error is 0.6mm. The dynamic position RMS error is 0.7-0.9mm. The orientation RMS error is between 0.04 and 0.9 degrees at varied dynamic motion. Overall, our proposed tracking system is capable of tracking a rigid object pose with sub-millimeter accuracy at the mid range of the work space and sub-degree accuracy for all work space under a lab setting.

[224]  arXiv:2109.07429 [pdf, other]
Title: Towards a Game-Theoretic Security Analysis of Off-Chain Protocols
Subjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)

Off-chain protocols constitute one of the most promising approaches to solve the inherent scalability issue of blockchain technologies. The core idea is to let parties transact on-chain only once to establish a channel between them, leveraging later on the resulting channel paths to perform arbitrarily many peer-to-peer transactions off-chain. While significant progress has been made in terms of proof techniques for off-chain protocols, existing approaches do not capture the game-theoretic incentives at the core of their design, which led to overlooking significant attack vectors like the Wormhole attack in the past.
This work introduces the first game-theoretic model that is expressive enough to reason about the security of off-chain protocols. We advocate the use of Extensive Form Games (EFGs) and introduce two instances of EFGs to capture security properties of the closing and the routing of the Lightning Network. Specifically, we model the closing protocol, which relies on punishment mechanisms to disincentivize the uploading on-chain of old channel states, as well as the routing protocol, thereby formally characterizing the Wormhole attack, a vulnerability that undermines the fee-based incentive mechanism underlying the Lightning Network.

[225]  arXiv:2109.07431 [pdf, other]
Title: Contact-Aware Retargeting of Skinned Motion
Comments: International Conference on Computer Vision (ICCV)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

This paper introduces a motion retargeting method that preserves self-contacts and prevents interpenetration. Self-contacts, such as when hands touch each other or the torso or the head, are important attributes of human body language and dynamics, yet existing methods do not model or preserve these contacts. Likewise, interpenetration, such as a hand passing into the torso, is a typical artifact of motion estimation methods. The input to our method is a human motion sequence and a target skeleton and character geometry. The method identifies self-contacts and ground contacts in the input motion, and optimizes the motion to apply to the output skeleton, while preserving these contacts and reducing interpenetration. We introduce a novel geometry-conditioned recurrent network with an encoder-space optimization strategy that achieves efficient retargeting while satisfying contact constraints. In experiments, our results quantitatively outperform previous methods and we conduct a user study where our retargeted motions are rated as higher-quality than those produced by recent works. We also show our method generalizes to motion estimated from human videos where we improve over previous works that produce noticeable interpenetration.

[226]  arXiv:2109.07433 [pdf, ps, other]
Title: Encoding and Decoding with Partitioned Complementary Sequences for Low-PAPR OFDM
Authors: Alphan Sahin
Comments: 13 pages, to appear in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

In this study, we propose partitioned complementary sequences (CSs) where the gaps between the clusters encode information bits to achieve low peak-to-average-power ratio (PAPR) orthogonal frequency division multiplexing (OFDM) symbols. We show that the partitioning rule without losing the feature of being a CS coincides with the non-squashing partitions of a positive integer and leads to a symmetric separation of clusters. We analytically derive the number of partitioned CSs for given bandwidth and a minimum distance constraint and obtain the corresponding recursive methods for enumerating the values of separations. We show that partitioning can increase the spectral efficiency (SE) without changing the alphabet of the nonzero elements of the CS, i.e., standard CSs relying on Reed-Muller (RM) code. We also develop an encoder for partitioned CSs and a maximum-likelihood-based recursive decoder for additive white Gaussian noise (AWGN) and fading channels. Our results indicate that the partitioned CSs under a minimum distance constraint can perform similar to the standard CSs in terms of average block error rate (BLER) and provide a higher SE at the expense of a limited signal-to-noise ratio (SNR) loss.

[227]  arXiv:2109.07434 [pdf, other]
Title: Discriminative and Generative Transformer-based Models For Situation Entity Classification
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We re-examine the situation entity (SE) classification task with varying amounts of available training data. We exploit a Transformer-based variational autoencoder to encode sentences into a lower dimensional latent space, which is used to generate the text and learn a SE classifier. Test set and cross-genre evaluations show that when training data is plentiful, the proposed model can improve over the previous discriminative state-of-the-art models. Our approach performs disproportionately better with smaller amounts of training data, but when faced with extremely small sets (4 instances per label), generative RNN methods outperform transformers. Our work provides guidance for future efforts on SE and semantic prediction tasks, and low-label training regimes.

[228]  arXiv:2109.07436 [pdf, other]
Title: Synthesizing Policies That Account For Human Execution Errors Caused By State Aliasing In Markov Decision Processes
Comments: 7 page paper, 6 pages supplemental material
Subjects: Artificial Intelligence (cs.AI)

When humans are given a policy to execute, there can be policy execution errors and deviations in execution if there is uncertainty in identifying a state. So an algorithm that computes a policy for a human to execute ought to consider these effects in its computations. An optimal MDP policy that is poorly executed (because of a human agent) may be much worse than another policy that is executed with fewer errors. In this paper, we consider the problems of erroneous execution and execution delay when computing policies for a human agent that would act in a setting modeled by a Markov Decision Process (MDP). We present a framework to model the likelihood of policy execution errors and likelihood of non-policy actions like inaction (delays) due to state uncertainty. This is followed by a hill climbing algorithm to search for good policies that account for these errors. We then use the best policy found by hill climbing with a branch and bound algorithm to find the optimal policy. We show experimental results in a Gridworld domain and analyze the performance of the two algorithms. We also present human studies that verify if our assumptions on policy execution by humans under state-aliasing are reasonable.

[229]  arXiv:2109.07437 [pdf, other]
Title: Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative
Comments: 16 pages, 4 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Pre-training, where models are trained on an auxiliary objective with abundant data before being fine-tuned on data from the downstream task, is now the dominant paradigm in NLP. In general, the pre-training step relies on little to no direct knowledge of the task on which the model will be fine-tuned, even when the end-task is known in advance. Our work challenges this status-quo of end-task agnostic pre-training. First, on three different low-resource NLP tasks from two domains, we demonstrate that multi-tasking the end-task and auxiliary objectives results in significantly better downstream task performance than the widely-used task-agnostic continued pre-training paradigm of Gururangan et al. (2020). We next introduce an online meta-learning algorithm that learns a set of multi-task weights to better balance among our multiple auxiliary objectives, achieving further improvements on end task performance and data efficiency.

[230]  arXiv:2109.07438 [pdf, other]
Title: CAMul: Calibrated and Accurate Multi-view Time-Series Forecasting
Comments: 16 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Probabilistic time-series forecasting enables reliable decision making across many domains. Most forecasting problems have diverse sources of data containing multiple modalities and structures. Leveraging information as well as uncertainty from these data sources for well-calibrated and accurate forecasts is an important and challenging problem. Most previous work on multi-modal learning and forecasting simply aggregates intermediate representations from each data view by simple methods of summation or concatenation and does not explicitly model uncertainty for each data view. We propose a general probabilistic multi-view forecasting framework, CAMul, that can learn representations and uncertainty from diverse data sources. It integrates the knowledge and uncertainty from each data view in a dynamic context-specific manner, assigning more importance to useful views to model a well-calibrated forecast distribution. We use CAMul for multiple domains with varied sources and modalities and show that CAMul outperforms other state-of-the-art probabilistic forecasting models by over 25% in accuracy and calibration.

[231]  arXiv:2109.07439 [pdf, other]
Title: Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation
Comments: Accepted at EMNLP2021
Subjects: Computation and Language (cs.CL)

Automatic translation systems are known to struggle with rare words. Among these, named entities (NEs) and domain-specific terms are crucial, since errors in their translation can lead to severe meaning distortions. Despite their importance, previous speech translation (ST) studies have neglected them, also due to the dearth of publicly available resources tailored to their specific evaluation. To fill this gap, we i) present the first systematic analysis of the behavior of state-of-the-art ST systems in translating NEs and terminology, and ii) release NEuRoparl-ST, a novel benchmark built from European Parliament speeches annotated with NEs and terminology. Our experiments on the three language directions covered by our benchmark (en->es/fr/it) show that ST systems correctly translate 75-80% of terms and 65-70% of NEs, with very low performance (37-40%) on person names.

[232]  arXiv:2109.07440 [pdf, other]
Title: Private Attacks in Longest Chain Proof-of-stake Protocols with Single Secret Leader Elections
Comments: To appear in the proceedings of the 3rd ACM Conference on Advances in Financial Technologies
Subjects: Cryptography and Security (cs.CR)

Single Secret Leader Elections have recently been proposed as an improved leader election mechanism for proof-of-stake (PoS) blockchains. However, the security gain they provide has not been quantified. In this work, we present a comparison of PoS longest-chain protocols that are based on Single Secret Leader Elections (SSLE), which elect exactly one leader per round, versus those based on Probabilistic Leader Elections (PLE), where one leader is elected in expectation. Our analysis shows that when considering the private attack, the worst attack on longest-chain protocols, the security gained from using SSLE is substantial: the settlement time is decreased by roughly 25% for a 33% or 25% adversary. Furthermore, when considering grinding attacks, we find that the security threshold is increased by 10% (from 0.26 in the PLE case to 0.36 in the SSLE case) and the settlement time is decreased by roughly 70% for a 20% adversary in the SSLE case.

[233]  arXiv:2109.07441 [pdf, other]
Title: DPGen: Automated Program Synthesis for Differential Privacy
Comments: CCS'21
Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL)

Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then decides where to add noise, and how much of it to add. This can be a non-trivial process -- if not done carefully, the algorithm might either violate differential privacy or have low utility.
In this paper, we present DPGen, a program synthesizer that takes in non-private code (without any noise) and automatically synthesizes its differentially private version (with carefully calibrated noise). Under the hood, DPGen uses novel algorithms to automatically generate a sketch program with candidate locations for noise, and then optimize privacy proof and noise scales simultaneously on the sketch program. Moreover, DPGen can synthesize sophisticated mechanisms that adaptively process queries until a specified privacy budget is exhausted. When evaluated on standard benchmarks, DPGen is able to generate differentially private mechanisms that optimize simple utility functions within 120 seconds. It is also powerful enough to synthesize adaptive privacy mechanisms.
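The step the abstract describes, starting from a noise-free algorithm and deciding where and how much noise to add, can be illustrated by hand with the textbook Laplace mechanism on a counting query. This is a sketch only and says nothing about how DPGen synthesizes or calibrates noise; the function names and the toy data are assumptions, and the floating-point sampling used here is not hardened for production use.

```python
import random


def non_private_count(records, predicate):
    """Noise-free algorithm: count the records that satisfy the predicate."""
    return sum(1 for record in records if predicate(record))


def private_count(records, predicate, epsilon, rng):
    """Laplace mechanism for a counting query (illustrative, hand-written).

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return non_private_count(records, predicate) + noise


# Toy data (assumed): ages of ten people, query "how many are 40 or older?".
ages = [23, 35, 41, 52, 29, 60, 38, 47, 31, 44]
rng = random.Random(0)
true_answer = non_private_count(ages, lambda age: age >= 40)
noisy_answers = [
    private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
    for _ in range(20000)
]
mean_noisy = sum(noisy_answers) / len(noisy_answers)
```

Each individual noisy answer differs from the true count, but the noise is zero-mean, so the average over many releases stays close to it. A smaller epsilon gives a larger scale, trading utility for privacy, which is the tension the abstract points to.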

[234]  arXiv:2109.07445 [pdf, other]
Title: Challenges in Detoxifying Language Models
Comments: 23 pages, 6 figures, published in Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Large language models (LM) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of safety is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity. We critically discuss this approach, evaluate several toxicity mitigation strategies with respect to both automatic and human evaluation, and analyze consequences of toxicity mitigation in terms of model bias and LM quality. We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the RealToxicityPrompts dataset, this comes at the cost of reduced LM coverage for both texts about, and dialects of, marginalized groups. Additionally, we find that human raters often disagree with high automatic toxicity scores after strong toxicity reduction interventions -- highlighting further the nuances involved in careful evaluation of LM toxicity.

[235]  arXiv:2109.07446 [pdf, other]
Title: When Does Translation Require Context? A Data-driven, Multilingual Exploration
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Although proper handling of discourse phenomena significantly contributes to the quality of machine translation (MT), common translation quality metrics do not adequately capture them. Recent works in context-aware MT attempt to target a small set of these phenomena during evaluation. In this paper, we propose a new metric, P-CXMI, which allows us to identify translations that require context systematically and confirm the difficulty of previously studied phenomena as well as uncover new ones that have not been addressed in previous work. We then develop the Multilingual Discourse-Aware (MuDA) benchmark, a series of taggers for these phenomena in 14 different language pairs, which we use to evaluate context-aware MT. We find that state-of-the-art context-aware MT models find marginal improvements over context-agnostic models on our benchmark, which suggests current models do not handle these ambiguities effectively. We release code and data to invite the MT research community to increase efforts on context-aware translation on discourse phenomena and languages that are currently overlooked.

[236]  arXiv:2109.07448 [pdf, other]
Title: Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

In this paper, we aim at synthesizing a free-viewpoint video of an arbitrary human performance using sparse multi-view cameras. Recently, several works have addressed this problem by learning person-specific neural radiance fields (NeRF) to capture the appearance of a particular human. In parallel, some work proposed to use pixel-aligned features to generalize radiance fields to arbitrary new scenes and objects. Adopting such generalization approaches to humans, however, is highly challenging due to the heavy occlusions and dynamic articulations of body parts. To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture. Specifically, we first introduce a temporal transformer that aggregates tracked visual features based on the skeletal body motion over time. Moreover, a multi-view transformer is proposed to perform cross-attention between the temporally-fused features and the pixel-aligned features at each time step to integrate observations on the fly from multiple views. Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses. The video results and code are available at https://youngjoongunc.github.io/nhp.

[237]  arXiv:2109.07449 [pdf, other]
Title: WikiGUM: Exhaustive Entity Linking for Wikification in 12 Genres
Subjects: Computation and Language (cs.CL)

Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e. Wikification. In this paper, we present and evaluate WikiGUM, a fully wikified dataset, covering all mentions of named entities, including their non-named and pronominal mentions, as well as mentions nested within other mentions. The dataset covers a broad range of 12 written and spoken genres, most of which have not been included in Entity Linking efforts to date, leading to poor performance by a pretrained SOTA system in our evaluation. The availability of a variety of other annotations for the same data also enables further research on entities in context.

[238]  arXiv:2109.07452 [pdf, other]
Title: Can Machines Read Coding Manuals Yet? -- A Benchmark for Building Better Language Models for Code Understanding
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Code understanding is an increasingly important application of Artificial Intelligence. A fundamental aspect of understanding code is understanding text about code, e.g., documentation and forum discussions. Pre-trained language models (e.g., BERT) are a popular approach for various NLP tasks, and there are now a variety of benchmarks, such as GLUE, to help improve the development of such models for natural language understanding. However, little is known about how well such models work on textual artifacts about code, and we are unaware of any systematic set of downstream tasks for such an evaluation. In this paper, we derive a set of benchmarks (BLANCA - Benchmarks for LANguage models on Coding Artifacts) that assess code understanding based on tasks such as predicting the best answer to a question in a forum post, finding related forum posts, or predicting classes related in a hierarchy from class documentation. We evaluate the performance of current state-of-the-art language models on these tasks and show that there is a significant improvement on each task from fine tuning. We also show that multi-task training over BLANCA tasks helps build better language models for code understanding.

[239]  arXiv:2109.07455 [pdf, other]
Title: Deep Bregman Divergence for Contrastive Learning of Visual Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Deep Bregman divergence measures the divergence of data points using neural networks, going beyond Euclidean distance and capturing divergence over distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representation and we aim to enhance contrastive loss used in self-supervised learning by training additional networks based on functional Bregman divergence. In contrast to the conventional contrastive learning methods which are solely based on divergences between single points, our framework can capture the divergence between distributions which improves the quality of learned representation. By combining conventional contrastive loss with the proposed divergence loss, our method outperforms baseline and most previous methods for self-supervised and semi-supervised learning on multiple classification and object detection tasks and datasets. The source code of the method and of all the experiments is available in the supplementary material.

[240]  arXiv:2109.07458 [pdf, other]
Title: Comparing Text Representations: A Theory-Driven Approach
Journal-ref: Published in EMNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels.

[241]  arXiv:2109.07459 [pdf, ps, other]
Title: Timely Updating with Intermittent Energy and Data for Multiple Sources over Erasure Channels
Comments: Appeared in the International Symposium on Wireless Communication Systems (ISWCS) 2021, special session on Age of Information
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

A status updating system is considered in which multiple data sources generate packets to be delivered to a destination through a shared energy harvesting sensor. Only one source's data, when available, can be transmitted by the sensor at a time, subject to energy availability. Transmissions are prone to erasures, and each successful transmission constitutes a status update for its corresponding source at the destination. The goal is to schedule source transmissions such that the collective long-term average age-of-information (AoI) is minimized. AoI is defined as the time elapsed since the latest successfully-received data has been generated at its source. To solve this problem, the case with a single source is first considered, with a focus on threshold waiting policies, in which the sensor attempts transmission only if the time until both energy and data are available grows above a certain threshold. The distribution of the AoI is fully characterized under such a policy. This is then used to analyze the performance of the multiple sources case under maximum-age-first scheduling, in which the sensor's resources are dedicated to the source with the maximum AoI at any given time. The achievable collective long-term average AoI is derived in closed form. Multiple numerical evaluations are demonstrated to show how the optimal threshold value behaves as a function of the system parameters, and showcase the benefits of a threshold-based waiting policy with intermittent energy and data arrivals.
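
The AoI definition above can be made concrete with a minimal sketch that computes the time-average age for a single source from a list of successful deliveries. This is only an illustration of the definition, not the paper's analysis or scheduling policy; the function name `average_aoi`, the `(generation_time, delivery_time)` input format, and the `initial_age` parameter are assumptions introduced here.

```python
def average_aoi(updates, horizon, initial_age=0.0):
    """Time-average age of information over [0, horizon] for one source.

    updates: (generation_time, delivery_time) pairs for successfully
    received packets, sorted by delivery_time (hypothetical input format).
    """
    area = 0.0
    t_prev = 0.0            # time of the last processed event
    age_prev = initial_age  # age immediately after that event
    for gen, deliv in updates:
        if deliv > horizon:
            break
        dt = deliv - t_prev
        # Between deliveries the age grows with slope 1 (trapezoid area).
        area += dt * age_prev + 0.5 * dt * dt
        t_prev = deliv
        # On delivery the age drops to the packet's own age; a stale
        # packet never makes the age larger than it already is.
        age_prev = min(age_prev + dt, deliv - gen)
    dt = horizon - t_prev
    area += dt * age_prev + 0.5 * dt * dt
    return area / horizon


# Two deliveries: generated at t=1 and received at t=2, then 3 -> 5.
print(average_aoi([(1.0, 2.0), (3.0, 5.0)], horizon=6.0))
```

Erased transmissions simply do not appear in `updates`, so the age keeps growing until the next successful delivery.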

[242]  arXiv:2109.07460 [pdf, other]
Title: Efficient Domain Adaptation of Language Models via Adaptive Tokenization
Comments: 11 pages. SustaiNLP workshop at EMNLP 2021
Subjects: Computation and Language (cs.CL)

Contextual embedding-based language models trained on large data sets, such as BERT and RoBERTa, provide strong performance across a wide range of tasks and are ubiquitous in modern NLP. It has been observed that fine-tuning these models on tasks involving data from domains different from that on which they were pretrained can lead to suboptimal performance. Recent work has explored approaches to adapt pretrained language models to new domains by incorporating additional pretraining using domain-specific corpora and task data. We propose an alternative approach for transferring pretrained language models to new domains by adapting their tokenizers. We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora. In datasets from four disparate domains, we find adaptive tokenization on a pretrained RoBERTa model provides >97% of the performance benefits of domain specific pretraining. Our approach produces smaller models and less training and inference time than other approaches using tokenizer augmentation. While adaptive tokenization incurs a 6% increase in model parameters in our experimentation, due to the introduction of 10k new domain-specific tokens, our approach, using 64 vCPUs, is 72x faster than further pretraining the language model on domain-specific corpora on 8 TPUs.

[243]  arXiv:2109.07461 [pdf, other]
Title: MPC-Friendly Commitments for Publicly Verifiable Covert Security
Comments: To appear at ACM CCS 2021
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

We address the problem of efficiently verifying a commitment in a two-party computation. This addresses the scenario where a party P1 commits to a value $x$ to be used in a subsequent secure computation with another party P2 that wants to receive assurance that P1 did not cheat, i.e. that $x$ was indeed the value inputted into the secure computation. Our constructions operate in the publicly verifiable covert (PVC) security model, which is a relaxation of the malicious model of MPC appropriate in settings where P1 faces a reputational harm if caught cheating.
We introduce the notion of PVC commitment scheme and indexed hash functions to build commitment schemes tailored to the PVC framework, and propose constructions for both arithmetic and Boolean circuits that result in very efficient circuits. From a practical standpoint, our constructions for Boolean circuits are $60\times$ faster to evaluate securely, and use $36\times$ less communication than baseline methods based on hashing. Moreover, we show that our constructions are tight in terms of required non-linear operations, by proving lower bounds on the nonlinear gate count of commitment verification circuits. Finally, we present a technique to amplify the security properties of our constructions that makes it possible to efficiently recover malicious guarantees with statistical security.

[244]  arXiv:2109.07464 [pdf, other]
Title: AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark
Subjects: Computation and Language (cs.CL)

Open Information Extraction (OIE) is the task of extracting facts from sentences in the form of relations and their corresponding arguments in a schema-free manner. Intrinsic performance of OIE systems is difficult to measure due to the incompleteness of existing OIE benchmarks: the ground truth extractions do not group all acceptable surface realizations of the same fact that can be extracted from a sentence. To measure performance of OIE systems more realistically, it is necessary to manually annotate complete facts (i.e., clusters of all acceptable surface realizations of the same fact) from input sentences. We propose AnnIE: an interactive annotation platform that facilitates such challenging annotation tasks and supports creation of complete fact-oriented OIE evaluation benchmarks. AnnIE is modular and flexible in order to support different use case scenarios (i.e., benchmarks covering different types of facts). We use AnnIE to build two complete OIE benchmarks: one with verb-mediated facts and another with facts encompassing named entities. Finally, we evaluate several OIE systems on our complete benchmarks created with AnnIE. Our results suggest that existing incomplete benchmarks are overly lenient, and that OIE systems are not as robust as previously reported. We publicly release AnnIE under a non-restrictive license.

[245]  arXiv:2109.07465 [pdf, other]
Title: On the Limits of Minimal Pairs in Contrastive Evaluation
Comments: BlackboxNLP 2021
Subjects: Computation and Language (cs.CL)

Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Second, test data should be chosen so as to minimize distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs be created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English-German MT that implements this recommendation.

Cross-lists for Thu, 16 Sep 21

[246]  arXiv:2109.06909 (cross-list from eess.IV) [pdf, other]
Title: Hardware-aware Real-time Myocardial Segmentation Quality Control in Contrast Echocardiography
Comments: 4 pages, DAC'21 invited paper
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

Automatic myocardial segmentation of contrast echocardiography has shown great potential in the quantification of myocardial perfusion parameters. Segmentation quality control is an important step to ensure the accuracy of segmentation results for quality research as well as its clinical application. Usually, segmentation quality control happens after data acquisition. At data acquisition time, the operator cannot know the quality of the segmentation results. On-the-fly segmentation quality control could help the operator adjust the ultrasound probe or retake data if the quality is unsatisfactory, which can greatly reduce the effort of time-consuming manual correction. However, it is infeasible to deploy state-of-the-art DNN-based models because the segmentation module and quality control module must fit in the limited hardware resources on the ultrasound machine while satisfying strict latency constraints. In this paper, we propose a hardware-aware neural architecture search framework for automatic myocardial segmentation and quality control of contrast echocardiography. We explicitly incorporate the hardware latency as a regularization term into the loss function during training. The proposed method searches for the best neural network architecture for the segmentation module and quality prediction module under strict latency constraints.

[247]  arXiv:2109.06911 (cross-list from stat.ML) [pdf, other]
Title: Learning and Decision-Making with Data: Optimal Formulations and Phase Transitions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)

We study the problem of designing optimal learning and decision-making formulations when only historical data is available. Prior work typically commits to a particular class of data-driven formulation and subsequently tries to establish out-of-sample performance guarantees. We take here the opposite approach. We first define a sensible yardstick with which to measure the quality of any data-driven formulation and subsequently seek to find an optimal such formulation. Informally, any data-driven formulation can be seen to balance a measure of proximity of the estimated cost to the actual cost while guaranteeing a level of out-of-sample performance. Given an acceptable level of out-of-sample performance, we construct explicitly a data-driven formulation that is uniformly closer to the true cost than any other formulation enjoying the same out-of-sample performance. We show the existence of three distinct out-of-sample performance regimes (a superexponential regime, an exponential regime and a subexponential regime) between which the nature of the optimal data-driven formulation experiences a phase transition. The optimal data-driven formulations can be interpreted as a classically robust formulation in the superexponential regime, an entropic distributionally robust formulation in the exponential regime and finally a variance penalized formulation in the subexponential regime. This final observation unveils a surprising connection between these three, at first glance seemingly unrelated, data-driven formulations which until now remained hidden.

[248]  arXiv:2109.06912 (cross-list from eess.AS) [pdf, other]
Title: fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
Comments: Accepted to EMNLP 2021 Demo
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

This paper presents fairseq S^2, a fairseq extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis, a suite of automatic metrics is included. Apart from the features added specifically for this extension, fairseq S^2 also benefits from the scalability offered by fairseq and can be easily integrated with other state-of-the-art systems provided in this framework. The code, documentation, and pre-trained models are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_synthesis.

[249]  arXiv:2109.06915 (cross-list from math.PR) [pdf, ps, other]
Title: Reconstruction on Trees and Low-Degree Polynomials
Comments: 20 pages, comments welcome
Subjects: Probability (math.PR); Machine Learning (cs.LG); Statistics Theory (math.ST)

The study of Markov processes and broadcasting on trees has deep connections to a variety of areas including statistical physics, phylogenetic reconstruction, MCMC algorithms, and community detection in random graphs. Notably, the celebrated Belief Propagation (BP) algorithm achieves Bayes-optimal performance for the reconstruction problem of predicting the value of the Markov process at the root of the tree from its values at the leaves.
Recently, the analysis of low-degree polynomials has emerged as a valuable tool for predicting computational-to-statistical gaps. In this work, we investigate the performance of low-degree polynomials for the reconstruction problem on trees. Perhaps surprisingly, we show that there are simple tree models with $N$ leaves where (1) nontrivial reconstruction of the root value is possible with a simple polynomial time algorithm and with robustness to noise, but not with any polynomial of degree $N^{c}$ for $c > 0$ a constant, and (2) when the tree is unknown and given multiple samples with correlated root assignments, nontrivial reconstruction of the root value is possible with a simple, noise-robust, and computationally efficient SQ (Statistical Query) algorithm but not with any polynomial of degree $N^c$. These results clarify some of the limitations of low-degree polynomials vs. polynomial time algorithms for Bayesian estimation problems. They also complement recent work of Moitra, Mossel, and Sandon who studied the circuit complexity of Belief Propagation. We pose related open questions about low-degree polynomials and the Kesten-Stigum threshold.

[250]  arXiv:2109.06917 (cross-list from quant-ph) [pdf, other]
Title: Open Problems Related to Quantum Query Complexity
Authors: Scott Aaronson
Comments: 11 pages, to appear in ACM Transactions on Quantum Computing
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC)

I offer a case that quantum query complexity still has loads of enticing and fundamental open problems -- from relativized QMA versus QCMA and BQP versus IP, to time/space tradeoffs for collision and element distinctness, to polynomial degree versus quantum query complexity for partial functions, to the Unitary Synthesis Problem and more.

[251]  arXiv:2109.06949 (cross-list from stat.ML) [pdf, other]
Title: Targeted Cross-Validation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In many applications, we have access to the complete dataset but are only interested in the prediction of a particular region of predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted $L_2$ loss in performance assessment to reflect the region-specific interest. We propose a targeted cross-validation (TCV) to select models or procedures based on a general weighted $L_2$ loss. We show that the TCV is consistent in selecting the best performing candidate under the weighted $L_2$ loss. Experimental studies are used to demonstrate the use of TCV and its potential advantage over the global CV or the approach of using only local data for modeling a local region.
Previous investigations on CV have relied on the condition that when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with the setup of changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of the selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of the TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performances of modeling procedures are dynamic.
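
The idea of selecting a candidate by a weighted $L_2$ loss can be sketched as follows. This is a hedged illustration only: plain K-fold splitting, the function name `weighted_l2_cv`, the two toy candidates, and the indicator weight on the region x > 0.5 are all assumptions made here, and the paper's actual TCV procedure and its consistency conditions may differ.

```python
import numpy as np


def weighted_l2_cv(x, y, fitters, weight, n_splits=5, seed=0):
    """Cross-validated weighted L2 loss for each candidate method.

    fitters: dict name -> fit(x_train, y_train) returning a predict function.
    weight:  function mapping x to non-negative weights (region of interest).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_splits)
    scores = {}
    for name, fit in fitters.items():
        num, den = 0.0, 0.0
        for k in range(n_splits):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            predict = fit(x[train], y[train])
            w = weight(x[test])
            # Accumulate the weighted squared error on the held-out fold.
            num += np.sum(w * (y[test] - predict(x[test])) ** 2)
            den += np.sum(w)
        scores[name] = num / den
    return scores


def fit_constant(x, y):
    mean = y.mean()
    return lambda x_new: np.full(len(x_new), mean)


def fit_linear(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    return lambda x_new: slope * x_new + intercept


x = np.linspace(-1.0, 1.0, 200)
y = 2.0 * x  # noiseless toy data, chosen only for illustration
scores = weighted_l2_cv(
    x, y,
    {"constant": fit_constant, "linear": fit_linear},
    weight=lambda v: (v > 0.5).astype(float),  # interest only in x > 0.5
)
best = min(scores, key=scores.get)
print(best, scores)
```

Changing `weight` changes which region drives the selection, so two candidates can be ranked differently for different target regions.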

[252]  arXiv:2109.06972 (cross-list from eess.IV) [pdf, ps, other]
Title: Combining GEDI and Sentinel-2 for wall-to-wall mapping of tall and short crops
Authors: Stefania Di Tommaso (1), Sherrie Wang (1,2 and 3), David B. Lobell (1) ((1) Department of Earth System Science and Center on Food Security and the Environment, Stanford University, (2) Institute for Computational and Mathematical Engineering, Stanford University, (3) Goldman School of Public Policy, University of California, Berkeley)
Subjects: Image and Video Processing (eess.IV); Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

High resolution crop type maps are an important tool for improving food security, and remote sensing is increasingly used to create such maps in regions that possess ground truth labels for model training. However, these labels are absent in many regions, and models trained in other regions on typical satellite features, such as those from optical sensors, often exhibit low performance when transferred. Here we explore the use of NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, combined with Sentinel-2 optical data, for crop type mapping. Using data from three major cropped regions (in China, France, and the United States) we first demonstrate that GEDI energy profiles are capable of reliably distinguishing maize, a crop typically above 2m in height, from crops like rice and soybean that are shorter. We further show that these GEDI profiles provide much more invariant features across geographies compared to spectral and phenological features detected by passive optical sensors. GEDI is able to distinguish maize from other crops within each region with accuracies higher than 84%, and able to transfer across regions with accuracies higher than 82% compared to 64% for transfer of optical features. Finally, we show that GEDI profiles can be used to generate training labels for models based on optical imagery from Sentinel-2, thereby enabling the creation of 10m wall-to-wall maps of tall versus short crops in label-scarce regions. As maize is the second most widely grown crop in the world and often the only tall crop grown within a landscape, we conclude that GEDI offers great promise for improving global crop type maps.

[253]  arXiv:2109.06996 (cross-list from math.OC) [pdf, ps, other]
Title: Scalable Average Consensus with Compressed Communications
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)

We propose a new decentralized average consensus algorithm with compressed communication that scales linearly with the network size n. We prove that the proposed method converges to the average of the initial values held locally by the agents of a network when agents are allowed to communicate with compressed messages. The proposed algorithm works for a broad class of compression operators (possibly biased), where agents interact over arbitrary static, undirected, and connected networks. We further present numerical experiments that confirm our theoretical results and illustrate the scalability and communication efficiency of our algorithm.
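
For orientation, the uncompressed baseline that such methods build on can be sketched in a few lines: each agent repeatedly replaces its value with a weighted average of its neighbors' values, and all values converge to the initial average on a connected undirected graph. This sketch does not implement the paper's compressed algorithm; the Metropolis weights, the ring topology, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def ring_adjacency(n):
    """Adjacency matrix of an undirected ring with n agents."""
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        adj[i, (i + 1) % n] = 1
        adj[(i + 1) % n, i] = 1
    return adj


def metropolis_weights(adj):
    """Symmetric, doubly stochastic mixing matrix for an undirected graph."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        # Self-weight makes each row sum to one.
        W[i, i] = 1.0 - W[i].sum()
    return W


def average_consensus(x0, W, n_iters=200):
    """Run x <- W x for n_iters rounds (exact, uncompressed messages)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        x = W @ x
    return x


x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
W = metropolis_weights(ring_adjacency(len(x0)))
x_final = average_consensus(x0, W, n_iters=500)
print(x_final)  # every entry approaches the initial average
```

Because `W` is doubly stochastic, the sum of the agents' values is preserved at every round, which is why the common limit is the initial average.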

[254]  arXiv:2109.07005 (cross-list from q-fin.PM) [pdf, other]
Title: WaveCorr: Correlation-savvy Deep Reinforcement Learning for Portfolio Management
Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG)

The problem of portfolio management represents an important and challenging class of dynamic decision making problems, where rebalancing decisions need to be made over time with the consideration of many factors such as investors' preferences, trading environments, and market conditions. In this paper, we present a new portfolio policy network architecture for deep reinforcement learning (DRL) that can exploit cross-asset dependency information more effectively and achieve better performance than state-of-the-art architectures. In particular, we introduce a new property, referred to as asset permutation invariance, for portfolio policy networks that exploit multi-asset time series data, and design the first portfolio policy network, named WaveCorr, that preserves this invariance property when treating asset correlation information. At the core of our design is an innovative permutation invariant correlation processing layer. An extensive set of experiments is conducted using data from both Canadian (TSX) and American (S&P 500) stock markets, and WaveCorr consistently outperforms other architectures with an impressive 3%-25% absolute improvement in terms of average annual return, and up to more than 200% relative improvement in average Sharpe ratio. We also measured an improvement of a factor of up to 5 in the stability of performance under random choices of initial asset ordering and weights. The stability of the network has been found to be particularly valuable by our industrial partner.

[255]  arXiv:2109.07011 (cross-list from astro-ph.SR) [pdf, other]
Title: Testing Self-Organized Criticality Across the Main Sequence using Stellar Flares from TESS
Comments: 6 pages, 3 figures, Submitted to journal
Subjects: Solar and Stellar Astrophysics (astro-ph.SR); Earth and Planetary Astrophysics (astro-ph.EP); Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO)

Stars produce explosive flares, which are believed to be powered by the release of energy stored in coronal magnetic field configurations. It has been shown that solar flares exhibit energy distributions typical of self-organized critical systems. This study applies a novel flare detection technique to data obtained by NASA's TESS mission and identifies $\sim10^6$ flaring events on $\sim10^5$ stars across spectral types. Our results suggest that magnetic reconnection events that maintain the topology of the magnetic field in a self-organized critical state are ubiquitous among stellar coronae.

[256]  arXiv:2109.07018 (cross-list from physics.comp-ph) [pdf, other]
Title: Non-linear Independent Dual System (NIDS) for Discretization-independent Surrogate Modeling over Complex Geometries
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG)

Numerical solution of partial differential equations (PDEs) requires expensive simulations, limiting its application in design optimization routines, model-based control, or solution of large-scale inverse problems. Existing Convolutional Neural Network-based frameworks for surrogate modeling require lossy pixelization and data-preprocessing, which is not suitable for realistic engineering applications. Therefore, we propose non-linear independent dual system (NIDS), which is a deep learning surrogate model for discretization-independent, continuous representation of PDE solutions, and can be used for prediction over domains with complex, variable geometries and mesh topologies. NIDS leverages implicit neural representations to develop a non-linear mapping between problem parameters and spatial coordinates to state predictions by combining evaluations of a case-wise parameter network and a point-wise spatial network in a linear output layer. The input features of the spatial network include physical coordinates augmented by a minimum distance function evaluation to implicitly encode the problem geometry. The form of the overall output layer induces a dual system, where each term in the map is non-linear and independent. Further, we propose a minimum distance function-driven weighted sum of NIDS models using a shared parameter network to enforce boundary conditions by construction under certain restrictions. The framework is applied to predict solutions around complex, parametrically-defined geometries on non-parametrically-defined meshes, with solutions obtained many orders of magnitude faster than the full order models. Test cases include a vehicle aerodynamics problem with complex geometry and data scarcity, enabled by a training method in which more cases are gradually added as training progresses.

[257]  arXiv:2109.07027 (cross-list from math.OC) [pdf, ps, other]
Title: Guaranteed Safe Spacecraft Docking with Control Barrier Functions
Comments: Submitted to IEEE Controls Systems Letters, under review
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper presents a strategy for control of a spacecraft docking with a non-maneuvering target in the presence of safety constraints and bounded disturbances. The presence of disturbances prevents convergence to a unique docking state, so in our formulation, docking is defined as occurring within a set constructed using prescribed tolerances. Safety is ensured via application of Robust Control Barrier Functions to render a designated safe set forward invariant for any allowable disturbance. However, this safety strategy necessarily presumes a worst-case disturbance, and thus restricts trajectories to a subset of the safe set when a worst-case disturbance is not present. The presented controller accounts for this restriction, and guarantees that the spacecraft both remains safe and achieves docking in finite time for any allowable disturbance. The controller is then validated in simulation for a spacecraft landing on an asteroid, and two spacecraft docking in low Earth orbit.

[258]  arXiv:2109.07029 (cross-list from eess.IV) [pdf, other]
Title: Seeking an Optimal Approach for Computer-Aided Pulmonary Embolism Detection
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Pulmonary embolism (PE) represents a thrombus ("blood clot"), usually originating from a lower extremity vein, that travels to the blood vessels in the lung, causing vascular obstruction and in some patients, death. This disorder is commonly diagnosed using CT pulmonary angiography (CTPA). Deep learning holds great promise for the computer-aided CTPA diagnosis (CAD) of PE. However, numerous competing methods for a given task in the deep learning literature exist, causing great confusion regarding the development of a CAD PE system. To address this confusion, we present a comprehensive analysis of competing deep learning methods applicable to PE diagnosis using CTPA at both the image and exam levels. At the image level, we compare convolutional neural networks (CNNs) with vision transformers, and contrast self-supervised learning (SSL) with supervised learning, followed by an evaluation of transfer learning compared with training from scratch. At the exam level, we focus on comparing conventional classification (CC) with multiple instance learning (MIL). Our extensive experiments show that: (1) transfer learning consistently boosts performance despite differences between natural images and CT scans; (2) transfer learning with SSL surpasses its supervised counterparts; (3) CNNs outperform vision transformers, which otherwise show satisfactory performance; and (4) CC is, surprisingly, superior to MIL. Compared with the state of the art, our optimal approach provides an AUC gain of 0.2% and 1.05% at the image level and exam level, respectively.

[259]  arXiv:2109.07045 (cross-list from eess.IV) [pdf, ps, other]
Title: Uncertainty Quantification in Medical Image Segmentation with Multi-decoder U-Net
Comments: MICCAI_QUBIQ challenge, conference, Uncertainty qualification
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Accurate medical image segmentation is crucial for diagnosis and analysis. However, models without calibrated uncertainty estimates might lead to errors in downstream analysis and exhibit low levels of robustness. Estimating the uncertainty in the measurement is vital to making definite, informed conclusions. In particular, it is difficult for both models and radiologists to make accurate predictions on ambiguous areas and focus boundaries, and even harder to reach a consensus across multiple annotations. In this work, the uncertainty in these areas is studied, which carries significant information about anatomical structure and is as important as segmentation performance. We explore uncertainty quantification in medical image segmentation by measuring segmentation performance against multiple annotations in a supervised learning manner, and propose a U-Net based architecture with multiple decoders, where the image representation is encoded with the same encoder and the segmentation corresponding to each annotation is estimated with multiple decoders. In addition, a cross-loss function is proposed for bridging the gap between different branches. The proposed architecture is trained in an end-to-end manner and is able to improve predictive uncertainty estimates. With fewer parameters, the model achieves performance comparable to the integrated training model that was the runner-up in the MICCAI-QUBIQ 2020 challenge.

[260]  arXiv:2109.07075 (cross-list from math.OC) [pdf, other]
Title: Guarding a Target Set from a Single Attacker in the Euclidean Space
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper addresses a two-player target defense game in the $n$-dimensional Euclidean space where an attacker attempts to enter a closed convex target set while a defender strives to capture the attacker beforehand. We provide a complete and universal differential game-based solution which not only encompasses recent work associated with similar problems whose target sets have simple, low-dimensional geometric shapes, but can also address problems that involve nontrivial geometric shapes of high-dimensional target sets. The value functions of the game are derived in a semi-analytical form that includes a convex optimization problem. When the latter problem has a closed-form solution, one of the value functions is used to analytically construct the barrier surface that divides the state space of the game into the winning sets of players. For the case where the barrier surface has no analytical expression but the target set has a smooth boundary, the bijective map between the target boundary and the projection of the barrier surface is obtained. By using Hamilton-Jacobi-Isaacs equation, we verify that the proposed optimal state feedback strategies always constitute the game's unique saddle point whether or not the optimization problem has a closed-form solution. We illustrate our solutions via numerical simulations.

[261]  arXiv:2109.07099 (cross-list from physics.optics) [pdf]
Title: Self-powered InP Nanowire Photodetector for Single Photon Level Detection at Room Temperature
Subjects: Optics (physics.optics); Emerging Technologies (cs.ET); Applied Physics (physics.app-ph)

Highly sensitive photodetectors with single-photon-level detection are one of the key components of a range of emerging technologies, in particular the ever-growing fields of optical communication, remote sensing, and quantum computing. Currently, most single-photon detection technologies require external biasing at high voltages and/or cooling to low temperatures, posing great limitations for wider applications. Here, we demonstrate InP nanowire array photodetectors that can achieve single-photon level light detection at room temperature without an external bias. We use top-down etched, heavily doped p-type InP nanowires and an n-type AZO/ZnO carrier selective contact to form a radial p-n junction with a built-in electric field exceeding 3x10^5 V/cm at 0 V. The device exhibits broadband light sensitivity and can distinguish a single photon per pulse from the dark noise at 0 V, enabled by its design to realize near-ideal broadband absorption, extremely low dark current, and highly efficient charge carrier separation. Meanwhile, the bandwidth of the device reaches above 600 MHz with a timing jitter of 538 ps. The proposed device design provides a new pathway towards low-cost, high-sensitivity, self-powered photodetectors for numerous future applications.

[262]  arXiv:2109.07181 (cross-list from physics.soc-ph) [pdf, other]
Title: α-Indirect Control in Onion-like Networks
Subjects: Physics and Society (physics.soc-ph); Data Structures and Algorithms (cs.DS)

Tens of thousands of parent companies control millions of subsidiaries through long chains of intermediary entities in global corporate networks. Conversely, tens of millions of entities are directly held by sole owners. We propose an algorithm for identification of ultimate controlling entities in such networks that unifies direct and indirect control and allows for continuous interpolation between the two concepts via a factor damping long paths. By exploiting onion-like properties of ownership networks, the algorithm scales linearly with the network size and handles circular ownership by design. We apply it to the universe of 4.2 mln UK companies and 4 mln of their holders to understand the distribution of control in the country. Furthermore, we provide the first independent evaluation of the control identification results. We reveal that the proposed α-ICON algorithm identifies more than 96% of beneficiary entities from the evaluation set and supersedes the existing approaches reported in the literature. We attribute the superiority of the α-ICON algorithm to its ability to correctly identify parents that are far away from their subsidiaries in the network.

[263]  arXiv:2109.07208 (cross-list from eess.SP) [pdf, other]
Title: Channel Estimation Based on Machine Learning Paradigm for Spatial Modulation OFDM
Comments: 4 pages, 5 figures, 2021 IEEE 1st International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering MI-STA
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

In this paper, a deep neural network (DNN) is integrated with the spatial modulation-orthogonal frequency division multiplexing (SM-OFDM) technique for end-to-end data detection over a Rayleigh fading channel. The proposed system directly demodulates the received symbols, so that channel estimation is performed only implicitly. Furthermore, an ensemble network is also proposed for this system. Simulation results show that the proposed DNN detection scheme has a significant advantage over classical methods when the pilot overhead and cyclic prefix (CP) are reduced, owing to its ability to learn and adjust to complicated channel conditions. Finally, the ensemble network is shown to improve the generalization of the proposed scheme, while also showing a slight improvement in its performance.

[264]  arXiv:2109.07211 (cross-list from q-fin.RM) [pdf, ps, other]
Title: Risk Measurement, Risk Entropy, and Autonomous Driving Risk Modeling
Authors: Jiamin Yu
Comments: 11 pages, 5 figures, IME 2021
Subjects: Risk Management (q-fin.RM); Machine Learning (cs.LG)

Big data from autonomous vehicles has long been used for perception, prediction, planning, and control of driving. Naturally, the question increasingly arises of why this big data is not also used for risk management and actuarial modeling. This article examines the emerging technical difficulties, new ideas, and methods of risk modeling under autonomous driving scenarios. Compared with the traditional risk model, the novel model is more consistent with real road traffic and driving safety performance. More importantly, it provides technical feasibility for realizing risk assessment and car insurance pricing in a computer simulation environment.

[265]  arXiv:2109.07259 (cross-list from nlin.AO) [pdf, other]
Title: Evolutionary Reinforcement Learning Dynamics with Irreducible Environmental Uncertainty
Comments: 14 pages, 7 figures
Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Physics and Society (physics.soc-ph)

In this work we derive and present evolutionary reinforcement learning dynamics in which the agents are irreducibly uncertain about the current state of the environment. We evaluate the dynamics across different classes of partially observable agent-environment systems and find that irreducible environmental uncertainty can lead to better learning outcomes faster, stabilize the learning process and overcome social dilemmas. However, as expected, we do also find that partial observability may cause worse learning outcomes, for example, in the form of a catastrophic limit cycle. Compared to fully observant agents, learning with irreducible environmental uncertainty often requires more exploration and less weight on future rewards to obtain the best learning outcomes. Furthermore, we find a range of dynamical effects induced by partial observability, e.g., a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions. The presented dynamics are a practical tool for researchers in biology, social science and machine learning to systematically investigate the evolutionary effects of environmental uncertainty.

[266]  arXiv:2109.07274 (cross-list from eess.AS) [pdf, ps, other]
Title: Binaural rendering from microphone array signals of arbitrary geometry
Comments: The following article has been accepted by Journal of the Acoustical Society of America (JASA). After it is published, it will be found at this http URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

A method of binaural rendering from microphone array signals of arbitrary geometry is proposed. To reproduce binaural signals from microphone array recordings at a remote location, a spherical microphone array is generally used for capturing a soundfield. However, owing to the lack of flexibility in the microphone arrangement, the single spherical array is sometimes impractical for estimating a large region of a soundfield. We propose a method based on harmonic analysis of infinite order, which allows the use of arbitrarily placed microphones. In the synthesis of the estimated soundfield, a spherical-wave-decomposition-based binaural rendering is also formulated to take into consideration the distance in measuring head-related transfer functions. We develop and evaluate a composite microphone array consisting of multiple small arrays. Experimental results including those of listening tests indicate that our proposed method is robust against change in listening position in the recording area.

[267]  arXiv:2109.07277 (cross-list from physics.ins-det) [pdf, other]
Title: Photon detection probability prediction using one-dimensional generative neural network
Subjects: Instrumentation and Detectors (physics.ins-det); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex)

Photon detection is important for liquid argon detectors for direct dark matter searches or neutrino property measurements. Precise simulation of photon transport is widely used to understand the probability of photon detection in liquid argon detectors. Traditional photon transport simulation, which tracks every photon using the Geant4 simulation toolkit, is a major computational challenge for kilo-tonne-scale liquid argon detectors and GeV-level energy depositions. In this work, we propose a one-dimensional generative model which efficiently generates features using an OuterProduct-layer. This model bypasses photon transport simulation and predicts the number of photons detected by particular photon detectors at the same level of detail as the Geant4 simulation. The application to simulating photon detection systems in kilo-tonne-scale liquid argon detectors demonstrates this novel generative model is able to reproduce Geant4 simulation with good accuracy and 20 to 50 times faster. This generative model can be used to quickly predict photon detection probability in huge liquid argon detectors like ProtoDUNE or DUNE.

[268]  arXiv:2109.07282 (cross-list from quant-ph) [pdf, ps, other]
Title: Universality for Sets of Three-Valued Qubit (qutrit) Gates
Comments: 13 pages, 23 figures, as presented at the GOL2021 conference (this https URL)
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

How to find universal sets of quantum gates (gates whose composition can form any other gate within a given range) is an important part of the development of quantum computation science that has been explored in the past with success. However, there has not been much development in extending this very same theory to a generalization of qubits known as quregisters or, as we call them here, qunits (quantum units of information analogous to qubits that use any natural number n of basis states instead of 2 as qubits do). In this paper we first review the theory behind some essential proofs of quantum gate universality for qubit gates. After that we show a new way of extending those statements to arbitrary qutrit gates in an analogous manner. We also mention how this could be extended to any qunit gate.

[269]  arXiv:2109.07284 (cross-list from cond-mat.mtrl-sci) [pdf, other]
Title: Quantitative reconstruction of defects in multi-layered bonded composites using fully convolutional network-based ultrasonic inversion
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

Ultrasonic methods have great potential for detecting and characterizing defects in multi-layered bonded composites. However, it remains challenging to quantitatively reconstruct defects, such as disbonds and kissing bonds, that influence the integrity of adhesive bonds and seriously reduce the strength of assemblies. In this work, an ultrasonic method based on the supervised fully convolutional network (FCN) is proposed to quantitatively reconstruct defects hidden in multi-layered bonded composites. In the training process of this method, an FCN establishes a non-linear mapping from measured ultrasonic data to the corresponding velocity models of multi-layered bonded composites. In the predicting process, the trained network obtained from the training process is used to directly reconstruct the velocity models from the new measured ultrasonic data of adhesively bonded composites. The presented FCN-based inversion method can automatically extract useful features in multi-layered composites. Although this method is computationally expensive in the training process, the prediction itself in the online phase takes only seconds. The numerical results show that the FCN-based ultrasonic inversion method is capable of accurately reconstructing ultrasonic velocity models of high-contrast defects, and thus has great potential for online detection in adhesively bonded composites.

[270]  arXiv:2109.07322 (cross-list from eess.IV) [pdf, ps, other]
Title: DeFungi: Direct Mycological Examination of Microscopic Fungi Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Traditionally, diagnosis and treatment of fungal infections in humans depend heavily on face-to-face consultations or examinations made by specialized laboratory scientists known as mycologists. In many cases, such as the recent mucormycosis spread in the COVID-19 pandemic, an initial treatment can be safely suggested to the patient during the earliest stage of the mycological diagnostic process by performing a direct examination of biopsies or samples through a microscope. Computer-aided diagnosis systems using deep learning models have been trained and used for the late mycological diagnostic stages. However, there are no reference works in the literature for the early stages. A mycological laboratory in Colombia donated the images used for the development of this research work. They were manually labelled into five classes and curated with the assistance of a subject matter expert. The images were later cropped and patched with automated code routines to produce the final dataset. This paper presents experimental results classifying five fungi types using two different deep learning approaches and three different convolutional neural network models: VGG16, Inception V3, and ResNet50. The first approach benchmarks the classification performance for the models trained from scratch, while the second approach benchmarks the classification performance using pre-trained models based on the ImageNet dataset. Using k-fold cross-validation testing on the 5-class dataset, the best performing model trained from scratch was Inception V3, reporting 73.2% accuracy. The best performing model using transfer learning was VGG16, reporting 85.04%. The statistics provided by the two approaches create an initial point of reference to encourage future research works to improve classification performance. Furthermore, the dataset built is published on Kaggle and GitHub to foster future research.

[271]  arXiv:2109.07327 (cross-list from eess.AS) [pdf, ps, other]
Title: Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
Comments: INTERSPEECH2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recently, self-supervised learning has emerged as an effective approach to improve the performance of automatic speech recognition (ASR). Under such a framework, the neural network is usually pre-trained with massive unlabeled data and then fine-tuned with limited labeled data. However, a non-streaming architecture such as the bidirectional transformer is usually adopted by the neural network to achieve competitive results, and it cannot be used in streaming scenarios. In this paper, we mainly focus on improving the performance of the streaming transformer under the self-supervised learning framework. Specifically, we propose a novel two-stage training method during fine-tuning, which combines knowledge distilling and self-training. The proposed training method achieves a 16.3% relative word error rate (WER) reduction on the Librispeech noisy test set. Finally, by only using the 100h clean subset of Librispeech as the labeled data and the rest (860h) as the unlabeled data, our streaming transformer based model obtains competitive WERs of 3.5/8.7 on the Librispeech clean/noisy test sets.

[272]  arXiv:2109.07340 (cross-list from stat.ML) [pdf, other]
Title: Distribution-free Contextual Dynamic Pricing
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customer's valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customer's true valuation, but instead needs to learn the valuation by leveraging contextual information and historical binary purchase feedback. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, where a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish the regret upper bound and a matching lower bound of our policy in the perturbed linear bandit framework and prove a sub-linear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy on simulations and a real-life auto-loan dataset.
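
The valuation model described in the abstract above can be sketched as follows. This is only an illustration of the setting, not the authors' pricing policy: the function name `purchase_feedback`, the parameter values, the Gaussian noise, and the rule that a sale occurs exactly when the valuation is at least the posted price are all assumptions made for the example.

```python
import random


def purchase_feedback(theta, context, price, noise):
    """Return 1 if the customer buys at `price`, else 0.

    Assumed model (illustrative): valuation = <context, theta> + noise,
    and a sale happens exactly when valuation >= price.
    """
    valuation = sum(x * t for x, t in zip(context, theta)) + noise
    return 1 if valuation >= price else 0


rng = random.Random(0)
theta = [1.0, 2.0]  # hypothetical true parameter, unknown to the seller
history = []
for _ in range(5):
    context = [rng.random(), rng.random()]  # hypothetical product/customer features
    noise = rng.gauss(0.0, 0.1)             # market noise; its distribution is unknown to the seller
    price = 1.5                             # posted price (fixed here; a policy would adapt it)
    sold = purchase_feedback(theta, context, price, noise)
    history.append((context, price, sold))  # the seller observes only these triples

# Revenue is collected only in periods where a sale occurred.
revenue = sum(price * sold for _, price, sold in history)
```

The seller never sees `valuation` or `noise`, only the binary outcome, which is why both the linear parameter and the noise distribution have to be learned from the recorded history.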

[273]  arXiv:2109.07344 (cross-list from physics.geo-ph) [pdf, other]
Title: The potential of self-supervised networks for random noise suppression in seismic data
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG)

Noise suppression is an essential step in any seismic processing workflow. A portion of this noise, particularly in land datasets, presents itself as random noise. In recent years, neural networks have been successfully used to denoise seismic data in a supervised fashion. However, supervised learning always comes with the often unachievable requirement of having noisy-clean data pairs for training. Using blind-spot networks, we redefine the denoising task as a self-supervised procedure where the network uses the surrounding noisy samples to estimate the noise-free value of a central sample. Based on the assumption that noise is statistically independent between samples, the network struggles to predict the noise component of the sample due to its randomness, whilst the signal component is accurately predicted due to its spatio-temporal coherency. Illustrated on synthetic examples, the blind-spot network is shown to be an efficient denoiser of seismic data contaminated by random noise with minimal damage to the signal, thereby providing improvements in both the image domain and down-the-line tasks, such as inversion. To conclude the study, the suggested approach is applied to field data and the results are compared with two commonly used random denoising techniques: FX-deconvolution and Curvelet transform. By demonstrating that blind-spot networks are an efficient suppressor of random noise, we believe this is just the beginning of utilising self-supervised learning in seismic applications.

[274]  arXiv:2109.07349 (cross-list from eess.AS) [pdf, other]
Title: Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-supervised Learning
Comments: INTERSPEECH2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recently, self-supervised pre-training has gained success in automatic speech recognition (ASR). However, considering the difference between speech accents in real scenarios, how to identify accents and use accent features to improve ASR is still challenging. In this paper, we employ the self-supervised pre-training method for both accent identification and accented speech recognition tasks. For the former task, a standard deviation constraint loss (SDC-loss) based end-to-end (E2E) architecture is proposed to identify accents within the same language. For the accented speech recognition task, we design an accent-dependent ASR system, which can utilize additional accent input features. Furthermore, we propose a frame-level accent feature, which is extracted based on the proposed accent identification model and can be dynamically adjusted. We pre-train our models using 960 hours of unlabeled LibriSpeech data and fine-tune them on the AESRC2020 speech dataset. The experimental results show that our proposed accent-dependent ASR system is significantly ahead of the AESRC2020 baseline and achieves a $6.5\%$ relative word error rate (WER) reduction compared with our accent-independent ASR system.

[275]  arXiv:2109.07361 (cross-list from physics.flu-dyn) [pdf, other]
Title: A hybrid phase field method for fluid-structure interactions in viscous fluids
Authors: Qi Hong, Qi Wang
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)

We present a novel computational modeling framework to numerically investigate fluid-structure interaction in viscous fluids using the phase field embedding method. Each rigid body or elastic structure immersed in the incompressible viscous fluid matrix, broadly referred to as the particle in this paper, is identified by a volume-preserving phase field. The motion of the particle is driven by the fluid velocity in the matrix for passive particles or combined with its self-propelling velocity for active particles. The excluded volume effect between a pair of particles or between a particle and the boundary is modeled by a repulsive potential force. The drag exerted on the fluid by a particle is assumed proportional to its velocity. When the particle is rigid, its state is described by a zero velocity gradient tensor within the nonzero phase field that defines its profile, and a constraining stress exists therein. When the particle is elastic, a linear constitutive equation for the elastic stress is provided within the particle domain. A hybrid, thermodynamically consistent hydrodynamic model valid in the entire computational domain is then derived for the fluid-particle ensemble using the generalized Onsager principle accounting for both rigid and elastic particles. Structure-preserving numerical algorithms are subsequently developed for the thermodynamically consistent model. Numerical tests in 2D and 3D space are carried out to verify the rate of convergence and numerical examples are given to demonstrate the usefulness of the computational framework for simulating fluid-structure interactions for passive as well as self-propelling active particles in a viscous fluid matrix.

[276]  arXiv:2109.07384 (cross-list from stat.ML) [pdf, other]
Title: How to use KL-divergence to construct conjugate priors, with well-defined non-informative limits, for the multivariate Gaussian
Authors: Niko Brümmer
Comments: 10 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

The Wishart distribution is the standard conjugate prior for the precision of the multivariate Gaussian likelihood, when the mean is known -- while the normal-Wishart can be used when the mean is also unknown. It is however not so obvious how to assign values to the hyperparameters of these distributions. In particular, when forming non-informative limits of these distributions, the shape (or degrees of freedom) parameter of the Wishart must be handled with care. The intuitive solution of directly interpreting the shape as a pseudocount and letting it go to zero, as proposed by some authors, violates the restrictions on the shape parameter. We show how to use the scaled KL-divergence between multivariate Gaussians as an energy function to construct Wishart and normal-Wishart conjugate priors. When used as informative priors, the salient feature of these distributions is the mode, while the KL scaling factor serves as the pseudocount. The scale factor can be taken down to the limit at zero, to form non-informative priors that do not violate the restrictions on the Wishart shape parameter. This limit is non-informative in the sense that the posterior mode is identical to the maximum likelihood estimate of the Gaussian likelihood parameters.
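
For reference, the standard closed form of the KL-divergence between two $d$-dimensional Gaussians is given below. The abstract does not state which argument order or scaling the construction uses, so this is only the textbook expression, with $\mu_0, \Sigma_0, \mu_1, \Sigma_1$ introduced here as generic symbols.

$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\big)
= \frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
+ (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
- d
+ \ln\frac{\det\Sigma_1}{\det\Sigma_0}\right]
$$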

[277]  arXiv:2109.07397 (cross-list from astro-ph.EP) [pdf, other]
Title: An Improved Approach to Orbital Determination and Prediction of Near-Earth Asteroids: Computer Simulation, Modeling and Test Measurements
Subjects: Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Numerical Analysis (math.NA)

In this article, modern theory-based analytical methods of astrophysics are applied alongside a research-grade test telescope to image a near-Earth asteroid and determine its orbit from original observations, measurements, and calculations. Its intrinsic orbital path is then calculated, including the chance that it will impact Earth in the future. More specifically, this case study incorporates the most effective, feasible, and novel Gauss's Method to determine the orbital plane components of a planetesimal, extending our investigation to a selected near-Earth asteroid (namely 12538-1998 OH) through observational data acquired over a six-week period. Using the captured CCD (Charge Coupled Device) snapshots, we simulate and calculate the orbit of our asteroid, with detailed explanations. The uncertainties and deviations from the expected values are derived through a systematic approach based on statistical analysis, to judge whether our empirical findings are truly reliable and representative measurements. The study concludes by discussing what could have caused any discrepancy in the findings and by putting forward measures that could improve the test case for future investigations. Following the calculation of orbital elements and their uncertainties using Monte Carlo analysis, simulations were executed with various sample celestial bodies to derive a plausible prediction regarding the fate of Asteroid 1998 OH. Finally, the astrometric and photometric data, after their precise verification, were officially submitted to the Minor Planet Center, an organization hosted by the Center for Astrophysics, Harvard and Smithsonian and funded by NASA, for keeping track of the asteroid's potential trajectories.

[278]  arXiv:2109.07399 (cross-list from physics.comp-ph) [pdf, other]
Title: Disentangling Generative Factors of Physical Fields Using Variational Autoencoders
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG)

The ability to extract generative parameters from high-dimensional fields of data in an unsupervised manner is a highly desirable yet unrealized goal in computational physics. This work explores the use of variational autoencoders (VAEs) for non-linear dimension reduction with the aim of disentangling the low-dimensional latent variables to identify independent physical parameters that generated the data. A disentangled decomposition is interpretable and can be transferred to a variety of tasks including generative modeling, design optimization, and probabilistic reduced order modelling. A major emphasis of this work is to characterize disentanglement using VAEs while minimally modifying the classic VAE loss function (i.e. the ELBO) to maintain high reconstruction accuracy. Disentanglement is shown to be highly sensitive to rotations of the latent space, hyperparameters, random initializations and the learning schedule. The loss landscape is characterized by over-regularized local minima which surround desirable solutions. We illustrate comparisons between disentangled and entangled representations by juxtaposing learned latent distributions and the 'true' generative factors in a model porous flow problem. Implementing hierarchical priors (HP) is shown to better facilitate the learning of disentangled representations over the classic VAE. The choice of the prior distribution is shown to have a dramatic effect on disentanglement. In particular, the regularization loss is unaffected by latent rotation when training with rotationally-invariant priors, and thus learning non-rotationally-invariant priors aids greatly in capturing the properties of generative factors, improving disentanglement. Some issues inherent to training VAEs, such as the convergence to over-regularized local minima, are illustrated and investigated, and potential techniques for mitigation are presented.

[279]  arXiv:2109.07466 (cross-list from math.OC) [pdf, ps, other]
Title: Neural network optimal feedback control with enhanced closed loop stability
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

Recent research has shown that supervised learning can be an effective tool for designing optimal feedback controllers for high-dimensional nonlinear dynamic systems. But the behavior of these neural network (NN) controllers is still not well understood. In this paper we use numerical simulations to demonstrate that typical test accuracy metrics do not effectively capture the ability of an NN controller to stabilize a system. In particular, some NNs with high test accuracy can fail to stabilize the dynamics. To address this we propose two NN architectures which locally approximate a linear quadratic regulator (LQR). Numerical simulations confirm our intuition that the proposed architectures reliably produce stabilizing feedback controllers without sacrificing performance. In addition, we introduce a preliminary theoretical result describing some stability properties of such NN-controlled systems.

Replacements for Thu, 16 Sep 21

[280]  arXiv:1608.04834 (replaced) [pdf, ps, other]
Title: Asymptotic approximation of central binomial coefficients with rigorous error bounds
Authors: Richard P. Brent
Comments: 11 pages, 1 table; added more references in v3, minor corrections in v4
Subjects: Numerical Analysis (math.NA); Classical Analysis and ODEs (math.CA); Combinatorics (math.CO)
[281]  arXiv:1707.07221 (replaced) [pdf, ps, other]
Title: Packing Topological Minors Half-Integrally
Authors: Chun-Hung Liu
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[282]  arXiv:1902.02060 (replaced) [pdf, other]
Title: On ADMM in Deep Learning: Convergence and Saturation-Avoidance
Comments: This is a revised version of our previous one entitled "A Convergence Analysis of Nonlinearly Constrained ADMM in Deep Learning, arXiv:1902.02060" with some significant changes
Journal-ref: Journal of Machine Learning Research 22 (2021) 1-67
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[283]  arXiv:1903.01287 (replaced) [pdf, other]
Title: Safety Verification and Robustness Analysis of Neural Networks via Quadratic Constraints and Semidefinite Programming
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[284]  arXiv:1906.00389 (replaced) [pdf, other]
Title: Disparate Vulnerability to Membership Inference Attacks
Comments: To appear in Privacy-Enhancing Technologies Symposium (PETS) 2022
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (stat.ML)
[285]  arXiv:1906.03622 (replaced) [pdf, other]
Title: On a Combination of Alternating Minimization and Nesterov's Momentum
Comments: Compared to previous versions: dual WB problem and complexity analysis for WB problem corrected, updated and extended experiments
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[286]  arXiv:1910.08288 (replaced) [pdf, other]
Title: Hierarchical Attentive Knowledge Graph Embedding for Personalized Recommendation
Journal-ref: Electronic Commerce Research and Applications 48 (2021) 101071
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[287]  arXiv:2001.04425 (replaced) [pdf, other]
Title: Negative Statements Considered Useful
Comments: 22 pages
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB)
[288]  arXiv:2002.00071 (replaced) [pdf, ps, other]
Title: Rigorous Guarantees for Tyler's M-estimator via quantum expansion
Comments: Fixed Lemma 5.18 bug
Subjects: Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)
[289]  arXiv:2003.01900 (replaced) [pdf, ps, other]
Title: Minimum Enclosing Parallelogram with Outliers
Authors: Zhengyang Guo, Yi Li
Subjects: Computational Geometry (cs.CG)
[290]  arXiv:2004.08596 (replaced) [pdf, other]
Title: DAPnet: A Double Self-attention Convolutional Network for Point Cloud Semantic Labeling
Comments: 12 pages, 7 figures
Journal-ref: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[291]  arXiv:2004.12187 (replaced) [pdf, other]
Title: Cost Automata, Safe Schemes, and Downward Closures
Comments: journal submission
Subjects: Formal Languages and Automata Theory (cs.FL)
[292]  arXiv:2004.13018 (replaced) [pdf, ps, other]
Title: An Extension of Plücker Relations with Applications to Subdeterminant Maximization
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
[293]  arXiv:2006.07218 (replaced) [pdf, other]
Title: An Accurate, Scalable and Verifiable Protocol for Federated Differentially Private Averaging
Comments: 41 pages
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[294]  arXiv:2006.08251 (replaced) [pdf, other]
Title: Adversarial Weighting for Domain Adaptation in Regression
Comments: 8 pages, 6 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[295]  arXiv:2006.11580 (replaced) [pdf, ps, other]
Title: Finite-size scaling, phase coexistence, and algorithms for the random cluster model on random graphs
Comments: This update includes new results on the slow mixing of Markov chains (Theorem 6)
Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph)
[296]  arXiv:2006.15717 (replaced) [pdf]
Title: Calculating Great Britain's half-hourly electrical demand from publicly available data
Comments: 33 pages, 3 Figures, 6 tables
Subjects: Computers and Society (cs.CY)
[297]  arXiv:2007.04001 (replaced) [pdf, other]
Title: Supervised machine learning techniques for data matching based on similarity metrics
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)
[298]  arXiv:2007.14863 (replaced) [pdf, other]
Title: Automatic Detection of Aedes aegypti Breeding Grounds Based on Deep Networks with Spatio-Temporal Consistency
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[299]  arXiv:2008.02258 (replaced) [pdf, ps, other]
Title: Expected Size of Random Tukey Layers and Convex Layers
Subjects: Computational Geometry (cs.CG)
[300]  arXiv:2008.04998 (replaced) [pdf, other]
Title: Blockchain-Enabled Internet-of-Things Platform for End-to-End Industrial Hemp Supply Chain
Comments: 18 pages, 9 figures
Subjects: Cryptography and Security (cs.CR)
[301]  arXiv:2009.05236 (replaced) [pdf, other]
Title: An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[302]  arXiv:2009.09734 (replaced) [pdf, other]
Title: Electing the Executive Branch
Subjects: Multiagent Systems (cs.MA)
[303]  arXiv:2009.12250 (replaced) [pdf, other]
Title: Trace-Checking CPS Properties: Bridging the Cyber-Physical Gap
Subjects: Software Engineering (cs.SE); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
[304]  arXiv:2009.12436 (replaced) [pdf, other]
Title: FLC tuned with Gravitational Search Algorithm for Nonlinear Pose Filter
Comments: 2020 IEEE International Conference on Systems, Man and Cybernetics (SMC). arXiv admin note: text overlap with arXiv:2008.07595
Subjects: Systems and Control (eess.SY)
[305]  arXiv:2009.12623 (replaced) [pdf, other]
Title: Lossy Checkpoint Compression in Full Waveform Inversion: a case study with ZFPv0.5.5 and the Overthrust Model
Subjects: Computational Physics (physics.comp-ph); Numerical Analysis (math.NA)
[306]  arXiv:2009.14742 (replaced) [pdf, ps, other]
Title: A Generalization of Bohr-Mollerup's Theorem for Higher Order Convex Functions
Comments: To appear in the book series: Developments in Mathematics, Springer, 2021-2022
Subjects: Classical Analysis and ODEs (math.CA); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Number Theory (math.NT)
[307]  arXiv:2010.01678 (replaced) [pdf, other]
Title: Optimal Neural Program Synthesis from Multimodal Specifications
Comments: Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Programming Languages (cs.PL)
[308]  arXiv:2010.10343 (replaced) [pdf, other]
Title: Provenance Graph Kernel
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
[309]  arXiv:2010.10726 (replaced) [pdf, other]
Title: MINVO Basis: Finding Simplexes with Minimum Volume Enclosing Polynomial Curves
Comments: 23 pages, 21 figures
Subjects: Computational Geometry (cs.CG); Graphics (cs.GR); Robotics (cs.RO)
[310]  arXiv:2010.14377 (replaced) [pdf, other]
Title: Designing optimal networks for multi-commodity transport problem
Comments: 13 pages, 7 figures
Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)
[311]  arXiv:2011.03618 (replaced) [pdf, other]
Title: Learning Human Search Behavior from Egocentric Visual Inputs
Comments: Proceedings of EUROGRAPHICS 2021
Journal-ref: Computer Graphics Forum 2021
Subjects: Robotics (cs.RO); Graphics (cs.GR)
[312]  arXiv:2011.05953 (replaced) [pdf, ps, other]
Title: $(f,Γ)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics
Comments: 49 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[313]  arXiv:2011.09148 (replaced) [pdf, other]
Title: Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization
Comments: New results: Extensions to model with label noise
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[314]  arXiv:2011.13127 (replaced) [pdf, other]
Title: Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode
Subjects: Programming Languages (cs.PL)
[315]  arXiv:2011.15084 (replaced) [pdf, other]
Title: Likelihood-Based Diverse Sampling for Trajectory Forecasting
Comments: ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[316]  arXiv:2012.01410 (replaced) [pdf, other]
Title: Ontological Smart Contracts in OASIS: Ontology for Agents, Systems, and Integration of Services (Extended Version)
Comments: This work has been accepted for publication at The 14th International Symposium on Intelligent Distributed Computing, 16--18 September 2021 - Online. Paper accepted on 8 September 2020
Journal-ref: The 14th International Symposium on Intelligent Distributed Computing, 16--18 September 2021, On-line
Subjects: Artificial Intelligence (cs.AI)
[317]  arXiv:2012.04882 (replaced) [pdf, other]
Title: Infusing Multi-Source Knowledge with Heterogeneous Graph Neural Network for Emotional Conversation Generation
Comments: Accepted at AAAI 2021. Code: this https URL
Subjects: Computation and Language (cs.CL)
[318]  arXiv:2012.05688 (replaced) [pdf, other]
Title: DA-HGT: Domain Adaptive Heterogeneous Graph Transformer
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[319]  arXiv:2012.09335 (replaced) [pdf, other]
Title: Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games
Subjects: Robotics (cs.RO); Computer Science and Game Theory (cs.GT)
[320]  arXiv:2012.09962 (replaced) [pdf, other]
Title: Making Contrastive Learning Robust to Shortcuts
Comments: The first two authors contributed equally to this paper
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[321]  arXiv:2012.13537 (replaced) [pdf, ps, other]
Title: LSTM-Aided Hybrid Random Access Scheme for 6G Machine Type Communication Networks
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
[322]  arXiv:2012.15455 (replaced) [pdf, other]
Title: Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation
Subjects: Computation and Language (cs.CL)
[323]  arXiv:2101.06963 (replaced) [pdf, other]
Title: Uncertainty-Aware Body Composition Analysis with Deep Regression Ensembles on UK Biobank MRI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[324]  arXiv:2101.07034 (replaced) [pdf, other]
Title: AGRNet: Adaptive Graph Representation Learning and Reasoning for Face Parsing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[325]  arXiv:2101.07663 (replaced) [pdf, other]
Title: Salient Object Detection via Integrity Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[326]  arXiv:2101.08248 (replaced) [pdf, other]
Title: Data-to-text Generation by Splicing Together Nearest Neighbors
Comments: EMNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[327]  arXiv:2101.09592 (replaced) [pdf, ps, other]
Title: Point-hyperplane incidence geometry and the log-rank conjecture
Comments: 13 pages, no figures. Discussion is revised
Subjects: Combinatorics (math.CO); Computational Complexity (cs.CC)
[328]  arXiv:2101.09783 (replaced) [pdf, other]
Title: Termination Analysis Without the Tears
Subjects: Programming Languages (cs.PL)
[329]  arXiv:2102.00050 (replaced) [pdf, ps, other]
Title: Sequential prediction under log-loss and misspecification
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
[330]  arXiv:2102.02315 (replaced) [pdf]
Title: Real-Time Optimal Trajectory Planning for Autonomous Vehicles and Lap Time Simulation Using Machine Learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[331]  arXiv:2102.03973 (replaced) [pdf, other]
Title: Solid Texture Synthesis using Generative Adversarial Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[332]  arXiv:2102.05347 (replaced) [pdf, ps, other]
Title: From Sampling to Optimization on Discrete Domains with Applications to Determinant Maximization
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[333]  arXiv:2102.08442 (replaced) [pdf, other]
Title: SCAPE: Learning Stiffness Control from Augmented Position Control Experiences
Comments: Accepted at CoRL 2021
Subjects: Robotics (cs.RO)
[334]  arXiv:2102.11121 (replaced) [pdf, other]
Title: Direct Estimation of Appearance Models for Segmentation
Comments: To appear in the SIAM Journal on Imaging Sciences (SIIMS)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[335]  arXiv:2102.11625 (replaced) [pdf, other]
Title: Assessing the Readability of Policy Documents on the Digital Single Market of the European Union
Authors: Jukka Ruohonen
Comments: Proceedings of the Eighth International Conference on eDemocracy & eGovernment (ICEDEG 2021), Quito (online), IEEE, pp. 205-209
Subjects: Computers and Society (cs.CY)
[336]  arXiv:2102.12459 (replaced) [pdf, other]
Title: When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Authors: Tao Lei
Journal-ref: EMNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[337]  arXiv:2103.02334 (replaced) [pdf, ps, other]
Title: Developing NOMA to Next Generation Multiple Access (NGMA): Future Vision and Research Opportunities
Comments: 7 pages, 5 figures, 1 table
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[338]  arXiv:2103.02429 (replaced) [pdf, other]
Title: Land Cover Mapping in Limited Labels Scenario: A Survey
Comments: 8 pages, 1 figure
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[339]  arXiv:2103.06359 (replaced) [pdf, other]
Title: Hiding Leader's Identity in Leader-Follower Navigation through Multi-Agent Reinforcement Learning
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
[340]  arXiv:2103.08764 (replaced) [pdf, other]
Title: Fast and Accurate: Video Enhancement using Sparse Depth
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[341]  arXiv:2103.11070 (replaced) [pdf, other]
Title: Attribute Alignment: Controlling Text Generation from Pre-trained Language Models
Journal-ref: EMNLP 2021 Findings
Subjects: Computation and Language (cs.CL)
[342]  arXiv:2103.11648 (replaced) [pdf, other]
Title: D3p -- A Python Package for Differentially-Private Probabilistic Programming
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[343]  arXiv:2103.14010 (replaced) [pdf, other]
Title: Self-Supervised Training Enhances Online Continual Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[344]  arXiv:2103.14146 (replaced) [pdf, other]
Title: Describing and Localizing Multiple Changes with Transformers
Comments: Accepted by ICCV2021. 18 pages, 15 figures, project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[345]  arXiv:2103.14373 (replaced) [pdf, other]
Title: D2C-SR: A Divergence to Convergence Approach for Real-World Image Super-Resolution
Comments: 14 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[346]  arXiv:2103.15009 (replaced) [pdf, ps, other]
Title: Unclonable Encryption, Revisited
Subjects: Cryptography and Security (cs.CR); Quantum Physics (quant-ph)
[347]  arXiv:2103.15429 (replaced) [pdf, other]
Title: Efficient Explanations from Empirical Explainers
Comments: Accepted to the EMNLP 2021 Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[348]  arXiv:2104.02412 (replaced) [pdf, other]
Title: Applying splitting methods with complex coefficients to the numerical integration of unitary problems
Comments: 18 pages, 7 figures. To be published in Journal of Computational Dynamics
Subjects: Numerical Analysis (math.NA); Quantum Physics (quant-ph)
[349]  arXiv:2104.04886 (replaced) [pdf, other]
Title: Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[350]  arXiv:2104.05158 (replaced) [pdf, other]
[351]  arXiv:2104.05932 (replaced) [pdf, other]
Title: VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction
Comments: Accepted at IJCAI 2021 Artificial Intelligence for Autonomous Driving Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[352]  arXiv:2104.06476 (replaced) [pdf, other]
Title: Incremental Multi-Target Domain Adaptation for Object Detection with Efficient Domain Transfer
Comments: Submitted for Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[353]  arXiv:2104.07275 (replaced) [pdf, other]
Title: Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
Subjects: Computation and Language (cs.CL)
[354]  arXiv:2104.08211 (replaced) [pdf]
Title: Robust Open-Vocabulary Translation from Visual Text Representations
Comments: Accepted to EMNLP 2021
Subjects: Computation and Language (cs.CL)
[355]  arXiv:2104.08313 (replaced) [pdf, other]
Title: Does language help generalization in vision models?
Comments: Paper accepted at the CoNLL 2021 conference. This version: section added on the performance of the visual and visio-linguistic models on linguistic tasks
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[356]  arXiv:2104.08718 (replaced) [pdf, other]
Title: CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Journal-ref: EMNLP 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[357]  arXiv:2104.10157 (replaced) [pdf, other]
Title: VideoGPT: Video Generation using VQ-VAE and Transformers
Comments: Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[358]  arXiv:2104.10749 (replaced) [pdf, other]
Title: Constantine: Automatic Side-Channel Resistance Using Efficient Control and Data Flow Linearization
Comments: Proceedings of the ACM Conference on Computer and Communications Security (CCS) 2021. Code and BibTeX entry available at this https URL
Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL)
[359]  arXiv:2104.12167 (replaced) [src]
Title: A Novel Unified Stereo Stimuli based Binocular Eye-Tracking System for Accurate 3D Gaze Estimation
Comments: Add new experimental results
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[360]  arXiv:2104.12810 (replaced) [pdf, other]
Title: Classical and Quantum algorithms for generic Syndrome Decoding problems and applications to the Lee metric
Subjects: Cryptography and Security (cs.CR)
[361]  arXiv:2104.12999 (replaced) [pdf, ps, other]
Title: Separating Rank Logic from Polynomial Time
Authors: Moritz Lichter
Comments: 54 pages. Full version of a paper appeared at LICS 2021. [v3] Fixed some minor mistakes/typos in the proofs [v4] Fixed a mistake in the CFI construction
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
[362]  arXiv:2104.13542 (replaced) [pdf, other]
Title: STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation
Comments: Accepted for oral presentation at the Conference on Robot Learning (CoRL), 2021. Code available at: this https URL
Subjects: Robotics (cs.RO)
[363]  arXiv:2104.14282 (replaced) [pdf]
Title: VIRDOCD: a VIRtual DOCtor to Predict Dengue Fatality
Comments: 17 pages, 5 figures, 8 tables
Journal-ref: Expert Systems. 2021;e12796
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Biological Physics (physics.bio-ph)
[364]  arXiv:2105.00404 (replaced) [pdf, ps, other]
Title: A Joint Design for STAR-RIS enhanced NOMA-CoMP Networks: A Simultaneous-Signal-Enhancement-and-Cancellation-based (SSECB) Design
Subjects: Information Theory (cs.IT)
[365]  arXiv:2105.01595 (replaced) [pdf, other]
Title: Self-Improving Semantic Perception for Indoor Localisation
Comments: A summary video can be accessed at this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[366]  arXiv:2105.01650 (replaced) [pdf, other]
Title: Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[367]  arXiv:2105.02132 (replaced) [pdf, other]
Title: Self-Supervised Learning from Automatically Separated Sound Scenes
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[368]  arXiv:2105.03247 (replaced) [pdf, other]
Title: MOTR: End-to-End Multiple-Object Tracking with TRansformer
Comments: Revised version. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[369]  arXiv:2105.03531 (replaced) [pdf, other]
Title: On the Complexity of Verification of Time-Sensitive Distributed Systems: Technical Report
Comments: This Technical Report updates and subsumes the technical report arXiv:1606.07886. arXiv admin note: text overlap with arXiv:1606.07886
Subjects: Computational Complexity (cs.CC); Logic in Computer Science (cs.LO)
[370]  arXiv:2105.06162 (replaced) [pdf, other]
Title: Variable Coded Batch Matrix Multiplication
Comments: 8 pages, 3 figures, to be published in IEEE Global Communications Conference (GLOBECOM) 2021
Subjects: Information Theory (cs.IT)
[371]  arXiv:2105.06232 (replaced) [pdf, other]
Title: Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters
Comments: The first two authors contribute equally
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[372]  arXiv:2105.06631 (replaced) [pdf, other]
Title: Ordering-Based Causal Discovery with Reinforcement Learning
Comments: Accepted to IJCAI'2021
Subjects: Machine Learning (cs.LG)
[373]  arXiv:2105.06965 (replaced) [pdf, other]
Title: Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
Comments: Equal contribution by SR and GP. Accepted in CoNLL 2021
Subjects: Computation and Language (cs.CL)
[374]  arXiv:2105.07044 (replaced) [pdf, other]
Title: SA-GAN: Structure-Aware GAN for Organ-Preserving Synthetic CT Generation
Comments: Accepted to MICCAI 2021
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[375]  arXiv:2105.07111 (replaced) [pdf, other]
Title: Prescriptive Process Monitoring for Cost-Aware Cycle Time Reduction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[376]  arXiv:2105.07605 (replaced) [pdf, other]
Title: Utility Maximization for Multihop Wireless Networks Employing BATS Codes
Comments: This paper was presented in part at 2020 IEEE International Conference on Communications
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
[377]  arXiv:2105.07609 (replaced) [pdf, ps, other]
Title: Intrablock Interleaving for Batched Network Coding with Blockwise Adaptive Recoding
Comments: This paper was presented in part at 2021 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
[378]  arXiv:2105.07614 (replaced) [pdf, other]
Title: A Unified Adaptive Recoding Framework for Batched Network Coding
Comments: This paper was presented in part at 2019 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
[379]  arXiv:2105.09136 (replaced) [pdf, other]
Title: Periodic Freight Demand Estimation for Large-scale Tactical Planning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[380]  arXiv:2105.09601 (replaced) [pdf, other]
Title: See, Hear, Read: Leveraging Multimodality with Guided Attention for Abstractive Text Summarization
Comments: Journal paper accepted in Knowledge Based Systems
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[381]  arXiv:2105.11103 (replaced) [pdf, other]
Title: Dissecting Click Fraud Autonomy in the Wild
Comments: Accepted to ACM CCS 2021
Subjects: Cryptography and Security (cs.CR)
[382]  arXiv:2105.11259 (replaced) [pdf, other]
Title: PTR: Prompt Tuning with Rules for Text Classification
Subjects: Computation and Language (cs.CL)
[383]  arXiv:2105.11609 (replaced) [pdf, other]
Title: Polarimetric Spatio-Temporal Light Transport Probing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[384]  arXiv:2105.13509 (replaced) [pdf, other]
Title: Learning to Stylize Novel Views
Comments: Project page: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[385]  arXiv:2105.13753 (replaced) [pdf]
Title: New Encoder Learning for Captioning Heavy Rain Images via Semantic Visual Feature Matching
Journal-ref: Journal of Imaging Science and Technology, Sept. 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[386]  arXiv:2105.13790 (replaced) [pdf, other]
Title: Concentration Inequalities for Cross-validation in Scattered Data Approximation
Subjects: Numerical Analysis (math.NA)
[387]  arXiv:2105.14685 (replaced) [pdf, other]
Title: DeepChange: A Long-Term Person Re-Identification Benchmark
Authors: Peng Xu, Xiatian Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[388]  arXiv:2105.14707 (replaced) [pdf, ps, other]
Title: Emergence and algorithmic information dynamics of systems and observers
Subjects: Information Theory (cs.IT); Formal Languages and Automata Theory (cs.FL); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Dynamical Systems (math.DS)
[389]  arXiv:2106.00589 (replaced) [pdf, other]
Title: Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Reinforcement Learning
Subjects: Machine Learning (cs.LG)
[390]  arXiv:2106.01621 (replaced) [pdf, other]
Title: ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[391]  arXiv:2106.02588 (replaced) [pdf, other]
Title: Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis
Subjects: Machine Learning (cs.LG); Analysis of PDEs (math.AP); Machine Learning (stat.ML)
[392]  arXiv:2106.03821 (replaced) [pdf, other]
Title: Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion
Comments: In INTERSPEECH 2021
Journal-ref: Proc. Interspeech 2021, 2381-2385
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[393]  arXiv:2106.03919 (replaced) [pdf, other]
Title: Learning to Detect Multi-Modal Grasps for Dexterous Grasping in Dense Clutter
Comments: IROS 2021 Accepted Version
Subjects: Robotics (cs.RO)
[394]  arXiv:2106.04112 (replaced) [pdf, other]
Title: Harnessing Unrecognizable Faces for Improving Face Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[395]  arXiv:2106.04280 (replaced) [pdf, other]
Title: Optimizing a Binary Intelligent Reflecting Surface for OFDM Communications under Mutual Coupling
Authors: Emil Björnson
Comments: To appear at the 25th International ITG Workshop on Smart Antennas (WSA 2021), 6 pages, 6 figures. The code and dataset are available at this https URL
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[396]  arXiv:2106.04803 (replaced) [pdf, other]
Title: CoAtNet: Marrying Convolution and Attention for All Data Sizes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[397]  arXiv:2106.07487 (replaced) [pdf, other]
Title: pix2rule: End-to-end Neuro-symbolic Rule Learning
Comments: IJCLR-NeSy, 41 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[398]  arXiv:2106.10533 (replaced) [pdf, other]
Title: Learning to Reach, Swim, Walk and Fly in One Trial: Data-Driven Control with Scarce Data and Side Information
Comments: Initial submission
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Robotics (cs.RO); Optimization and Control (math.OC)
[399]  arXiv:2106.10852 (replaced) [pdf, other]
Title: CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection
Comments: 21 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[400]  arXiv:2106.12995 (replaced) [pdf, other]
Title: Userfault Objects: Transparent Programmable Memory
Comments: In Proceedings of ICOOOLPS '21: Workshop on Implementation, Compilation, Optimization of OO Languages, Programs and Systems (ICOOOLPS '21). UFOs repository: this https URL
Subjects: Programming Languages (cs.PL); Software Engineering (cs.SE)
[401]  arXiv:2106.13092 (replaced) [pdf, other]
Title: BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks
Comments: accepted at ASONAM 2021 as short paper; this is the full paper version at the time of submission. arXiv admin note: text overlap with arXiv:2106.13089
Subjects: Social and Information Networks (cs.SI)
[402]  arXiv:2106.13233 (replaced) [pdf, other]
Title: Post-Selections in AI and How to Avoid Them
Authors: Juyang Weng
Comments: 29 pages, 5 figures. An earlier version of the first part has been accepted as an IJCNN 2021 paper and an earlier version of the second part has been accepted as an ICDL 2021 paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[403]  arXiv:2107.01518 (replaced) [pdf, other]
Title: Hierarchical Policies for Cluttered-Scene Grasping with Latent Plans
Subjects: Robotics (cs.RO)
[404]  arXiv:2107.02104 (replaced) [pdf, other]
Title: RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[405]  arXiv:2107.02823 (replaced) [pdf, other]
Title: Deep Learning based Micro-expression Recognition: A Survey
Comments: 23 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[406]  arXiv:2107.02993 (replaced) [pdf, other]
Title: Embedding digital chronotherapy into medical devices -- A canine validation for controlling status epilepticus through multi-scale rhythmic brain stimulation
Comments: 6 text pages, 4 main figures, Fig 1 has 4 panels, Fig 2 has 3 panels, Fig 4 has 2 panels. 2 text pages of supplementary material, 1 supplementary figure
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
[407]  arXiv:2107.05252 (replaced) [pdf, other]
Title: OmniLytics: A Blockchain-based Secure Data Market for Decentralized Machine Learning
Comments: An initial version of the article has been published in International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (this http URL). This version has been submitted to AAAI'22
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[408]  arXiv:2107.06011 (replaced) [pdf, other]
Title: Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[409]  arXiv:2107.09698 (replaced) [pdf, other]
Title: Mono2Micro: A Practical and Effective Tool for Decomposing Monolithic Java Applications to Microservices
Subjects: Software Engineering (cs.SE)
[410]  arXiv:2107.10302 (replaced) [pdf, other]
Title: Adversarial for Good? How the Adversarial ML Community's Values Impede Socially Beneficial Uses of Attacks
Comments: Author list is ordered alphabetically as there is equal contribution. 4 pages. Accepted by the ICML 2021 workshop on "A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning"
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (cs.LG)
[411]  arXiv:2107.11921 (replaced) [pdf, other]
Title: Compensation Learning
Subjects: Machine Learning (cs.LG)
[412]  arXiv:2107.12499 (replaced) [pdf, other]
Title: CalCROP21: A Georeferenced multi-spectral dataset of Satellite Imagery and Crop Labels
Comments: 13 pages; 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[413]  arXiv:2107.12514 (replaced) [pdf, other]
Title: Language Grounding with 3D Objects
Comments: Conference on Robot Learning (CoRL) 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[414]  arXiv:2107.14581 (replaced) [pdf, ps, other]
Title: Causality in Higher Order Process Theories
Comments: In Proceedings QPL 2021, arXiv:2109.04886
Journal-ref: EPTCS 343, 2021, pp. 265-300
Subjects: Quantum Physics (quant-ph); Logic in Computer Science (cs.LO); Category Theory (math.CT)
[415]  arXiv:2108.00082 (replaced) [pdf, other]
Title: Towards Continual Entity Learning in Language Models for Conversational Agents
Comments: Submitted to NeurIPS 2021. Paper is under review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[416]  arXiv:2108.00490 (replaced) [pdf, other]
Title: A survey of Monte Carlo methods for noisy and costly densities with application to reinforcement learning
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
[417]  arXiv:2108.00966 (replaced) [pdf, other]
Title: Rationality and Reciprocity of Opinion Dynamics in Games
Subjects: Physics and Society (physics.soc-ph); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI); Dynamical Systems (math.DS); Optimization and Control (math.OC)
[418]  arXiv:2108.02093 (replaced) [pdf, other]
Title: Free Lunch for Co-Saliency Detection: Context Adjustment
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[419]  arXiv:2108.02938 (replaced) [pdf, other]
Title: ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models
Comments: ICCV 2021 (oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[420]  arXiv:2108.04990 (replaced) [pdf, other]
Title: Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing
Comments: EMNLP-BlackboxNLP, 2021
Subjects: Computation and Language (cs.CL)
[421]  arXiv:2108.05274 (replaced) [pdf, other]
Title: Instance-weighted Central Similarity for Multi-label Image Retrieval
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422]  arXiv:2108.05481 (replaced) [pdf, other]
Title: Ships, Splashes, and Waves on a Vast Ocean
Subjects: Graphics (cs.GR); Fluid Dynamics (physics.flu-dyn)
[423]  arXiv:2108.06259 (replaced) [pdf, other]
Title: VulnEx: Exploring Open-Source Software Vulnerabilities in Large Development Organizations to Understand Risk Exposure
Comments: 5 pages, 3 figures, LaTeX; corrected typos and wording
Journal-ref: 2021 IEEE Symposium on Visualization for Cyber Security (VizSec)
Subjects: Software Engineering (cs.SE)
[424]  arXiv:2108.06962 (replaced) [pdf, other]
Title: Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
Comments: Accepted at the 2021 International Conference on Computer Vision (ICCV)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425]  arXiv:2108.08236 (replaced) [pdf, other]
Title: LOKI: Long Term and Key Intentions for Trajectory Prediction
Comments: ICCV 2021 (The dataset is available at this https URL)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)
[426]  arXiv:2108.08923 (replaced) [pdf, other]
Title: CenterPoly: real-time instance segmentation using bounding polygons
Comments: Accepted to the 2nd Autonomous Vehicle Vision Workshop (AVVision)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[427]  arXiv:2108.10539 (replaced) [pdf, other]
Title: Counterfactual Explainable Recommendation
Comments: To be published at the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021)
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[428]  arXiv:2108.10723 (replaced) [pdf, other]
Title: Improving 3D Object Detection with Channel-wise Transformer
Comments: Accepted by ICCV2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[429]  arXiv:2108.12043 (replaced) [pdf, other]
Title: A Tutorial on Learning Disentangled Representations in the Imaging Domain
Comments: We welcome any comments and suggestions for this draft. We will update the draft accordingly before submission. This draft follows a tutorial style but also surveys a considerable (200 citations) number of works
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[430]  arXiv:2108.12529 (replaced) [pdf, other]
Title: Designing for Multiple Centers of Power: A Taxonomy of Multi-level Governance in Online Social Platforms
Subjects: Human-Computer Interaction (cs.HC)
[431]  arXiv:2108.12848 (replaced) [pdf, other]
Title: Span Fine-tuning for Pre-trained Language Models
Comments: Accepted by EMNLP 2021 Findings (early version)
Subjects: Computation and Language (cs.CL)
[432]  arXiv:2108.13265 (replaced) [pdf, other]
Title: Predicting Road Flooding Risk with Machine Learning Approaches Using Crowdsourced Reports and Fine-grained Traffic Data
Comments: 17 pages, 7 figures
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG)
[433]  arXiv:2109.00301 (replaced) [pdf, other]
Title: $\infty$-former: Infinite Memory Transformer
Subjects: Computation and Language (cs.CL)
[434]  arXiv:2109.00993 (replaced) [pdf, other]
Title: LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training
Subjects: Computation and Language (cs.CL)
[435]  arXiv:2109.02363 (replaced) [pdf, other]
Title: From Alignment to Assignment: Frustratingly Simple Unsupervised Entity Alignment
Comments: 11 pages; Accepted by EMNLP2021 (Main Conf)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[436]  arXiv:2109.02789 (replaced) [pdf, other]
Title: Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural Cross-Lingual Information Retrieval
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
[437]  arXiv:2109.02899 (replaced) [pdf, other]
Title: Blockchains through ontologies: the case study of the Ethereum ERC721 standard in OASIS (Extended Version)
Comments: Extended version of Blockchains through ontologies: the case study of the Ethereum ERC721 standard in OASIS, Proceedings of IDC 2021
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[438]  arXiv:2109.03413 (replaced) [pdf, other]
Title: YouRefIt: Embodied Reference Understanding with Language and Gesture
Comments: ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[439]  arXiv:2109.03667 (replaced) [pdf, ps, other]
Title: Energy Footprint of Blockchain Consensus Mechanisms Beyond Proof-of-Work
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[440]  arXiv:2109.03781 (replaced) [pdf, other]
Title: Highly Scalable and Provably Accurate Classification in Poincare Balls
Comments: A short version of this paper appears in ICDM 2021
Subjects: Machine Learning (cs.LG)
[441]  arXiv:2109.04453 (replaced) [pdf, other]
Title: Tube-Certified Trajectory Tracking for Nonlinear Systems With Robust Control Contraction Metrics
Comments: Shorter version submitted to IEEE Robotics and Automation Letters
Subjects: Systems and Control (eess.SY)
[442]  arXiv:2109.04566 (replaced) [pdf, other]
Title: SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks
Comments: 7 pages, 10 figures
Subjects: Machine Learning (cs.LG)
[443]  arXiv:2109.04966 (replaced) [pdf, other]
Title: Binarized P-Network: Deep Reinforcement Learning of Robot Control from Raw Images on FPGA
Comments: 8 pages, Accepted by Robotics and Automation Letters
Subjects: Robotics (cs.RO)
[444]  arXiv:2109.05186 (replaced) [pdf, other]
Title: Total Recall: a Customized Continual Learning Method for Neural Semantic Parsers
Comments: 9 pages, accepted to EMNLP2021
Subjects: Computation and Language (cs.CL)
[445]  arXiv:2109.05211 (replaced) [pdf, other]
Title: RobustART: Benchmarking Robustness on Architecture Design and Training Techniques
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[446]  arXiv:2109.05222 (replaced) [pdf, ps, other]
Title: Fundamental limits of over-the-air optimization: Are analog schemes optimal?
Comments: A few typos fixed and one reference added. An abridged version of this paper will appear in the proceedings of the IEEE Global Communications Conference (GLOBECOM), Spain, 2021
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[447]  arXiv:2109.05602 (replaced) [pdf, other]
Title: Good-Enough Example Extrapolation
Authors: Jason Wei
Comments: Camera-ready for EMNLP 2021 main conference. V2 is corrected with SMOTE citation and model setup language is clarified
Subjects: Computation and Language (cs.CL)
[448]  arXiv:2109.05700 (replaced) [pdf, ps, other]
Title: Exploiting Heterogeneity in Robust Federated Best-Arm Identification
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
[449]  arXiv:2109.05804 (replaced) [pdf, other]
Title: MLFW: A Database for Face Recognition on Masked Faces
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[450]  arXiv:2109.05877 (replaced) [pdf, ps, other]
Title: Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
[451]  arXiv:2109.05958 (replaced) [pdf, other]
Title: Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids' Representations
Comments: Accepted to BlackboxNLP Workshop at EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[452]  arXiv:2109.06160 (replaced) [pdf, other]
Title: Augmenting Decision Making via Interactive What-If Analysis
Comments: This version has been removed by arXiv administrators due to privacy policy
Subjects: Databases (cs.DB); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[453]  arXiv:2109.06232 (replaced) [pdf, other]
Title: The Emergence of the Shape Bias Results from Communicative Efficiency
Comments: Accepted at CoNLL 2021
Subjects: Computation and Language (cs.CL); Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)
[454]  arXiv:2109.06264 (replaced) [pdf, other]
Title: Post-OCR Document Correction with large Ensembles of Character Sequence Models
Subjects: Computation and Language (cs.CL)
[455]  arXiv:2109.06432 (replaced) [pdf, other]
Title: Improved Few-shot Segmentation by Redefinition of the Roles of Multi-level CNN Features
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[456]  arXiv:2109.06442 (replaced) [pdf, ps, other]
Title: Domain Sparsification of Discrete Distributions using Entropic Independence
Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR)
[457]  arXiv:2109.06505 (replaced) [pdf, other]
Title: Optimal To-Do List Gamification for Long Term Planning
Subjects: Artificial Intelligence (cs.AI)
[458]  arXiv:2109.06590 (replaced) [pdf, other]
Title: High-Fidelity GAN Inversion for Image Attribute Editing
Comments: Project Page is at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[459]  arXiv:2109.06595 (replaced) [pdf, other]
Title: GPT-2C: A GPT-2 parser for Cowrie honeypot logs
Subjects: Cryptography and Security (cs.CR)
[460]  arXiv:2109.06619 (replaced) [pdf, other]
Title: Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration
Comments: Accepted by ICCV-2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461]  arXiv:2109.06630 (replaced) [pdf, other]
Title: Detecting Layout Templates in Complex Multiregion Files
Subjects: Information Retrieval (cs.IR)
[462]  arXiv:2109.06638 (replaced) [pdf, other]
Title: Learnable Discrete Wavelet Pooling (LDW-Pooling) For Convolutional Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[463]  arXiv:2109.06661 (replaced) [pdf, ps, other]
Title: Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification
Comments: 10 pages, Accepted as regular paper by ICDM 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[464]  arXiv:2109.06662 (replaced) [pdf, other]
Title: Identifying partial mouse brain microscopy images from Allen reference atlas using a contrastively learned semantic space
Comments: Source code available at this https URL. 7 pages, 5 figures. Version 2: fixed 2nd author name and added acknowledgments
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[465]  arXiv:2109.06668 (replaced) [pdf, other]
Title: Exploration in Deep Reinforcement Learning: A Comprehensive Survey
Comments: Text polished and some incorrect descriptions revised
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[466]  arXiv:2109.06719 (replaced) [pdf, other]
Title: Sparse Fuzzy Attention for Structured Sentiment Analysis
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[467]  arXiv:2109.06732 (replaced) [pdf, other]
Title: Tuna-AI: tuna biomass estimation with Machine Learning models trained on oceanography and echosounder FAD data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[468]  arXiv:2109.06768 (replaced) [pdf, other]
Title: MotionHint: Self-Supervised Monocular Visual Odometry with Motion Constraints
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[469]  arXiv:2109.06838 (replaced) [pdf, other]
Title: ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding
Comments: Work in progress
Subjects: Computation and Language (cs.CL)
[470]  arXiv:2109.06862 (replaced) [pdf, other]
Title: Legal Transformer Models May Not Always Help
Subjects: Computation and Language (cs.CL)
