Computer Science

New submissions

New submissions for Fri, 22 Oct 21

[1]  arXiv:2110.10150 [pdf, other]
Title: Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents
Subjects: Computation and Language (cs.CL)

Text summarization is an essential task to help readers capture salient information from documents, news, interviews, and meetings. However, most state-of-the-art pretrained language models are unable to efficiently process long text commonly seen in the summarization problem domain. In this paper, we propose Summ^N, a simple, flexible, and effective multi-stage framework for input texts that are longer than the maximum context lengths of typical pretrained LMs. Summ^N first generates the coarse summary in multiple stages and then produces the final fine-grained summary based on them. The framework can process input text of arbitrary length by adjusting the number of stages while keeping the LM context size fixed. Moreover, it can deal with both documents and dialogues and can be used on top of any underlying backbone abstractive summarization model. Our experiments demonstrate that Summ^N significantly outperforms previous state-of-the-art methods by improving ROUGE scores on three long meeting summarization datasets AMI, ICSI, and QMSum, two long TV series datasets from SummScreen, and a newly proposed long document summarization dataset GovReport. Our data and code are available at https://github.com/chatc/Summ-N.
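The multi-stage idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `truncate_summarizer` is a hypothetical stand-in for a real backbone summarization model, chunking is by word count rather than by model tokens, and the names `split_into_chunks` and `multi_stage_summarize` are invented for this example.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], max_words: int) -> List[List[str]]:
    """Split a word list into consecutive chunks of at most max_words words."""
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def truncate_summarizer(text: str, budget: int = 4) -> str:
    """Hypothetical stand-in for a backbone summarizer: keep the first few words."""
    return " ".join(text.split()[:budget])


def multi_stage_summarize(
    text: str,
    summarize: Callable[[str], str],
    max_words: int,
) -> str:
    """Run coarse stages until the text fits one context, then one final stage."""
    words = text.split()
    while len(words) > max_words:
        # Coarse stage: summarize every chunk and concatenate the results.
        chunks = split_into_chunks(words, max_words)
        coarse = [summarize(" ".join(chunk)) for chunk in chunks]
        new_words = " ".join(coarse).split()
        if len(new_words) >= len(words):
            raise ValueError("summarizer must shorten its input")
        words = new_words
    # Fine-grained stage: the remaining text fits in a single context.
    return summarize(" ".join(words))


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(100))
    print(multi_stage_summarize(document, truncate_summarizer, max_words=10))
```

The number of coarse stages grows with the input length while `max_words` (the assumed context size) stays fixed, which is the property the abstract emphasizes.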

[2]  arXiv:2110.10151 [pdf, other]
Title: Can Fortran's 'do concurrent' replace directives for accelerated computing?
Comments: 18 pages, 2 figures, Accepted for publication at WACCPD 2021
Subjects: Mathematical Software (cs.MS); Programming Languages (cs.PL)

Recently, there has been growing interest in using standard language constructs (e.g. C++'s Parallel Algorithms and Fortran's do concurrent) for accelerated computing as an alternative to directive-based APIs (e.g. OpenMP and OpenACC). These constructs have the potential to be more portable, and some compilers already (or have plans to) support such standards. Here, we look at the current capabilities, portability, and performance of replacing directives with Fortran's do concurrent using a mini-app that currently implements OpenACC for GPU-acceleration and OpenMP for multi-core CPU parallelism. We replace as many directives as possible with do concurrent, testing various configurations and compiler options within three major compilers: GNU's gfortran, NVIDIA's nvfortran, and Intel's ifort. We find that with the right compiler versions and flags, many directives can be replaced without loss of performance or portability, and, in the case of nvfortran, they can all be replaced. We discuss limitations that may apply to more complicated codes and future language additions that may mitigate them. The software and Singularity containers are publicly provided to allow the results to be reproduced.

[3]  arXiv:2110.10152 [pdf, other]
Title: Identifying Stroke Indicators Using Rough Sets
Comments: Accepted in IEEE Access, 2020
Subjects: Machine Learning (cs.LG)

Stroke is widely considered the second most common cause of mortality. The adverse consequences of stroke have led to global interest and work for improving the management and diagnosis of stroke. Various techniques for data mining have been used globally for accurate prediction of occurrence of stroke based on the risk factors that are associated with the electronic health care records (EHRs) of the patients. In particular, EHRs routinely contain several thousands of features, most of which are redundant and irrelevant and need to be discarded to enhance the prediction accuracy. The choice of feature-selection methods can help in improving the prediction accuracy of the model and efficient data management of the archived input features. In this paper, we systematically analyze the various features in EHR records for the detection of stroke. We propose a novel rough-set based technique for ranking the importance of the various EHR records in detecting stroke. Unlike the conventional rough-set techniques, our proposed technique can be applied on any dataset that comprises binary feature sets. We evaluated our proposed method on a publicly available dataset of EHR, and concluded that age, average glucose level, heart disease, and hypertension were the most essential attributes for detecting stroke in patients. Furthermore, we benchmarked the proposed technique with other popular feature-selection techniques. We obtained the best performance in ranking the importance of individual features in detecting stroke.

[4]  arXiv:2110.10153 [pdf, ps, other]
Title: Towards Puffin: The Creation of an Uncertainty Compiler
Comments: 21 Pages, 10 Figures
Subjects: Mathematical Software (cs.MS); Computation (stat.CO)

An uncertainty compiler is a tool that automatically translates original computer source code lacking explicit uncertainty analysis into code containing appropriate uncertainty representations and uncertainty propagation algorithms. We have developed a prototype uncertainty compiler along with an associated object-oriented uncertainty language in the form of a stand-alone Python library. It handles the specifications of input uncertainties and inserts calls to intrusive uncertainty quantification algorithms in the library. The uncertainty compiler can apply intrusive uncertainty propagation methods to codes or parts of codes and therefore more comprehensively and flexibly address both epistemic and aleatory uncertainties.

[5]  arXiv:2110.10165 [pdf, other]
Title: NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters
Comments: 16 pages, 6 figures. Accepted at ACML2021 (long oral). API is available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The benchmark datasets for neural architecture search (NAS) have been developed to alleviate the computationally expensive evaluation process and ensure a fair comparison. Recent NAS benchmarks only focus on architecture optimization, although the training hyperparameters affect the obtained model performances. Building the benchmark dataset for joint optimization of architecture and training hyperparameters is essential to further NAS research. The existing NAS-HPO-Bench is a benchmark for joint optimization, but it does not consider the network connectivity design as done in modern NAS algorithms. This paper introduces the first benchmark dataset for joint optimization of network connections and training hyperparameters, which we call NAS-HPO-Bench-II. We collect the performance data of 4K cell-based convolutional neural network architectures trained on the CIFAR-10 dataset with different learning rate and batch size settings, resulting in the data of 192K configurations. The dataset includes the exact data for 12 epoch training. We further build the surrogate model predicting the accuracies after 200 epoch training to provide the performance data of longer training epoch. By analyzing NAS-HPO-Bench-II, we confirm the dependency between architecture and training hyperparameters and the necessity of joint optimization. Finally, we demonstrate the benchmarking of the baseline optimization algorithms using NAS-HPO-Bench-II.

[6]  arXiv:2110.10174 [pdf, other]
Title: Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction
Comments: BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Every hand-object interaction begins with contact. Although predicting the contact state between hands and objects is useful in understanding hand-object interactions, prior methods on hand-object analysis have assumed that the interacting hands and objects are known, and contact prediction itself has not been studied in detail. In this study, we introduce a video-based method for predicting contact between a hand and an object. Specifically, given a video and a pair of hand and object tracks, we predict a binary contact state (contact or no-contact) for each frame. However, annotating a large number of hand-object tracks and contact labels is costly. To overcome the difficulty, we propose a semi-supervised framework consisting of (i) automatic collection of training data with motion-based pseudo-labels and (ii) guided progressive label correction (gPLC), which corrects noisy pseudo-labels with a small amount of trusted data. We validated our framework's effectiveness on a newly built benchmark dataset for hand-object contact prediction and showed superior performance against existing baseline methods. Code and data are available at https://github.com/takumayagi/hand_object_contact_prediction.

[7]  arXiv:2110.10183 [pdf, other]
Title: Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation
Comments: 16 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

It is hard for previous cross-view image translation methods that directly adopt a simple encoder-decoder or U-Net structure to generate a good image at the target view, especially for drastically different views and severe deformation cases. To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one refined pixel-level loss in the second stage. In the first stage, the CrossMLP sub-network learns the latent transformation cues between image code and semantic map code via our novel CrossMLP blocks. Then the coarse results are generated progressively under the guidance of those cues. Moreover, in the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem with more reasonable regularization in a more compact fashion for better optimization. Extensive experimental results on the Dayton and CVUSA datasets show that our method can generate significantly better results than state-of-the-art methods. The source code and trained models are available at https://github.com/Amazingren/CrossMLP.

[8]  arXiv:2110.10185 [pdf, other]
Title: GenNI: Human-AI Collaboration for Data-Backed Text Generation
Comments: IEEE VIS 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Table2Text systems generate textual output based on structured data utilizing machine learning. These systems are essential for fluent natural language interfaces in tools such as virtual assistants; however, left to generate freely these ML systems often produce misleading or unexpected outputs. GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text. The tool utilizes a deep learning model designed with explicit control states. These controls allow users to globally constrain model generations, without sacrificing the representation power of the deep learning models. The visual interface makes it possible for users to interact with AI systems following a Refine-Forecast paradigm to ensure that the generation system acts in a manner human users find suitable. We report multiple use cases on two experiments that improve over uncontrolled generation approaches, while at the same time providing fine-grained control. A demo and source code are available at https://genni.vizhub.ai .

[9]  arXiv:2110.10187 [pdf, other]
Title: Sky Is Not the Limit: Tighter Rank Bounds for Elevator Automata in Büchi Automata Complementation (Technical Report)
Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL)

We propose several heuristics for mitigating one of the main causes of combinatorial explosion in rank-based complementation of Büchi automata (BAs): unnecessarily high bounds on the ranks of states. First, we identify elevator automata, which is a large class of BAs (generalizing semi-deterministic BAs), occurring often in practice, where ranks of states are bounded according to the structure of strongly connected components. The bounds for elevator automata also carry over to general BAs that contain elevator automata as a sub-structure. Second, we introduce two techniques for refining bounds on the ranks of BA states using data-flow analysis of the automaton. We implement our techniques as an extension of the tool Ranker for BA complementation and show that they indeed greatly prune the generated state space, obtaining significantly better results and outperforming other state-of-the-art tools on a large set of benchmarks.

[10]  arXiv:2110.10189 [pdf, other]
Title: StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Geometric organization of objects into semantically meaningful arrangements pervades the built world. As such, assistive robots operating in warehouses, offices, and homes would greatly benefit from the ability to recognize and rearrange objects into these semantically meaningful structures. To be useful, these robots must contend with previously unseen objects and receive instructions without significant programming. While previous works have examined recognizing pairwise semantic relations and sequential manipulation to change these simple relations, none have shown the ability to arrange objects into complex structures such as circles or table settings. To address this problem, we propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement and a structured language command encoding the desired object configuration. We show through rigorous experiments that StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures with multi-object relational constraints inferred from the language command.

[11]  arXiv:2110.10194 [pdf, other]
Title: CoFi: Coarse-to-Fine ICP for LiDAR Localization in an Efficient Long-lasting Point Cloud Map
Comments: 8 pages, submitted to ICRA 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Signal Processing (eess.SP)

LiDAR odometry and localization have attracted increasing research interest in recent years. In existing works, iterative closest point (ICP) is widely used since it is precise and efficient. Due to its non-convexity and its local iterative strategy, however, ICP-based methods easily fall into local optima, which in turn calls for a precise initialization. In this paper, we propose CoFi, a Coarse-to-Fine ICP algorithm for LiDAR localization. Specifically, the proposed algorithm down-samples the input point sets under multiple voxel resolutions, and gradually refines the transformation from the coarse point sets to the fine-grained point sets. In addition, we propose a map-based LiDAR localization algorithm that extracts semantic feature points from the LiDAR frames and applies CoFi to estimate the pose on an efficient point cloud map. With the help of the Cylinder3D algorithm for LiDAR scan semantic segmentation, the proposed CoFi localization algorithm demonstrates state-of-the-art performance on the KITTI odometry benchmark, with significant improvement over the literature.
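The multi-resolution down-sampling step mentioned above can be illustrated with a small sketch. It is an assumption-laden illustration, not the CoFi code: voxel down-sampling is done here by averaging the points in each cubic voxel, the function names are invented for this example, and the ICP refinement that would run at each level (initialized from the previous, coarser level) is omitted.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxel_downsample(points: Sequence[Point], voxel_size: float) -> List[Point]:
    """Replace all points that fall in the same cubic voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for p in points:
        key = (floor(p[0] / voxel_size), floor(p[1] / voxel_size), floor(p[2] / voxel_size))
        buckets[key].append(p)
    centroids: List[Point] = []
    for members in buckets.values():
        n = len(members)
        centroids.append((
            sum(m[0] for m in members) / n,
            sum(m[1] for m in members) / n,
            sum(m[2] for m in members) / n,
        ))
    return centroids


def coarse_to_fine_pyramid(points: Sequence[Point], voxel_sizes: Sequence[float]) -> List[List[Point]]:
    """Return down-sampled copies ordered from the coarsest to the finest voxel size."""
    return [voxel_downsample(points, size) for size in sorted(voxel_sizes, reverse=True)]


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.4, 0.4, 0.4), (0.6, 0.6, 0.6), (1.5, 1.5, 1.5)]
    for level in coarse_to_fine_pyramid(cloud, [0.5, 1.0, 2.0]):
        print(len(level), "points")
```

A registration loop would then estimate a transformation on the coarsest level first and reuse it as the starting guess on each finer level, which reduces the dependence on a precise initial pose.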

[12]  arXiv:2110.10200 [pdf, other]
Title: fairadapt: Causal Reasoning for Fair Data Pre-processing
Comments: Keywords: algorithmic fairness, causal inference, machine learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)

Machine learning algorithms are useful for various prediction tasks, but they can also learn how to discriminate, based on gender, race or other sensitive attributes. This realization gave rise to the field of fair machine learning, which aims to measure and mitigate such algorithmic bias. This manuscript describes the R-package fairadapt, which implements a causal inference pre-processing method. By making use of a causal graphical model and the observed data, the method can be used to address hypothetical questions of the form "What would my salary have been, had I been of a different gender/race?". Such individual level counterfactual reasoning can help eliminate discrimination and help justify fair decisions. We also discuss appropriate relaxations which assume certain causal pathways from the sensitive attribute to the outcome are not discriminatory.

[13]  arXiv:2110.10205 [pdf, other]
Title: MultiHead MultiModal Deep Interest Recommendation Network
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

With the development of information technology, human beings are constantly producing a large amount of information. How to find the information that users are interested in within this large volume has become an issue of great concern to users and even business managers. To solve this problem, researchers have continued to improve models and explore solutions, moving from traditional machine learning to deep learning recommendation systems. Because researchers have focused more on optimizing the network structure of recommendation models than on enriching their features, there is still room for further optimization. Based on the DIN model, this paper adds multi-head and multi-modal modules, which enrich the feature sets that the model can use and at the same time strengthen the cross-combination and fitting capabilities of the model. Experiments show that the multi-head multi-modal DIN improves recommendation prediction and outperforms current state-of-the-art methods on various comprehensive indicators.

[14]  arXiv:2110.10206 [pdf, other]
Title: Come Again? Re-Query in Referring Expression Comprehension
Comments: 17 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

To build a shared perception of the world, humans rely on the ability to resolve misunderstandings by requesting and accepting clarifications. However, when evaluating visiolinguistic models, metrics such as accuracy enforce the assumption that a decision must be made based on a single piece of evidence. In this work, we relax this assumption for the task of referring expression comprehension by allowing the model to request help when its confidence is low. We consider two ways in which this help can be provided: multimodal re-query, where the user is allowed to point or click to provide additional information to the model, and rephrase re-query, where the user is only allowed to provide another referring expression. We demonstrate the importance of re-query by showing that providing the best referring expression for all objects can increase accuracy by up to 21.9% and that this accuracy can be matched by re-querying only 12% of initial referring expressions. We further evaluate re-query functions for both multimodal and rephrase re-query across three modern approaches and demonstrate combined replacement for rephrase re-query, which improves average single-query performance by up to 6.5% and converges to as close as 1.6% of the upper bound of single-query performance.

[15]  arXiv:2110.10211 [pdf, other]
Title: Learning Equivariances and Partial Equivariances from Data
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Group equivariant Convolutional Neural Networks (G-CNNs) constrain features to respect the chosen symmetries, and lead to better generalization when these symmetries appear in the data. However, if the chosen symmetries are not present, group equivariant architectures lead to overly constrained models and worse performance. Frequently, the distribution of the data can be better represented by a subset of a group than by the group as a whole, e.g., rotations in $[-90^{\circ}, 90^{\circ}]$. In such cases, a model that respects equivariance partially is better suited to represent the data. Moreover, relevant symmetries may differ for low and high-level features, e.g., edge orientations in a face, and face poses relative to the camera. As a result, the optimal level of equivariance may differ per layer. In this work, we introduce Partial G-CNNs: a family of equivariant networks able to learn partial and full equivariances from data at every layer end-to-end. Partial G-CNNs retain full equivariance whenever beneficial, e.g., for rotated MNIST, but are able to restrict it whenever it becomes harmful, e.g., for 6 / 9 or natural image classification. Partial G-CNNs perform on par with G-CNNs when full equivariance is necessary, and outperform them otherwise. Our method is applicable to discrete groups, continuous groups and combinations thereof.

[16]  arXiv:2110.10213 [pdf, other]
Title: Neural Medication Extraction: A Comparison of Recent Models in Supervised and Semi-supervised Learning Settings
Comments: IEEE International Conference on Healthcare Informatics (ICHI 2021)
Subjects: Computation and Language (cs.CL)

Drug prescriptions are essential information that must be encoded in electronic medical records. However, much of this information is hidden within free-text reports. This is why the medication extraction task has emerged. To date, most of the research effort has focused on small amounts of data and has only recently considered deep learning methods. In this paper, we present an independent and comprehensive evaluation of state-of-the-art neural architectures on the I2B2 medical prescription extraction task both in the supervised and semi-supervised settings. The study shows the very competitive performance of simple DNN models on the task as well as the high interest of pre-trained models. Adapting the latter models on the I2B2 dataset makes it possible to push medication extraction performance above the state-of-the-art. Finally, the study also confirms that semi-supervised techniques are promising for leveraging large amounts of unlabeled data, in particular in low-resource settings where labeled data is too costly to acquire.

[17]  arXiv:2110.10217 [pdf, other]
Title: An Adaptive Sampling and Edge Detection Approach for Encoding Static Images for Spiking Neural Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Current state-of-the-art methods of image classification using convolutional neural networks are often constrained by both latency and power consumption. This places a limit on the devices, particularly low-power edge devices, that can employ these methods. Spiking neural networks (SNNs) are considered to be the third generation of artificial neural networks which aim to address these latency and power constraints by taking inspiration from biological neuronal communication processes. Before data such as images can be input into an SNN, however, they must be first encoded into spike trains. Herein, we propose a method for encoding static images into temporal spike trains using edge detection and an adaptive signal sampling method for use in SNNs. The edge detection process consists of first performing Canny edge detection on the 2D static images and then converting the edge detected images into two X and Y signals using an image-to-signal conversion method. The adaptive signaling approach consists of sampling the signals such that the signals maintain enough detail and are sensitive to abrupt changes in the signal. Temporal encoding mechanisms such as threshold-based representation (TBR) and step-forward (SF) are then able to be used to convert the sampled signals into spike trains. We use various error and indicator metrics to optimize and evaluate the efficiency and precision of the proposed image encoding approach. Comparison results between the original and reconstructed signals from spike trains generated using edge-detection and adaptive temporal encoding mechanism exhibit 18x and 7x reduction in average root mean square error (RMSE) compared to the conventional SF and TBR encoding, respectively, when used for encoding the MNIST dataset.
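For readers unfamiliar with the step-forward (SF) scheme named above, the following sketch shows the form in which it is commonly described: a moving baseline is compared against the signal, and a positive or negative spike is emitted whenever the signal leaves a fixed band around that baseline. This is an illustrative assumption about the conventional SF encoder only; it does not include the paper's edge detection or adaptive sampling, and the function names are invented here.

```python
from typing import List


def step_forward_encode(signal: List[float], threshold: float) -> List[int]:
    """Emit +1/-1 spikes when the signal leaves a band around a moving baseline."""
    if not signal:
        return []
    baseline = signal[0]
    spikes = [0]  # the first sample only initializes the baseline
    for value in signal[1:]:
        if value > baseline + threshold:
            spikes.append(1)
            baseline += threshold
        elif value < baseline - threshold:
            spikes.append(-1)
            baseline -= threshold
        else:
            spikes.append(0)
    return spikes


def step_forward_decode(spikes: List[int], start: float, threshold: float) -> List[float]:
    """Reconstruct a staircase approximation by accumulating the spikes."""
    value = start
    reconstructed = []
    for spike in spikes:
        value += spike * threshold
        reconstructed.append(value)
    return reconstructed


if __name__ == "__main__":
    samples = [0.0, 0.6, 1.2, 1.1, 0.2]
    train = step_forward_encode(samples, threshold=0.5)
    print(train)
    print(step_forward_decode(train, start=samples[0], threshold=0.5))
```

Comparing the decoded staircase with the original samples (for example via RMSE) is the kind of reconstruction check the abstract refers to.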

[18]  arXiv:2110.10221 [pdf, other]
Title: The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
Comments: 23 pages, 25 figures and 10 tables
Subjects: Machine Learning (cs.LG)

There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution on ragged tensors, current deep learning frameworks generally use techniques such as padding and masking to make the data shapes uniform and then offload the computations to optimized kernels for dense tensor algebra. Such techniques can, however, lead to a lot of wasted computation and therefore, a loss in performance. This paper presents CoRa, a tensor compiler that allows users to easily generate efficient code for ragged tensor operators targeting a wide range of CPUs and GPUs. Evaluating CoRa on a variety of operators on ragged tensors as well as on an encoder layer of the transformer model, we find that CoRa (i) performs competitively with hand-optimized implementations of the operators and the transformer encoder and (ii) achieves, over PyTorch, a 1.6X geomean speedup for the encoder on an Nvidia GPU and a 1.86X geomean speedup for the multi-head attention module used in transformers on an ARM CPU.
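To make the cost of padding concrete, the sketch below pads a ragged batch to a dense rectangle, builds the matching mask, and reports the fraction of elements that are pure padding. It is a plain-Python illustration with invented helper names and made-up sequence lengths; it does not use or represent the CoRa API.

```python
from typing import List, Tuple


def pad_and_mask(batch: List[List[float]], pad_value: float = 0.0) -> Tuple[List[List[float]], List[List[int]]]:
    """Pad every row to the longest row and return (padded rows, 0/1 mask)."""
    width = max((len(row) for row in batch), default=0)
    padded = [row + [pad_value] * (width - len(row)) for row in batch]
    mask = [[1] * len(row) + [0] * (width - len(row)) for row in batch]
    return padded, mask


def padding_waste(lengths: List[int]) -> float:
    """Fraction of elements in the padded batch that are padding."""
    if not lengths or max(lengths) == 0:
        return 0.0
    padded_elements = max(lengths) * len(lengths)
    return 1.0 - sum(lengths) / padded_elements


if __name__ == "__main__":
    ragged = [[1.0, 2.0], [3.0, 4.0, 5.0, 6.0], [7.0]]
    dense, mask = pad_and_mask(ragged)
    print(dense)
    print(mask)
    print(padding_waste([len(row) for row in ragged]))
```

With hypothetical lengths 2, 4, 8 and 2, half of the dense batch is padding, so a dense kernel would spend roughly half of its element-wise work on values that are later masked out.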

[19]  arXiv:2110.10223 [pdf, other]
Title: A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison
Comments: 9th IEEE International Conference on Pervasive Computing and Communications (PerCom 2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Pervasive computing promotes the installation of connected devices in our living spaces in order to provide services. Two major developments have gained significant momentum recently: an advanced use of edge resources and the integration of machine learning techniques for engineering applications. This evolution raises major challenges, in particular related to the appropriate distribution of computing elements along an edge-to-cloud continuum. In this context, Federated Learning has been recently proposed for distributed model training in the edge. The principle of this approach is to aggregate models learned on distributed clients in order to obtain a new, more general model. The resulting model is then redistributed to clients for further training. To date, the most popular federated learning algorithm uses coordinate-wise averaging of the model parameters for aggregation. However, it has been shown that this method is not adapted to heterogeneous environments where data is not independently and identically distributed (non-iid). This corresponds directly to some pervasive computing scenarios where heterogeneity of devices and users challenges machine learning with the double objective of generalization and personalization. In this paper, we propose a novel aggregation algorithm, termed FedDist, which is able to modify its model architecture (here, deep neural network) by identifying dissimilarities between specific neurons amongst the clients. This makes it possible to account for clients' specificity without impairing generalization. Furthermore, we define a complete method to evaluate federated learning in a realistic way taking generalization and personalization into account. Using this method, FedDist is extensively tested and compared with three state-of-the-art federated learning algorithms on the pervasive domain of Human Activity Recognition with smartphones.
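The coordinate-wise averaging baseline mentioned above can be sketched in a few lines. This illustrates only the conventional aggregation step, not FedDist; weighting each client by its number of training samples is an assumption borrowed from the usual federated averaging formulation, and the flat parameter lists and function name are simplifications made for this example.

```python
from typing import List


def federated_average(client_params: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average each parameter coordinate across clients, weighted by data size."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    dim = len(client_params[0])
    if any(len(params) != dim for params in client_params):
        raise ValueError("all clients must share the same parameter shape")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 training samples respectively.
    print(federated_average([[1.0, 2.0], [3.0, 6.0]], [1, 3]))
```

Because coordinate `i` of one client is always averaged with coordinate `i` of every other client, the scheme implicitly assumes that matching positions play matching roles across clients, which is the assumption that heterogeneous (non-iid) data can break.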

[20]  arXiv:2110.10225 [pdf, other]
Title: What Averages Do Not Tell -- Predicting Real Life Processes with Sequential Deep Learning
Subjects: Machine Learning (cs.LG)

Deep Learning is proven to be an effective tool for modeling sequential data as shown by the success in Natural Language, Computer Vision and Signal Processing. Process Mining concerns discovering insights on business processes from their execution data that are logged by supporting information systems. The logged data (event log) is formed of event sequences (traces) that correspond to executions of a process. Many Deep Learning techniques have been successfully adapted for predictive Process Mining that aims to predict process outcomes, remaining time, the next event, or even the suffix of running traces. Traces in Process Mining are multimodal sequences and are structured very differently from natural language sentences or images. This may require a different approach to processing. So far, there has been little focus on these differences and the challenges introduced. Looking at suffix prediction as the most challenging of these tasks, the performance of Deep Learning models was evaluated only on average measures and for a small number of real-life event logs. Comparing the results between papers is difficult due to different pre-processing and evaluation strategies. Challenges that may be relevant are the skewness of trace-length distribution and the skewness of the activity distribution in real-life event logs. We provide an end-to-end framework that enables comparing the performance of seven state-of-the-art sequential architectures in common settings. Results show that sequence modeling still has a lot of room for improvement for the majority of the more complex datasets. Further research and insights are required to get consistent performance not just in average measures but additionally over all the prefixes.

[21]  arXiv:2110.10232 [pdf, other]
Title: Test time Adaptation through Perturbation Robustness
Comments: Under review
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Data samples generated by several real world processes are dynamic in nature, i.e., their characteristics vary with time. Thus it is not possible to train and tackle all possible distributional shifts between training and inference, using the host of transfer learning methods in literature. In this paper, we tackle this problem of adapting to domain shift at inference time, i.e., we do not change the training process, but quickly adapt the model at test-time to handle any domain shift. For this, we propose to enforce consistency of predictions of data sampled in the vicinity of test sample on the image manifold. On a host of test scenarios like dealing with corruptions (CIFAR-10-C and CIFAR-100-C), and domain adaptation (VisDA-C), our method is on par with or significantly outperforms previous methods.

[22]  arXiv:2110.10233 [pdf, ps, other]
Title: Forecasting Market Prices using DL with Data Augmentation and Meta-learning: ARIMA still wins!
Comments: Accepted at the ICBINB Workshop @ NeurIPS, 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep-learning techniques have been successfully used for time-series forecasting and have often shown superior performance on many standard benchmark datasets as compared to traditional techniques. Here we present a comprehensive and comparative study of performance of deep-learning techniques for forecasting prices in financial markets. We benchmark state-of-the-art deep-learning baselines, such as NBeats, etc., on data from currency as well as stock markets. We also generate synthetic data using a fuzzy-logic based model of demand driven by technical rules such as moving averages, which are often used by traders. We benchmark the baseline techniques on this synthetic data as well as use it for data augmentation. We also apply gradient-based meta-learning to account for non-stationarity of financial time-series. Our extensive experiments notwithstanding, the surprising result is that the standard ARIMA models outperform deep-learning even using data augmentation or meta-learning. We conclude by speculating as to why this might be the case.

[23]  arXiv:2110.10234 [pdf, other]
Title: More Engineering, No Silos: Rethinking Processes and Interfaces in Collaboration between Interdisciplinary Teams for Machine Learning Projects
Comments: 22 pages, 10 figures, 5 tables
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)

The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.

[24]  arXiv:2110.10236 [pdf, other]
Title: Optimal Sequential Stochastic Deployment of Multiple Passenger Robots
Authors: Chris (Yu Hsuan) Lee, Graeme Best, Geoffrey A. Hollinger
Journal-ref: Proc. IEEE International Conference on Robotics and Automation (ICRA) 2021
Subjects: Robotics (cs.RO)

We present a new algorithm for deploying passenger robots in marsupial robot systems. A marsupial robot system consists of a carrier robot (e.g., a ground vehicle), which is highly capable and has a long mission duration, and at least one passenger robot (e.g., a short-duration aerial vehicle) transported by the carrier. We optimize the performance of passenger robot deployment by proposing an algorithm that reasons over uncertainty by exploiting information about the prior probability distribution of features of interest in the environment. Our algorithm is formulated as a solution to a sequential stochastic assignment problem (SSAP). The key feature of the algorithm is a recurrence relationship that defines a set of observation thresholds that are used to decide when to deploy passenger robots. Our algorithm computes the optimal policy in $O(NR)$ time, where $N$ is the number of deployment decision points and $R$ is the number of passenger robots to be deployed. We conducted drone deployment exploration experiments on real-world data from the DARPA Subterranean challenge to test the SSAP algorithm. Our results show that our deployment algorithm outperforms other competing algorithms, such as the classic secretary approach and baseline partitioning methods, and is comparable to an offline oracle algorithm.
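
The threshold recurrence mentioned in the abstract can be illustrated with the classic sequential stochastic assignment recurrence of Derman, Lieberman and Ross. The sketch below is not the paper's algorithm: it assumes observations are i.i.d. Uniform(0, 1), treats all passenger robots as identical, and the function names `ssap_bounds_uniform` and `should_deploy` are hypothetical.

```python
def ssap_bounds_uniform(stages):
    """Thresholds a_{0,n} .. a_{n,n} for n = stages decision points remaining.

    Assumption: observations are i.i.d. Uniform(0, 1), so F(x) = x and the
    infinite end thresholds can be clipped to the support [0, 1].
    """
    bounds = [0.0, 1.0]  # one stage remaining
    for n in range(1, stages):
        nxt = [0.0]
        for i in range(1, n + 1):
            lo, hi = bounds[i - 1], bounds[i]
            # a_{i,n+1} = int_lo^hi x dF(x) + lo * F(lo) + hi * (1 - F(hi))
            nxt.append((hi * hi - lo * lo) / 2 + lo * lo + hi * (1 - hi))
        nxt.append(1.0)
        bounds = nxt
    return bounds


def should_deploy(observation, stages_left, robots_left):
    """Deploy when the observation falls in one of the top robots_left intervals."""
    if robots_left <= 0:
        return False
    if robots_left >= stages_left:
        return True
    bounds = ssap_bounds_uniform(stages_left)
    return observation > bounds[stages_left - robots_left]
```

With two decision points left and one robot, this rule deploys only when the observation exceeds 0.5, the mean of the assumed distribution. Recomputing the table on every call is done here for brevity; caching it would avoid the repeated work.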

[25]  arXiv:2110.10237 [pdf, other]
Title: Stochastic Assignment for Deploying Multiple Marsupial Robots
Authors: Chris (Yu Hsuan) Lee, Graeme Best, Geoffrey A. Hollinger
Journal-ref: Proc. IEEE International Symposium on Multi-Robot and Multi-Agent Systems (MRS) 2021
Subjects: Robotics (cs.RO)

Marsupial robot teams consist of carrier robots that transport and deploy multiple passenger robots, such as a team of ground robots that carry and deploy multiple aerial robots, to rapidly explore complex environments. We specifically address the problem of planning the deployment times and locations of the carrier robots to best meet the objectives of a mission while reasoning over uncertain future observations and rewards. While prior work proposed optimal, polynomial-time solutions to single-carrier robot systems, the multiple-carrier robot deployment problem is fundamentally harder as it requires addressing conflicts and dependencies between deployments of multiple passenger robots. We propose a centralized heuristic search algorithm for the multiple-carrier robot deployment problem that combines Monte Carlo Tree Search with a dynamic programming-based solution to the Sequential Stochastic Assignment Problem as a rollout action-selection policy. Our results with both procedurally-generated data and data drawn from the DARPA Subterranean Challenge Urban Circuit show the viability of our approach and substantial exploration performance improvements over alternative algorithms.

[26]  arXiv:2110.10239 [pdf, other]
Title: 1st Place Solution for the UVO Challenge on Image-based Open-World Segmentation 2021
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We describe the two-stage instance segmentation framework we use to compete in the challenge. The first stage of our framework consists of an object detector, which generates object proposals in the format of bounding boxes. Then, the images and the detected bounding boxes are fed to the second stage, where a segmentation network is applied to segment the objects in the bounding boxes. We train all our networks in a class-agnostic way. Our approach achieves first place in the UVO 2021 Image-based Open-World Segmentation Challenge.

[27]  arXiv:2110.10246 [pdf]
Title: 2020 State of the Octoverse: Securing the World's Software
Comments: published by GitHub
Subjects: Software Engineering (cs.SE)

Open source is the connective tissue for much of the information economy. You would be hard-pressed to find a scenario where your data does not pass through at least one open source component. Many of the services and technology we all rely on, from banking to healthcare, also rely on open source software. The artifacts of open source code serve as critical infrastructure for much of the global economy, making the security of open source software mission-critical to the world.

[28]  arXiv:2110.10248 [pdf]
Title: 2020 State of the Octoverse: Finding Balance Between Work and Play
Comments: GitHub
Subjects: Software Engineering (cs.SE)

Over the past year, many developers and other technology professionals have transitioned to a remote-first world, as COVID-19 pressed organizations to support working from home whenever possible. This shift quickly changed the routines and environments where we work and learn, redrawing the lines between personal and professional lives. How does this affect the ways we develop and deliver software, both at work and in our open source projects?

[29]  arXiv:2110.10249 [pdf, other]
Title: Neural Stochastic Partial Differential Equations
Subjects: Machine Learning (cs.LG)

Stochastic partial differential equations (SPDEs) are the mathematical tool of choice to model complex spatio-temporal dynamics of systems subject to the influence of randomness. We introduce the Neural SPDE model providing an extension to two important classes of physics-inspired neural architectures. On the one hand, it extends all the popular neural -- ordinary, controlled, stochastic, rough -- differential equation models in that it is capable of processing incoming information even when the latter evolves in an infinite dimensional state space. On the other hand, it extends Neural Operators -- recent generalizations of neural networks modelling mappings between functional spaces -- in that it can be used to learn complex SPDE solution operators $(u_0,\xi) \mapsto u$ depending simultaneously on an initial condition $u_0$ and on a stochastic forcing term $\xi$, while remaining resolution-invariant and equation-agnostic. A Neural SPDE is constrained to respect real physical dynamics and consequently requires only a modest amount of data to train, depends on a significantly smaller number of parameters and has better generalization properties compared to Neural Operators. Through various experiments on semilinear SPDEs with additive and multiplicative noise (including the stochastic Navier-Stokes equations) we demonstrate how Neural SPDEs can flexibly be used in a supervised learning setting as well as conditional generative models to sample solutions of SPDEs conditioned on prior knowledge, systematically achieving in both cases better performance than all alternative models.

[30]  arXiv:2110.10255 [pdf, other]
Title: A Simple Approach to Continual Learning by Transferring Skill Parameters
Comments: Submitted to ICRA 2022
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

In order to be effective general purpose machines in real world environments, robots not only will need to adapt their existing manipulation skills to new circumstances, they will need to acquire entirely new skills on-the-fly. A great promise of continual learning is to endow robots with this ability, by using their accumulated knowledge and experience from prior skills. We take a fresh look at this problem, by considering a setting in which the robot is limited to storing that knowledge and experience only in the form of learned skill policies. We show that storing skill policies, careful pre-training, and appropriately choosing when to transfer those skill policies is sufficient to build a continual learner in the context of robotic manipulation. We analyze which conditions are needed to transfer skills in the challenging Meta-World simulation benchmark. Using this analysis, we introduce a pair-wise metric relating skills that allows us to predict the effectiveness of skill transfer between tasks, and use it to reduce the problem of continual learning to curriculum selection. Given an appropriate curriculum, we show how to continually acquire robotic manipulation skills without forgetting, and using far fewer samples than needed to train them from scratch.

[31]  arXiv:2110.10261 [pdf, other]
Title: Learning Domain Specific Language Models for Automatic Speech Recognition through Machine Translation
Authors: Saurav Jha
Comments: Master's thesis work from July 2021, 22 pages including references
Subjects: Computation and Language (cs.CL)

Automatic Speech Recognition (ASR) systems have been gaining popularity in recent years for their widespread usage in smart phones and speakers. Building ASR systems for task-specific scenarios is subject to the availability of utterances that adhere to the style of the task as well as the language in question. In our work, we target such a scenario wherein task-specific text data is available in a language that is different from the target language in which an ASR Language Model (LM) is expected. We use Neural Machine Translation (NMT) as an intermediate step to first obtain translations of the task-specific text data. We then train LMs on the 1-best and N-best translations and study ways to improve on such a baseline LM. We develop a procedure to derive word confusion networks from NMT beam search graphs and evaluate LMs trained on these confusion networks. With experiments on the WMT20 chat translation task dataset, we demonstrate that NMT confusion networks can help to reduce the perplexity of both n-gram and recurrent neural network LMs compared to those trained only on N-best translations.

[32]  arXiv:2110.10275 [pdf]
Title: Early- and in-season crop type mapping without current-year ground truth: generating labels from historical information via a topology-based approach
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Land cover classification in remote sensing is often faced with the challenge of limited ground truth. Incorporating historical information has the potential to significantly lower the high cost associated with collecting ground truth and, more importantly, enable early- and in-season mapping that is helpful to many pre-harvest decisions. In this study, we propose a new approach that can effectively transfer knowledge about the topology (i.e. relative position) of different crop types in the spectral feature space (e.g. the histogram of SWIR1 vs RDEG1 bands) to generate labels, thereby supporting crop classification in a different year. Importantly, our approach does not attempt to transfer classification decision boundaries that are susceptible to inter-annual variations of weather and management, but relies on the more robust and shift-invariant topology information. We tested this approach for mapping corn/soybeans in the US Midwest and paddy rice/corn/soybeans in Northeast China using Landsat-8 and Sentinel-2 data. Results show that our approach automatically generates high-quality labels for crops in the target year immediately after each image becomes available. Based on these generated labels from our approach, the subsequent crop type mapping using a random forest classifier reaches an F1 score as high as 0.887 for corn as early as the silking stage and 0.851 for soybean as early as the flowering stage, and an overall accuracy of 0.873 in Iowa. In Northeast China, F1 scores of paddy rice, corn and soybeans and the overall accuracy can exceed 0.85 two and a half months ahead of harvest. Overall, these results highlight unique advantages of our approach in transferring historical knowledge and maximizing the timeliness of crop maps. Our approach supports a general paradigm shift towards learning transferable and generalizable knowledge to facilitate land cover classification.

[33]  arXiv:2110.10278 [pdf, other]
Title: Fine-Grained Control of Artistic Styles in Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in generative models and adversarial training have enabled artificially generating artworks in various artistic styles. It is highly desirable to gain more control over the generated style in practice. However, artistic styles are unlike object categories -- there is a continuous spectrum of styles distinguished by subtle differences. Few works have explored capturing the continuous spectrum of styles and applying it to a style generation task. In this paper, we propose to achieve this by embedding original artwork examples into a continuous style space. The style vectors are fed to the generator and discriminator to achieve fine-grained control. Our method can be used with common generative adversarial networks (such as StyleGAN). Experiments show that our method not only precisely controls the fine-grained artistic style but also improves image quality over vanilla StyleGAN as measured by FID.

[34]  arXiv:2110.10283 [pdf, ps, other]
Title: Fine-Grained Complexity Theory: Conditional Lower Bounds for Computational Geometry
Authors: Karl Bringmann
Comments: Written version of a tutorial talk given at a special session of CiE'21
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

Fine-grained complexity theory is the area of theoretical computer science that proves conditional lower bounds based on the Strong Exponential Time Hypothesis and similar conjectures. This area has been thriving in the last decade, leading to conditionally best-possible algorithms for a wide variety of problems on graphs, strings, numbers etc. This article is an introduction to fine-grained lower bounds in computational geometry, with a focus on lower bounds for polynomial-time problems based on the Orthogonal Vectors Hypothesis. Specifically, we discuss conditional lower bounds for nearest neighbor search under the Euclidean distance and Fr\'echet distance.
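
For readers unfamiliar with the Orthogonal Vectors problem behind the hypothesis named above: given two sets of $n$ binary vectors of dimension $d$, decide whether some pair, one vector from each set, has inner product zero. A minimal brute-force sketch follows; the function name `has_orthogonal_pair` is hypothetical. It runs in $O(n^2 d)$ time, and the Orthogonal Vectors Hypothesis conjectures that no algorithm runs in $O(n^{2-\varepsilon} \mathrm{poly}(d))$ time for any $\varepsilon > 0$.

```python
def has_orthogonal_pair(set_a, set_b):
    """Return True if some a in set_a and b in set_b satisfy <a, b> = 0.

    Vectors are equal-length sequences of 0/1 entries. This is the naive
    quadratic baseline, checking every pair coordinate by coordinate.
    """
    return any(
        all(x * y == 0 for x, y in zip(a, b))
        for a in set_a
        for b in set_b
    )
```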

[35]  arXiv:2110.10284 [pdf, ps, other]
Title: flip-hoisting: Exploiting Repeated Parameters in Discrete Probabilistic Programs
Subjects: Artificial Intelligence (cs.AI)

Probabilistic programming is emerging as a popular and effective means of probabilistic modeling and an alternative to probabilistic graphical models. Probabilistic programs provide greater expressivity and flexibility in modeling probabilistic systems than graphical models, but this flexibility comes at a cost: there remains a significant disparity in performance between specialized Bayesian network solvers and probabilistic program inference algorithms. In this work we present a program analysis and associated optimization, flip-hoisting, that collapses repetitious parameters in discrete probabilistic programs to improve inference performance. flip-hoisting generalizes parameter sharing - a well-known important optimization from discrete graphical models - to probabilistic programs. We implement flip-hoisting in an existing probabilistic programming language and show empirically that it significantly improves inference performance, narrowing the gap between the performances of probabilistic programs and probabilistic graphical models.

[36]  arXiv:2110.10285 [pdf, ps, other]
Title: CUI @ Auto-UI: Exploring the Fortunate and Unfortunate Futures of Conversational Automotive User Interfaces
Comments: Workshop published and presented at Automotive User Interfaces 2021 (AutoUI 21)
Subjects: Human-Computer Interaction (cs.HC)

This work aims to connect the Automotive User Interfaces (Auto-UI) and Conversational User Interfaces (CUI) communities through discussion of their shared view of the future of automotive conversational user interfaces. The workshop aims to encourage creative consideration of optimistic and pessimistic futures, encouraging attendees to explore the opportunities and barriers that lie ahead through a game. Considerations of the future will be mapped out in greater detail through the drafting of research agendas, by which attendees will get to know each other's expertise and networks of resources. The two-day workshop, consisting of two 90-minute sessions, will facilitate greater communication and collaboration between these communities, connecting researchers to work together to influence the futures they imagine in the workshop.

[37]  arXiv:2110.10286 [pdf, other]
Title: Robust Semi-Supervised Classification using GANs with Self-Organizing Maps
Comments: 9 pages, 13 figures This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG)

Generative adversarial networks (GANs) have shown tremendous promise in learning to generate data and are effective at aiding semi-supervised classification. However, to this point, semi-supervised GAN methods make the assumption that the unlabeled data set contains only samples of the joint distribution of the classes of interest, referred to as inliers. Consequently, when presented with a sample from other distributions, referred to as outliers, GANs perform poorly at determining that they are not qualified to make a decision on the sample. The problem of discriminating outliers from inliers while maintaining classification accuracy is referred to here as the DOIC problem. In this work, we describe an architecture that combines self-organizing maps (SOMs) with SS-GANs with the goal of mitigating the DOIC problem, and present experimental results indicating that the architecture achieves the goal. Multiple experiments were conducted on hyperspectral image data sets. The SS-GANs performed slightly better than supervised GANs on classification problems with and without the SOM. Incorporating the SOMs into the SS-GANs and the supervised GANs led to substantial mitigation of the DOIC problem when compared to SS-GANs and GANs without the SOMs. Furthermore, the SS-GANs performed much better than GANs on the DOIC problem, even without the SOMs.

[38]  arXiv:2110.10287 [pdf, other]
Title: Multi-concept adversarial attacks
Comments: 20 pages, 28 figures, 9 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)

As machine learning (ML) techniques are being increasingly used in many applications, their vulnerability to adversarial attacks has become well known. Test time attacks, usually launched by adding adversarial noise to test instances, have been shown to be effective against the deployed ML models. In practice, one test input may be leveraged by different ML models. Test time attacks targeting a single ML model often neglect their impact on other ML models. In this work, we empirically demonstrate that naively attacking the classifier learning one concept may negatively impact classifiers trained to learn other concepts. For example, for the online image classification scenario, when the Gender classifier is under attack, the (wearing) Glasses classifier is simultaneously attacked with the accuracy dropped from 98.69 to 88.42. This raises an interesting question: is it possible to attack one set of classifiers without impacting the other set that uses the same test instance? Answers to the above research question have interesting implications for protecting privacy against ML model misuse. Attacking ML models that pose unnecessary risks of privacy invasion can be an important tool for protecting individuals from harmful privacy exploitation. In this paper, we address the above research question by developing novel attack techniques that can simultaneously attack one set of ML models while preserving the accuracy of the other. In the case of linear classifiers, we provide a theoretical framework for finding an optimal solution to generate such adversarial examples. Using this theoretical framework, we develop a multi-concept attack strategy in the context of deep learning. Our results demonstrate that our techniques can successfully attack the target classes while protecting the protected classes in many different settings, which is not possible with the existing test-time attack-single strategies.

[39]  arXiv:2110.10289 [pdf, other]
Title: On Coordinate Decoding for Keypoint Estimation Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

A series of 2D (and 3D) keypoint estimation tasks are built upon heatmap coordinate representation, i.e. a probability map that allows for learnable and spatially aware encoding and decoding of keypoint coordinates on grids, even allowing for sub-pixel coordinate accuracy. In this report, we aim to reproduce the findings of DARK that investigated the 2D heatmap representation by highlighting the importance of the encoding of the ground truth heatmap and the decoding of the predicted heatmap to keypoint coordinates. The authors claim that a) a more principled distribution-aware coordinate decoding method overcomes the limitations of the standard techniques widely used in the literature, and b) that the reconstruction of heatmaps from ground-truth coordinates by generating accurate and continuous heatmap distributions leads to unbiased model training, contrary to the standard coordinate encoding process that quantizes the keypoint coordinates on the resolution of the input image grid.

[40]  arXiv:2110.10291 [pdf, other]
Title: A Deeper Look into RowHammer's Sensitivities: Experimental Analysis of Real DRAM Chips and Implications on Future Attacks and Defenses
Comments: A shorter version of this work is to appear at the 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-54), 2021
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

RowHammer is a circuit-level DRAM vulnerability where repeatedly accessing (i.e., hammering) a DRAM row can cause bit flips in physically nearby rows. The RowHammer vulnerability worsens as DRAM cell size and cell-to-cell spacing shrink. Recent studies demonstrate that modern DRAM chips, including chips previously marketed as RowHammer-safe, are even more vulnerable to RowHammer than older chips such that the required hammer count to cause a bit flip has reduced by more than 10X in the last decade. Therefore, it is essential to develop a better understanding and in-depth insights into the RowHammer vulnerability of modern DRAM chips to more effectively secure current and future systems.
Our goal in this paper is to provide insights into fundamental properties of the RowHammer vulnerability that are not yet rigorously studied by prior works, but can potentially be $i$) exploited to develop more effective RowHammer attacks or $ii$) leveraged to design more effective and efficient defense mechanisms. To this end, we present an experimental characterization using 248 DDR4 and 24 DDR3 modern DRAM chips from four major DRAM manufacturers demonstrating how the RowHammer effects vary with three fundamental properties: 1) DRAM chip temperature, 2) aggressor row active time, and 3) victim DRAM cell's physical location. Among our 16 new observations, we highlight that a RowHammer bit flip 1) is very likely to occur in a bounded range, specific to each DRAM cell (e.g., 5.4% of the vulnerable DRAM cells exhibit errors in the range 70C to 90C), 2) is more likely to occur if the aggressor row is active for a longer time (e.g., RowHammer vulnerability increases by 36% if we keep a DRAM row active for 15 column accesses), and 3) is more likely to occur in certain physical regions of the DRAM module under attack (e.g., 5% of the rows are 2x more vulnerable than the remaining 95% of the rows).

[41]  arXiv:2110.10293 [pdf, other]
Title: Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains. Meanwhile, model ensembling is one of the most universally applicable techniques in supervised learning literature and practice, offering a simple solution to reliably improve performance. But how to optimally combine self-supervised models to maximize representation quality has largely remained unaddressed. In this work, we provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time. This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting, with models transferable from the former setting to the latter. Additionally, this direct learning of features through backpropagation improves representations from even a single model, echoing the improvements found in self-distillation.

[42]  arXiv:2110.10295 [pdf, other]
Title: Expressivity of Neural Networks via Chaotic Itineraries beyond Sharkovsky's Theorem
Comments: 47 pages, 19 figures
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD)

Given a target function $f$, how large must a neural network be in order to approximate $f$? Recent works examine this basic question on neural network \textit{expressivity} from the lens of dynamical systems and provide novel ``depth-vs-width'' tradeoffs for a large family of functions $f$. They suggest that such tradeoffs are governed by the existence of \textit{periodic} points or \emph{cycles} in $f$. Our work, by further deploying dynamical systems concepts, illuminates a more subtle connection between periodicity and expressivity: we prove that periodic points alone lead to suboptimal depth-width tradeoffs and we improve upon them by demonstrating that certain ``chaotic itineraries'' give stronger exponential tradeoffs, even in regimes where previous analyses only imply polynomial gaps. Contrary to prior works, our bounds are nearly-optimal, tighten as the period increases, and handle strong notions of inapproximability (e.g., constant $L_1$ error). More broadly, we identify a phase transition to the \textit{chaotic regime} that exactly coincides with an abrupt shift in other notions of function complexity, including VC-dimension and topological entropy.

[43]  arXiv:2110.10298 [pdf, other]
Title: Incorporating Rich Social Interactions Into MDPs
Comments: Submitted to the 39th IEEE Conference on Robotics and Automation (ICRA 2022). Do not distribute
Subjects: Robotics (cs.RO)

Much of what we do as humans is engage socially with other agents, a skill that robots must also eventually possess. We demonstrate that a rich theory of social interactions originating from microsociology and economics can be formalized by extending a nested MDP where agents reason about arbitrary functions of each other's hidden rewards. This extended Social MDP allows us to encode the five basic interactions that underlie microsociology: cooperation, conflict, coercion, competition, and exchange. The result is a robotic agent capable of executing social interactions zero-shot in new environments; like humans it can engage socially in novel ways even without a single example of that social interaction. Moreover, the judgments of these Social MDPs align closely with those of humans when considering which social interaction is taking place in an environment. This method both sheds light on the nature of social interactions, by providing concrete mathematical definitions, and brings rich social interactions into a mathematical framework that has proven to be natural for robotics, MDPs.

[44]  arXiv:2110.10302 [pdf, other]
Title: Layer-wise Adaptive Model Aggregation for Scalable Federated Learning
Subjects: Machine Learning (cs.LG)

In Federated Learning, a common approach for aggregating local models across clients is periodic averaging of the full model parameters. It is, however, known that different layers of neural networks can have a different degree of model discrepancy across the clients. The conventional full aggregation scheme does not consider such a difference and synchronizes all model parameters at once, resulting in inefficient network bandwidth consumption. Aggregating the parameters that are similar across the clients does not make meaningful training progress while increasing the communication cost. We propose FedLAMA, a layer-wise model aggregation scheme for scalable Federated Learning. FedLAMA adaptively adjusts the aggregation interval in a layer-wise manner, jointly considering the model discrepancy and the communication cost. The layer-wise aggregation method enables fine-grained control of the aggregation interval, relaxing the aggregation frequency without a significant impact on the model accuracy. Our empirical study shows that FedLAMA reduces the communication cost by up to 60% for IID data and 70% for non-IID data while achieving a comparable accuracy to FedAvg.
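
The idea of synchronizing layers at different intervals can be sketched as follows. This is an illustrative toy, not FedLAMA itself: the per-layer intervals are fixed inputs here, whereas the abstract says FedLAMA adjusts them adaptively, and the function name `aggregate_layerwise` and the dict-of-lists model layout are assumptions.

```python
def aggregate_layerwise(clients, step, intervals):
    """Average each layer across clients only when its interval divides step.

    clients:   list of dicts mapping layer name -> list of float parameters
    step:      current local training step (hypothetical counter)
    intervals: dict mapping layer name -> aggregation interval in steps
    Returns the names of the layers synchronized at this step.
    """
    synced = []
    for layer, interval in intervals.items():
        if step % interval != 0:
            continue  # this layer skips communication at this step
        width = len(clients[0][layer])
        mean = [sum(c[layer][j] for c in clients) / len(clients) for j in range(width)]
        for c in clients:
            c[layer] = list(mean)  # every client receives the averaged layer
        synced.append(layer)
    return synced
```

Setting every interval to the same value recovers periodic averaging of the full model, which is the conventional scheme the abstract contrasts against.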

[45]  arXiv:2110.10303 [pdf, other]
Title: Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution. This latent space distribution matching is a core component of WAE, and a challenging task. In this paper, we propose to use the contrastive learning framework that has been shown to be effective for self-supervised representation learning, as a means to resolve this problem. We do so by exploiting the fact that contrastive learning objectives optimize the latent space distribution to be uniform over the unit hyper-sphere, which can be easily sampled from. We show that using the contrastive learning framework to optimize the WAE loss achieves faster convergence and more stable optimization compared with existing popular algorithms for WAE. This is also reflected in the FID scores on CelebA and CIFAR-10 datasets, and the realistic generated image quality on the CelebA-HQ dataset.
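
The remark that the uniform distribution on the unit hyper-sphere is easy to sample from can be made concrete with a standard construction: draw an isotropic Gaussian vector and normalize it. The sketch below is generic and not taken from the paper; the function name `sample_unit_hypersphere` is hypothetical.

```python
import math
import random


def sample_unit_hypersphere(dim, rng=random):
    """Draw one point uniformly from the unit sphere in R^dim.

    A standard Gaussian vector is rotation-invariant, so dividing it by its
    norm gives a uniformly distributed direction.
    """
    while True:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0.0:  # guard against the (measure-zero) all-zero draw
            return [x / norm for x in vec]
```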

[46]  arXiv:2110.10305 [pdf, other]
Title: When in Doubt, Summon the Titans: Efficient Inference with Large Models
Subjects: Machine Learning (cs.LG)

Scaling neural networks to "large" sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of the large models, while largely preserving the computational benefits of inference with more lightweight models. In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher. Such an approach allows us to efficiently employ large models in practical scenarios where easy examples are much more frequent than rare hard examples. Our proposed use of distillation to only handle easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference and achieving better accuracy than standard distillation. Empirically, we demonstrate the benefits of our approach on both image classification and natural language processing benchmarks.
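
The student-then-teacher inference pattern described above can be sketched as a simple cascade. How the paper decides that an example is "hard" is not stated in the abstract; routing on the student's top class probability against a fixed threshold is an assumption made for this sketch, as are the names `cascade_predict` and `threshold`.

```python
def cascade_predict(x, student, teacher, threshold=0.9):
    """Answer with the student when it is confident, else defer to the teacher.

    student, teacher: callables mapping an input to a list of class probabilities
    threshold:        hypothetical confidence cut-off for trusting the student
    Returns (predicted_label, name_of_model_that_answered).
    """
    probs = student(x)
    label = max(range(len(probs)), key=probs.__getitem__)
    if probs[label] >= threshold:
        return label, "student"
    # Hard example: pay the cost of the large model only here.
    probs = teacher(x)
    return max(range(len(probs)), key=probs.__getitem__), "teacher"
```

If a fraction $p$ of inputs is handled by the student, the amortized cost per input is roughly the student's cost plus $(1 - p)$ times the teacher's cost.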

[47]  arXiv:2110.10307 [pdf, other]
Title: Distributed Secret Sharing over a Public Channel from Correlated Random Variables
Authors: Remi A. Chou
Comments: 38 pages, 1 figure
Subjects: Information Theory (cs.IT)

We consider a secret-sharing model where a dealer distributes the shares of a secret among a set of participants with the constraint that only predetermined subsets of participants must be able to reconstruct the secret by pooling their shares. Our study generalizes Shamir's secret-sharing model in three directions. First, we allow a joint design of the protocols for the creation of the shares and the distribution of the shares, instead of constraining the model to independent designs. Second, instead of assuming that the participants and the dealer have access to information-theoretically secure channels at no cost, we assume that they have access to a public channel and correlated randomness. Third, motivated by a wireless network setting where the correlated randomness is obtained from channel gain measurements, we explore a setting where the dealer is an entity made of multiple sub-dealers. Our main results are inner and outer regions for the achievable secret rates that the dealer and the participants can obtain in this model. To this end, we develop two new achievability techniques, a first one to successively handle reliability and security constraints in a distributed setting, and a second one to reduce a multi-dealer setting to multiple single-user dealer settings. Our results yield the capacity region for threshold access structures when the correlated randomness corresponds to pairwise secret keys shared between each sub-dealer and each participant, and the capacity for the all-or-nothing access structure in the presence of a single dealer and arbitrarily correlated randomness.
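
For reference, the classical Shamir threshold scheme that this work generalizes can be sketched as below. This is the textbook baseline only, not the paper's public-channel protocol; the prime field, the function names, and the seeded generator are illustrative assumptions.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice

def make_shares(secret, threshold, num_shares, rng):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term, i.e. the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat's little theorem
        secret = (secret + yi * num * inv_den) % PRIME
    return secret

# A seeded generator keeps the demo repeatable; it is NOT cryptographically secure.
rng = random.Random(0)
shares = make_shares(123456789, threshold=3, num_shares=5, rng=rng)
print(reconstruct(shares[:3]) == 123456789)  # True
```

Any three of the five shares reconstruct the secret here, which is the threshold access structure mentioned at the end of the abstract.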

[48]  arXiv:2110.10309 [pdf, other]
Title: Constrained Mean Shift for Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We are interested in representation learning from labeled or unlabeled data. Inspired by recent success of self-supervised learning (SSL), we develop a non-contrastive representation learning method that can exploit additional knowledge. This additional knowledge may come from annotated labels in the supervised setting or an SSL model from another modality in the SSL setting. Our main idea is to generalize the mean-shift algorithm by constraining the search space of nearest neighbors, resulting in semantically purer representations. Our method simply pulls the embedding of an instance closer to its nearest neighbors in a search space that is constrained using the additional knowledge. By leveraging this non-contrastive loss, we show that the supervised ImageNet-1k pretraining with our method results in better transfer performance as compared to the baselines. Further, we demonstrate that our method is relatively robust to label noise. Finally, we show that it is possible to use the noisy constraint across modalities to train self-supervised video models.
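
The core idea (pull an embedding toward its nearest neighbors, searched only within a constrained set) can be sketched as follows. The squared-distance loss, the boolean `allowed` mask, and all names are assumptions for illustration; the paper's actual loss and memory-bank details are not given in the abstract.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def constrained_neighbors(query, bank, allowed, k):
    """Indices of the k bank vectors closest to `query`, searched only where allowed[i] is True."""
    candidates = [i for i, ok in enumerate(allowed) if ok]
    candidates.sort(key=lambda i: squared_distance(query, bank[i]))
    return candidates[:k]

def mean_shift_loss(query, bank, neighbor_ids):
    """Average squared distance from the query to its selected neighbors."""
    return sum(squared_distance(query, bank[i]) for i in neighbor_ids) / len(neighbor_ids)

bank = [[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
allowed = [True, True, False, True]  # e.g. same label as the query (hypothetical constraint)
ids = constrained_neighbors([0.0, 0.0], bank, allowed, k=2)
print(ids, mean_shift_loss([0.0, 0.0], bank, ids))
```

Note that index 2 is geometrically close to the query but is excluded by the constraint, which is the mechanism the abstract credits for semantically purer neighbors.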

[49]  arXiv:2110.10311 [pdf, other]
Title: EMF-Aware Cellular Networks in RIS-Assisted Environments
Subjects: Systems and Control (eess.SY)

The deployment of the 5th-generation cellular networks (5G) and beyond has triggered health concerns due to the electric and magnetic fields (EMF) exposure. In this paper, we propose a novel architecture to minimize the population exposure to EMF by considering a smart radio environment with a reconfigurable intelligent surface (RIS). Then, we optimize the RIS phases to minimize the exposure in terms of the exposure index (EI) while maintaining a minimum target quality of service. The proposed scheme achieves up to 20% reduction in EI compared to schemes without RISs.

[50]  arXiv:2110.10316 [pdf, other]
Title: Beamforming Design for Intelligent Reflecting Surface-Enhanced Symbiotic Radio Systems
Comments: This paper is submitted to ICC 2022
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper investigates multiuser multi-input single-output downlink symbiotic radio communication systems assisted by an intelligent reflecting surface (IRS). Unlike existing methods, which ideally assume that the secondary user (SU) can jointly decode information symbols from both the access point (AP) and the IRS via multiuser detection, we consider a more practical SU for which only non-coherent detection is available. To characterize the non-coherent decoding performance, a practical upper bound of the average symbol error rate (SER) is derived. Subsequently, we jointly optimize the beamformer at the AP and the phase shifts at the IRS to maximize the average sum-rate of the primary system, taking into account the maximum tolerable SER constraint for the SU. To circumvent the coupling of variables, we exploit the Schur complement, which facilitates the design of a suboptimal beamforming algorithm based on successive convex approximation. Our simulation results show that, compared with various benchmark algorithms, the proposed scheme significantly improves the average sum-rate of the primary system while guaranteeing the decoding performance of the secondary system.

[51]  arXiv:2110.10318 [pdf, other]
Title: Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction
Comments: Camera ready version. Accepted to WNUT 2021. Code for reproducing the experiments can be found at: this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We evaluate a simple approach to improving zero-shot multilingual transfer of mBERT on social media corpora by adding a pretraining task called translation pair prediction (TPP), which predicts whether a pair of cross-lingual texts is a valid translation. Our approach assumes access to translations (exact or approximate) between source-target language pairs, where we fine-tune a model on source language task data and evaluate the model in the target language. In particular, we focus on language pairs where transfer learning is difficult for mBERT: those where source and target languages are different in script, vocabulary, and linguistic typology. We show improvements from TPP pretraining over mBERT alone in zero-shot transfer from English to Hindi, Arabic, and Japanese on two social media tasks: NER (a 37% average relative improvement in F1 across target languages) and sentiment classification (12% relative improvement in F1) on social media text, while also benchmarking on a non-social media task of Universal Dependency POS tagging (6.7% relative improvement in accuracy). Our results are promising given the lack of social media bitext corpora. Our code can be found at: https://github.com/twitter-research/multilingual-alignment-tpp.

[52]  arXiv:2110.10319 [pdf, other]
Title: LMSOC: An Approach for Socially Sensitive Pretraining
Comments: Camera ready version. Accepted to EMNLP 2021 Findings. Code for reproducing the experiments can be found at: this https URL
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (cs.LG)

While large-scale pretrained language models have been shown to learn effective linguistic representations for many NLP tasks, there remain many real-world contextual aspects of language that current approaches do not capture. For instance, consider a cloze-test "I enjoyed the ____ game this weekend": the correct answer depends heavily on where the speaker is from, when the utterance occurred, and the speaker's broader social milieu and preferences. Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models. We propose a simple but effective approach to incorporate speaker social context into the learned representations of large-scale language models. Our method first learns dense representations of social contexts using graph representation learning algorithms and then primes language model pretraining with these social context representations. We evaluate our approach on geographically-sensitive language-modeling tasks and show a substantial improvement (more than 100% relative lift on MRR) compared to baselines.

[53]  arXiv:2110.10320 [pdf, ps, other]
Title: Frontiers in Evolutionary Computation: A Workshop Report
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

In July of 2021, the Santa Fe Institute hosted a workshop on evolutionary computation as part of its Foundations of Intelligence in Natural and Artificial Systems project. This project seeks to advance the field of artificial intelligence by promoting interdisciplinary research on the nature of intelligence. The workshop brought together computer scientists and biologists to share their insights about the nature of evolution and the future of evolutionary computation. In this report, we summarize each of the talks and the subsequent discussions. We also draw out a number of key themes and identify important frontiers for future research.

[54]  arXiv:2110.10324 [pdf, other]
Title: Semantic Sensing and Planning for Human-Robot Collaboration in Uncertain Environments
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Autonomous robots can benefit greatly from human-provided semantic characterizations of uncertain task environments and states. However, the development of integrated strategies which let robots model, communicate, and act on such soft data remains challenging. Here, a framework is presented for active semantic sensing and planning in human-robot teams which addresses these gaps by formally combining the benefits of online sampling-based POMDP policies, multi-modal semantic interaction, and Bayesian data fusion. This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments by sketching and labeling arbitrary landmarks across the environment. Dynamic updating of the environment while searching for a mobile target allows robotic agents to actively query humans for novel and relevant semantic data, thereby improving beliefs of unknown environments and target states for improved online planning. Target search simulations show significant improvements in time and belief state estimates required for interception versus conventional planning based solely on robotic sensing. Human subject studies demonstrate an average doubling in dynamic target capture rate compared to the lone robot case, employing reasoning over a range of user characteristics and interaction modalities. Video of interaction can be found at https://youtu.be/Eh-82ZJ1o4I.

[55]  arXiv:2110.10325 [pdf]
Title: One-Step Abductive Multi-Target Learning with Diverse Noisy Samples
Authors: Yongquan Yang
Comments: 6 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

One-step abductive multi-target learning (OSAMTL) was proposed to handle complex noisy labels. In this paper, we define diverse noisy samples (DNS) and propose one-step abductive multi-target learning with DNS (OSAMTL-DNS), which extends the original OSAMTL to a wider range of tasks that handle complex noisy labels.

[56]  arXiv:2110.10328 [pdf, other]
Title: R$^3$Net: Relation-embedded Representation Reconstruction Network for Change Captioning
Comments: Accepted by EMNLP 2021
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Change captioning uses a natural language sentence to describe the fine-grained disagreement between two similar images. Viewpoint change is the most typical distractor in this task, because it changes the scale and location of the objects and overwhelms the representation of real change. In this paper, we propose a Relation-embedded Representation Reconstruction Network (R$^3$Net) to explicitly distinguish the real change from the large amount of clutter and irrelevant changes. Specifically, a relation-embedded module is first devised to explore potential changed objects in the large amount of clutter. Then, based on the semantic similarities of corresponding locations in the two images, a representation reconstruction module (RRM) is designed to learn the reconstruction representation and further model the difference representation. In addition, we introduce a syntactic skeleton predictor (SSP) to enhance the semantic interaction between change localization and caption generation. Extensive experiments show that the proposed method achieves state-of-the-art results on two public datasets.

[57]  arXiv:2110.10329 [pdf, other]
Title: SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-training within a single model. We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. To further align our model representations across modalities, we leverage alignment losses, specifically Translation Language Modeling (TLM) and Speech Text Matching (STM), that make use of supervised speech-text recognition data. We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST 2 speech translation, by around 1 BLEU compared to single-modality pre-trained models, while retaining close to SotA performance on LibriSpeech and SpeechStew ASR tasks. On four GLUE tasks and text-normalization, we observe evidence of capacity limitations and interference between the two modalities, leading to degraded performance compared to an equivalent text-only model, while still being competitive with BERT. Through extensive empirical analysis we also demonstrate the importance of the choice of objective function for speech pre-training, and the beneficial effect of adding supervised signals on the quality of the learned representations.

[58]  arXiv:2110.10333 [pdf, other]
Title: Computationally Efficient Safe Reinforcement Learning for Power Systems
Comments: Submitted to the 2022 American Control Conference. 8 pages, 7 figures
Subjects: Systems and Control (eess.SY)

We propose a computationally efficient approach to safe reinforcement learning (RL) for frequency regulation in power systems with high levels of variable renewable energy resources. The approach draws on set-theoretic control techniques to craft a neural network-based control policy that is guaranteed to satisfy safety-critical state constraints, without needing to solve a model predictive control or projection problem in real time. By exploiting the properties of robust controlled-invariant polytopes, we construct a novel, closed-form "safety-filter" that enables end-to-end safe learning using any policy gradient-based RL algorithm. We then apply the safety filter in conjunction with the deep deterministic policy gradient (DDPG) algorithm to regulate frequency in a modified 9-bus power system, and show that the learned policy is more cost-effective than robust linear feedback control techniques while maintaining the same safety guarantee. We also show that the proposed paradigm outperforms DDPG augmented with constraint violation penalties.

[59]  arXiv:2110.10334 [pdf, other]
Title: Toward Accurate and Reliable Iris Segmentation Using Uncertainty Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As an upstream task of iris recognition, iris segmentation plays a vital role in multiple subsequent tasks, including localization and matching. A slight bias in iris segmentation often results in obvious performance degradation of the iris recognition system. In this paper, we propose an Iris U-transformer (IrisUsformer) for accurate and reliable iris segmentation. For better accuracy, we elaborately design IrisUsformer by adopting a position-sensitive operation and re-packaging the transformer block to raise the spatial perception ability of the model. For better reliability, IrisUsformer utilizes an auxiliary head to distinguish the high- and low-uncertainty regions of segmentation predictions and then adopts a weighting scheme to guide model optimization. Experimental results on three publicly available databases demonstrate that IrisUsformer achieves better segmentation accuracy using 35% of the MACs of the SOTA IrisParseNet. More importantly, our method estimates the uncertainty map corresponding to the segmentation prediction for subsequent processing in iris recognition systems.

[60]  arXiv:2110.10335 [pdf, other]
Title: Simpler Does It: Generating Semantic Labels with Objectness Guidance
Comments: BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing weakly or semi-supervised semantic segmentation methods utilize image or box-level supervision to generate pseudo-labels for weakly labeled images. However, due to the lack of strong supervision, the generated pseudo-labels are often noisy near the object boundaries, which severely impacts the network's ability to learn strong representations. To address this problem, we present a novel framework that generates pseudo-labels for training images, which are then used to train a segmentation model. To generate pseudo-labels, we combine information from: (i) a class agnostic objectness network that learns to recognize object-like regions, and (ii) either image-level or bounding box annotations. We show the efficacy of our approach by demonstrating how the objectness network can naturally be leveraged to generate object-like regions for unseen categories. We then propose an end-to-end multi-task learning strategy that jointly learns to segment semantics and objectness using the generated pseudo-labels. Extensive experiments demonstrate the high quality of our generated pseudo-labels and the effectiveness of the proposed framework in a variety of domains. Our approach achieves better or competitive performance compared to existing weakly-supervised and semi-supervised methods.

[61]  arXiv:2110.10340 [pdf, other]
Title: News-based Business Sentiment and its Properties as an Economic Index
Comments: 40 pages, to be published in Information Processing and Management
Subjects: Computation and Language (cs.CL)

This paper presents an approach to measuring business sentiment based on textual data. Business sentiment has been measured by traditional surveys, which are costly and time-consuming to conduct. To address these issues, we take advantage of daily newspaper articles and adopt a self-attention-based model to define a business sentiment index, named S-APIR, where outlier detection models are investigated to properly handle various genres of news articles. Moreover, we propose a simple approach to temporally analyzing how much any given event contributed to the predicted business sentiment index. To demonstrate the validity of the proposed approach, an extensive analysis is carried out on 12 years' worth of newspaper articles. The analysis shows that the S-APIR index is strongly and positively correlated with an established survey-based index (up to a correlation coefficient of r=0.937) and that the outlier detection is effective, especially for a general newspaper. Also, S-APIR is compared with a variety of economic indices, revealing that it reflects the trend of the macroeconomy as well as the economic outlook and sentiment of economic agents. Moreover, to illustrate how S-APIR could benefit economists and policymakers, several events are analyzed with respect to their impacts on business sentiment over time.

[62]  arXiv:2110.10341 [pdf, other]
Title: Quadrotor Trajectory Tracking with Learned Dynamics: Joint Koopman-based Learning of System Models and Function Dictionaries
Comments: arXiv admin note: text overlap with arXiv:2105.08036
Subjects: Robotics (cs.RO)

Nonlinear dynamical effects are crucial to the operation of many agile robotic systems. Koopman-based model learning methods can capture these nonlinear dynamical system effects in higher dimensional lifted bilinear models that are amenable to optimal control. However, standard methods that lift the system state using a fixed function dictionary before model learning result in high dimensional models that are intractable for real time control. This paper presents a novel method that jointly learns a function dictionary and lifted bilinear model purely from data by incorporating the Koopman model in a neural network architecture. Nonlinear MPC design utilizing the learned model can be performed readily. We experimentally realized this method on a multirotor drone for agile trajectory tracking at low altitudes where the aerodynamic ground effect influences the system's behavior. Experimental results demonstrate that the learning-based controller achieves performance similar to that of a nonlinear MPC based on a nominal dynamics model at medium altitude. However, our learning-based system can reliably track trajectories in near-ground flight regimes, whereas the nominal controller crashes due to unmodeled dynamical effects that are captured by our method.

[63]  arXiv:2110.10342 [pdf, ps, other]
Title: Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
Comments: 72 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
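
The difference between the two sampling schemes contrasted above can be made concrete with a small sketch. It only builds the index batches for one worker; the function names are hypothetical, and the paper's local and synchronized variants are not reproduced here.

```python
import random

def with_replacement_batches(n, batch_size, num_batches, rng):
    """Each index is drawn independently, so an epoch may repeat or skip samples."""
    return [[rng.randrange(n) for _ in range(batch_size)] for _ in range(num_batches)]

def random_reshuffling_batches(n, batch_size, rng):
    """One epoch of Random Reshuffling: shuffle once, then slice.

    Every sample index appears exactly once per epoch (sampling without replacement).
    """
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]

rng = random.Random(0)
print(with_replacement_batches(10, 3, 4, rng))
print(random_reshuffling_batches(10, 3, rng))
```

Because the reshuffled epoch visits each sample once, its gradient estimates are no longer independent across steps, which is what makes the analysis differ from the with-replacement case.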

[64]  arXiv:2110.10343 [pdf, other]
Title: EBJR: Energy-Based Joint Reasoning for Adaptive Inference
Comments: BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

State-of-the-art deep learning models have achieved significant performance levels on various benchmarks. However, the excellent performance comes at a high computational cost. Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency. This paper presents a new method of jointly using the large accurate models together with the small fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes the samples between shallow and deep models to achieve an accuracy close to the deep model, but latency close to the shallow one. Our method is applicable to out-of-the-box pre-trained models, as it requires neither an architecture change nor re-training. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different down-stream tasks, we show that our method outperforms strong state-of-the-art approaches by a considerable margin. In addition, we propose specialized EBJR, an extension of our method where we create a smaller specialized side model that performs the target task only partially, but yields an even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.
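
One plausible reading of the energy-based routing is sketched below. It assumes the common free-energy score, the negative log-sum-exp of the logits, and a fixed threshold; the abstract does not state the exact score or routing rule, so both are assumptions, as are the function names.

```python
import math

def free_energy(logits):
    """Negative log-sum-exp of the logits, computed in a numerically stable way."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(v - m) for v in logits)))

def route(x, shallow, deep, energy_threshold):
    """Keep the shallow model's logits unless their energy exceeds the threshold."""
    logits = shallow(x)
    if free_energy(logits) <= energy_threshold:
        return logits, "shallow"
    return deep(x), "deep"

# Toy models: the shallow one is confident on "easy" and flat on anything else.
shallow = lambda x: [10.0, 0.0] if x == "easy" else [0.1, 0.0]
deep = lambda x: [0.0, 8.0]
print(route("easy", shallow, deep, energy_threshold=-5.0)[1])
print(route("hard", shallow, deep, energy_threshold=-5.0)[1])
```

Since the score is computed from the shallow model's existing outputs, a rule like this needs no architecture change or re-training, which matches the property claimed in the abstract.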

[65]  arXiv:2110.10347 [pdf, other]
Title: Multi-Stage Network Embedding for Exploring Heterogeneous Edges
Comments: ACM Transactions on Knowledge Discovery from Data
Subjects: Social and Information Networks (cs.SI)

The relationships between objects in a network are typically diverse and complex, leading to heterogeneous edges with different semantic information. In this paper, we focus on exploring the heterogeneous edges for network representation learning. By considering each relationship as a view that depicts a specific type of proximity between nodes, we propose a multi-stage non-negative matrix factorization (MNMF) model, which aims to utilize the abundant information in multiple views to learn robust network representations. In fact, most existing network embedding methods are closely related to implicitly factorizing the complex proximity matrix. However, the approximation error is usually quite large, since a single low-rank matrix is insufficient to capture the original information. Through a multi-stage matrix factorization process motivated by gradient boosting, our MNMF model achieves lower approximation error. Meanwhile, the multi-stage structure of MNMF makes it feasible to design two kinds of non-negative matrix factorization (NMF) schemes that better preserve network information. The united NMF aims to preserve the consensus information between different views, and the independent NMF aims to preserve the unique information of each view. Concrete experimental results on realistic datasets indicate that our model outperforms three types of baselines in practical applications.

[66]  arXiv:2110.10348 [pdf, other]
Title: GTM: Gray Temporal Model for Video Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Data input modality plays an important role in video action recognition. Normally, there are three types of input: RGB, flow stream and compressed data. In this paper, we propose a new input modality: gray stream. Specifically, taking three stacked consecutive gray images as input, which has the same size as an RGB input, not only skips the conversion process from video decoding data to RGB, but also improves the spatio-temporal modeling ability at zero computation and zero parameters. Meanwhile, we propose a 1D Identity Channel-wise Spatio-temporal Convolution (1D-ICSC), which captures the temporal relationship at the channel-feature level within a controllable computation budget (set by parameters G & R). Finally, we confirm its effectiveness and efficiency on several action recognition benchmarks, such as Kinetics, Something-Something, HMDB-51 and UCF-101, and achieve impressive results.
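
The gray-stream input can be illustrated with a short sketch that stacks three consecutive single-channel frames into one three-channel input of the same shape as an RGB frame. The nested-list frame representation, the function name, and the `stride` parameter are assumptions for illustration.

```python
def gray_stream(gray_frames, stride=1):
    """Stack every run of 3 consecutive gray frames into one H x W x 3 input.

    `gray_frames` is a list of H x W frames (lists of rows of intensities).
    The stride between stacked inputs is an assumption, not taken from the paper.
    """
    clips = []
    for t in range(0, len(gray_frames) - 2, stride):
        f0, f1, f2 = gray_frames[t:t + 3]
        # Each pixel becomes a 3-tuple: its intensity at times t, t+1, t+2.
        clip = [list(zip(r0, r1, r2)) for r0, r1, r2 in zip(f0, f1, f2)]
        clips.append(clip)
    return clips

frames = [[[t, t + 10]] for t in range(5)]  # five 1 x 2 toy frames
print(gray_stream(frames)[0])  # [[(0, 1, 2), (10, 11, 12)]]
```

Because the three channels now carry time rather than color, an unmodified network that expects RGB-shaped input sees short-range motion in its very first layer.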

[67]  arXiv:2110.10349 [pdf, ps, other]
Title: Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching
Comments: 12 pages, 6 figures, under review with the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multimedia (cs.MM)

Mobile edge computing (MEC) is a prominent computing paradigm which expands the application fields of wireless communication. Due to the limitation of the capacities of user equipments and MEC servers, edge caching (EC) optimization is crucial to the effective utilization of the caching resources in MEC-enabled wireless networks. However, the dynamics and complexities of content popularities over space and time as well as the privacy preservation of users pose significant challenges to EC optimization. In this paper, a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm is proposed to maximize the cache hit rates of devices in the MEC networks. Specifically, we consider the fact that content popularities are dynamic, complicated and unobservable, and formulate the maximization of cache hit rates on devices as distributed problems under the constraints of privacy preservation. In particular, we convert the distributed optimizations into distributed model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction. Subsequently, a P2D3PG algorithm is developed based on distributed reinforcement learning to solve the distributed problems. Simulation results demonstrate the superiority of the proposed approach in improving EC hit rate over the baseline methods while preserving user privacy.

[68]  arXiv:2110.10353 [pdf, other]
Title: Contextual Gradient Scaling for Few-Shot Learning
Comments: Accepted to WACV2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Model-agnostic meta-learning (MAML) is a well-known optimization-based meta-learning algorithm that works well in various computer vision tasks, e.g., few-shot classification. MAML learns an initialization so that a model can adapt to a new task in a few steps. However, since the gradient norm of a classifier (head) is much bigger than those of backbone layers, the model focuses on learning the decision boundary of the classifier with similar representations. Furthermore, gradient norms of high-level layers are smaller than those of the other layers. So, the backbone of MAML usually learns task-generic features, which results in deteriorated adaptation performance in the inner-loop. To resolve or mitigate this problem, we propose contextual gradient scaling (CxGrad), which scales gradient norms of the backbone to facilitate learning task-specific knowledge in the inner-loop. Since the scaling factors are generated from task-conditioned parameters, gradient norms of the backbone can be scaled in a task-wise fashion. Experimental results show that CxGrad effectively encourages the backbone to learn task-specific knowledge in the inner-loop and improves the performance of MAML by a significant margin in both same- and cross-domain few-shot classification.

[69]  arXiv:2110.10354 [pdf, other]
Title: Detecting Backdoor Attacks Against Point Cloud Classifiers
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Backdoor attacks (BA) are an emerging threat to deep neural network classifiers. An attacked classifier will predict the attacker's target class when a test sample from a source class is embedded with the backdoor pattern (BP). Recently, the first BA against point cloud (PC) classifiers was proposed, creating new threats to many important applications including autonomous driving. Such PC BAs are not detectable by existing BA defenses due to their special BP embedding mechanism. In this paper, we propose a reverse-engineering defense that infers whether a PC classifier is backdoor attacked, without access to its training set or to any clean classifiers for reference. The effectiveness of our defense is demonstrated on the benchmark ModelNet40 dataset for PCs.

[70]  arXiv:2110.10355 [pdf, other]
Title: Dynamic Multi-Person Mesh Recovery From Uncalibrated Multi-View Cameras
Comments: 3DV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dynamic multi-person mesh recovery has been a hot topic in 3D vision recently. However, few works focus on multi-person motion capture from uncalibrated cameras, which mainly faces two challenges: first, inter-person interactions and occlusions introduce inherent ambiguities for both camera calibration and motion capture; second, there is a lack of dense correspondences that can be used to constrain sparse camera geometries in a dynamic multi-person scene. Our key idea is to incorporate motion prior knowledge into the simultaneous optimization of extrinsic camera parameters and human meshes from noisy human semantics. First, we introduce a physics-geometry consistency to reduce the low- and high-frequency noise of the detected human semantics. Then a novel latent motion prior is proposed to simultaneously optimize extrinsic camera parameters and coherent human motions from slightly noisy inputs. Experimental results show that accurate camera parameters and human motions can be obtained through one-stage optimization. The code will be publicly available at https://www.yangangwang.com.

[71]  arXiv:2110.10357 [pdf, other]
Title: Fast Bitmap Fit: A CPU Cache Line friendly memory allocator for single object allocations
Subjects: Data Structures and Algorithms (cs.DS); Operating Systems (cs.OS); Performance (cs.PF)

Applications making excessive use of single-object based data structures (such as linked lists, trees, etc.) can see a drop in efficiency over a period of time due to the randomization of nodes in memory. This slowdown is due to the ineffective use of the CPU's L1/L2 cache. We mitigate this by presenting the design of a single-object memory allocator that preserves memory locality across randomly ordered memory allocations and deallocations.

[72]  arXiv:2110.10358 [pdf, other]
Title: Hierarchical Aspect-guided Explanation Generation for Explainable Recommendation
Subjects: Computation and Language (cs.CL)

Explainable recommendation systems provide explanations for recommendation results to improve their transparency and persuasiveness. The existing explainable recommendation methods generate textual explanations without explicitly considering the user's preferences on different aspects of the item. In this paper, we propose a novel explanation generation framework, named Hierarchical Aspect-guided explanation Generation (HAG), for explainable recommendation. Specifically, HAG employs a review-based syntax graph to provide a unified view of the user/item details. An aspect-guided graph pooling operator is proposed to extract the aspect-relevant information from the review-based syntax graphs to model the user's preferences on an item at the aspect level. Then, a hierarchical explanation decoder is developed to generate aspects and aspect-relevant explanations based on the attention mechanism. The experimental results on three real datasets indicate that HAG outperforms state-of-the-art explanation generation methods in both single-aspect and multi-aspect explanation generation tasks, and also achieves comparable or even better preference prediction accuracy than strong baseline methods.

[73]  arXiv:2110.10360 [pdf, other]
Title: Real-time Identification and Simultaneous Avoidance of Static and Dynamic Obstacles on Point Cloud for UAVs Navigation
Authors: Han Chen, Peng Lu
Comments: 12 pages. arXiv admin note: text overlap with arXiv:2105.06622
Subjects: Robotics (cs.RO)

Avoiding hybrid obstacles in unknown scenarios with an efficient flight strategy is a key challenge for unmanned aerial vehicle applications. In this paper, we introduce a more robust technique to distinguish and track dynamic obstacles from static ones with only point cloud input. Then, to achieve dynamic avoidance, we propose the forbidden pyramids method to solve for the desired vehicle velocity iteratively with an efficient sampling-based method. The motion primitives are generated by solving a nonlinear optimization problem with the constraint of desired velocity and the waypoint. Furthermore, we present several techniques to deal with the position estimation error for close objects, the error for deformable objects, and the time gap between different submodules. The proposed approach is implemented to run onboard in real-time and validated extensively in simulation and hardware tests, demonstrating its advantages in tracking robustness, energy cost, and computation time.

[74]  arXiv:2110.10364 [pdf, other]
Title: NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset
Comments: 13 pages, 6 figures, to be published in BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent work indicates that, besides being a challenge in producing perceptually pleasing images, low light proves more difficult for machine cognition than previously thought. In our work, we take a closer look at object detection in low light. First, to support the development and evaluation of new methods in this domain, we present a high-quality large-scale Night Object Detection (NOD) dataset showing dynamic scenes captured on the streets at night. Next, we directly link the lighting conditions to perceptual difficulty and identify what makes low light problematic for machine cognition. Accordingly, we provide instance-level annotation for a subset of the dataset for an in-depth evaluation of future methods. We also present an analysis of the baseline model performance to highlight opportunities for future research and show that low light is a non-trivial problem that requires special attention from the researchers. Further, to address the issues caused by low light, we propose to incorporate an image enhancement module into the object detection framework and two novel data augmentation techniques. Our image enhancement module is trained under the guidance of the object detector to learn image representation optimal for machine cognition rather than for the human visual system. Finally, experimental results confirm that the proposed method shows consistent improvement of the performance on low-light datasets.

[75]  arXiv:2110.10366 [pdf, other]
Title: Repaint: Improving the Generalization of Down-Stream Visual Tasks by Generating Multiple Instances of Training Examples
Comments: BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Convolutional Neural Networks (CNNs) for visual tasks are believed to learn both the low-level textures and high-level object attributes, throughout the network depth. This paper further investigates the `texture bias' in CNNs. To this end, we regenerate multiple instances of training examples from each original image, through a process we call `repainting'. The repainted examples preserve the shape and structure of the regions and objects within the scenes, but diversify their texture and color. Our method can regenerate the same image under different daylight, season, or weather conditions, can have colorization or de-colorization effects, or even bring back some texture information from blacked-out areas. The in-place repaint allows us to further use these repainted examples for improving the generalization of CNNs. Through an extensive set of experiments, we demonstrate the usefulness of the repainted examples in training, for the tasks of image classification (ImageNet) and object detection (COCO), over several state-of-the-art network architectures at different capacities, and across different data availability regimes.

[76]  arXiv:2110.10367 [pdf, ps, other]
Title: Constructions and Applications of Perfect Difference Matrices and Perfect Difference Families
Subjects: Information Theory (cs.IT)

Perfect difference families (PDFs for short) are important both in theory and in applications. Perfect difference matrices (PDMs for short) and their equivalent structures have been extensively studied and used to construct perfect difference families, radar arrays and related codes. The necessary condition for the existence of a PDM$(n,m)$ is $m\equiv 1\pmod2$ and $m\geq n+1$. So far, PDM$(3,m)$s exist for odd $5\leq m\leq 201$ with two definite exceptions of $m=9,11$. In this paper, new recursive constructions on PDM$(3,m)$s are investigated, and it is proved that there exist PDM$(3,m)$s for any odd $5\leq m<1000$ with two definite exceptions of $m=9,11$ and $33$ possible exceptions. A complete result of $(g,\{3,4\},1)$-PDFs with the ratio of block size $4$ no less than $\frac{1}{14}$ is obtained. As an application, a complete class of perfect strict optical orthogonal codes with weights $3$ and $4$ is obtained.

[77]  arXiv:2110.10368 [pdf, other]
Title: ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Existing semi-supervised learning (SSL) algorithms typically assume class-balanced datasets, although the class distributions of many real-world datasets are imbalanced. In general, classifiers trained on a class-imbalanced dataset are biased toward the majority classes. This issue becomes more problematic for SSL algorithms because they utilize the biased prediction of unlabeled data for training. However, traditional class-imbalanced learning techniques, which are designed for labeled data, cannot be readily combined with SSL algorithms. We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data, while mitigating class imbalance by introducing an auxiliary balanced classifier (ABC) of a single layer, which is attached to a representation layer of an existing SSL algorithm. The ABC is trained with a class-balanced loss of a minibatch, while using high-quality representations learned from all data points in the minibatch using the backbone SSL algorithm to avoid overfitting and information loss. Moreover, we use consistency regularization, a recent SSL technique for utilizing unlabeled data in a modified way, to train the ABC to be balanced among the classes by selecting unlabeled data with the same probability for each class. The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.

[78]  arXiv:2110.10369 [pdf, other]
Title: Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data?
Comments: BMVC 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The diversity of deep learning applications, datasets, and neural network architectures necessitates a careful selection of the architecture and data that match best to a target application. As an attempt to mitigate this dilemma, this paper investigates the idea of combining multiple trained neural networks using unlabeled data. In addition, combining multiple models into one can speed up the inference, result in stronger, more capable models, and allow us to select efficient device-friendly target network architectures. To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data. Our method supports using an arbitrary number of input models with arbitrary architectures and categories. Extensive performance evaluations demonstrate that our method is very effective. For example, for the task of object detection and without using any ground-truth labels, an EfficientDet-D0 trained on Pascal-VOC and an EfficientDet-D1 trained on COCO can be combined into a RetinaNet-ResNet50 model, with a similar mAP to supervised training. If fine-tuned in a semi-supervised setting, the combined model achieves +18.6%, +12.6%, and +8.1% mAP improvements over supervised training with 1%, 5%, and 10% of labels.

[79]  arXiv:2110.10372 [pdf, other]
Title: Distributionally Robust Classifiers in Sentiment Analysis
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we propose sentiment classification models based on BERT integrated with DRO (Distributionally Robust Classifiers) to improve model performance on datasets with distributional shifts. We added a 2-layer Bi-LSTM, a projection layer (onto the simplex or an Lp ball), and a linear layer on top of BERT to achieve distributional robustness. We considered one form of distributional shift (from the IMDb dataset to the Rotten Tomatoes dataset). We have confirmed through experiments that our DRO model does improve performance on our test set with distributional shift from the training set.
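
The abstract does not say how the projection layer is computed. As a minimal sketch of one standard option, the following implements the sort-based Euclidean projection onto the probability simplex. The function name `project_onto_simplex` and the `radius` parameter are illustrative assumptions and are not taken from the paper.

```python
def project_onto_simplex(v, radius=1.0):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = radius}.

    Standard sort-based method; an illustrative stand-in, not the
    paper's own projection layer.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    # Keep the shift from the largest prefix length k for which
    # u_k - (sum of the k largest entries - radius) / k is still positive.
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - radius) / k
        if u_k - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    print(project_onto_simplex([-1.0, 3.0, 0.5]))
```

The output is non-negative and sums to `radius`, so it can be read as a probability vector when `radius` is 1.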

[80]  arXiv:2110.10374 [pdf, other]
Title: Playing 2048 With Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI)

The game of 2048 is a highly addictive game. It is easy to learn, but hard to master: the game's creator revealed that only about 1% of the hundreds of millions of games ever played have been won. In this paper, we explore reinforcement learning techniques to win 2048. The approaches we have taken include deep Q-learning and beam search, with beam search reaching 2048 28.5% of the time.

[81]  arXiv:2110.10376 [pdf, other]
Title: A Fast Planning Approach for 3D Short Trajectory with a Parallel Framework
Comments: 16 pages
Subjects: Robotics (cs.RO)

For real applications of unmanned aerial vehicles, the capability of navigating with full autonomy in unknown environments is a crucial requirement. However, planning a shorter path and using less computing time are conflicting goals. To address this problem, we present a framework with the map planner and point cloud planner running in parallel. The map planner determines the initial path using the improved jump point search method on the 2D map, and then it tries to optimize the path by considering a possible shorter 3D path. The point cloud planner is executed at a high frequency to generate the motion primitives. It makes the drone follow the solved path and avoid suddenly appearing obstacles nearby. Thus, vehicles can achieve a short trajectory while reacting quickly to intruding obstacles. We demonstrate fully autonomous quadrotor flight tests in unknown and complex environments with static and dynamic obstacles to validate the proposed method. In simulation and hardware experiments, the proposed framework shows satisfactory overall performance.

[82]  arXiv:2110.10379 [pdf, other]
Title: Cascaded Compressed Sensing Networks: A Reversible Architecture for Layerwise Learning
Subjects: Machine Learning (cs.LG)

Recently, the method that learns networks layer by layer has attracted increasing interest for its ease of analysis. For this method, the main challenge lies in deriving an optimization target for each layer by inversely propagating the global target of the network. The propagation problem is ill-posed, because it involves the inversion of nonlinear activations from low-dimensional to high-dimensional spaces. To address the problem, the existing solution is to learn an auxiliary network specifically to propagate the target. However, that network lacks stability, and moreover, it results in higher complexity for network learning. In this letter, we show that target propagation can be achieved by modeling each layer of the network with compressed sensing, without the need for auxiliary networks. Experiments show that the proposed method can achieve better performance than the auxiliary network-based method.

[83]  arXiv:2110.10380 [pdf, ps, other]
Title: Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic Forecasting
Comments: 12 pages, Submitted as conference paper to ICLR 2022
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Traffic forecasting is a challenging problem due to complex road networks and sudden speed changes caused by various events on roads. A number of models have been proposed to solve this challenging problem with a focus on learning spatio-temporal dependencies of roads. In this work, we propose a new perspective of converting the forecasting problem into a pattern matching task, assuming that large data can be represented by a set of patterns. To evaluate the validity of the new perspective, we design a novel traffic forecasting model, called Pattern-Matching Memory Networks (PM-MemNet), which learns to match input data to the representative patterns with a key-value memory structure. We first extract and cluster representative traffic patterns, which serve as keys in the memory. Then, by matching the extracted keys and inputs, PM-MemNet acquires necessary information about existing traffic patterns from the memory and uses it for forecasting. To model the spatio-temporal correlation of traffic, we propose a novel memory architecture, GCMem, which integrates attention and graph convolution for memory enhancement. The experimental results indicate that PM-MemNet is more accurate than state-of-the-art models, such as Graph WaveNet, with higher responsiveness. We also present a qualitative analysis, describing how PM-MemNet works and achieves its higher accuracy when road speed rapidly changes.

[84]  arXiv:2110.10389 [pdf, other]
Title: Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias
Comments: A variant of this report is accepted in WACV 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy. However, co-occurrence bias in the training dataset may hamper a DNN model's generalizability to unseen scenarios in the real world. For example, in COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN's prediction in favor of men. Recent works have focused on task-specific training strategies to handle bias in such scenarios, but fixing the available data is often ignored. In this paper, we propose a novel and more generic solution to address the contextual bias in the datasets by selecting a subset of the samples, which is fair in terms of the co-occurrence with various classes for a protected attribute. We introduce a data repair algorithm using the coefficient of variation, which can curate fair and contextually balanced data for a protected class(es). This helps in training a fair model irrespective of the task, architecture or training methodology. Our proposed solution is simple, effective, and can even be used in an active learning setting where the data labels are not present or are being generated incrementally. We demonstrate the effectiveness of our algorithm for the task of object detection and multi-label image classification across different datasets. Through a series of experiments, we validate that curating contextually fair data helps make model predictions fair by balancing the true positive rate for the protected class across groups without compromising on the model's overall performance.
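
The coefficient of variation is the standard deviation divided by the mean. As a rough illustration of how it can flag co-occurrence imbalance, the sketch below computes it for hypothetical per-group co-occurrence counts. The category names, the counts, and the function name are invented for illustration, and the paper's actual repair algorithm is not reproduced here.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Return population standard deviation divided by the mean (0.0 if mean is 0)."""
    m = mean(counts)
    if m == 0:
        return 0.0
    return pstdev(counts) / m


# Hypothetical counts of how often each object category co-occurs with
# each of two groups of a protected attribute (made-up numbers).
cooccurrence = {
    "skateboard": [900, 100],
    "umbrella": [480, 520],
}

if __name__ == "__main__":
    # A larger value means the category is more unevenly spread across groups.
    for category, counts in cooccurrence.items():
        print(category, coefficient_of_variation(counts))
```

A value near zero indicates that a category co-occurs about equally often with every group, which is the balance a curated subset would aim for under this reading.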

[85]  arXiv:2110.10391 [pdf, other]
Title: Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recover
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

In this paper, we consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications. Specifically, we treat sensing problems with model mismatch where one wishes to recover a sparse high-dimensional vector from low-dimensional observations subject to uncertainty in the measurement operator. We then design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem. Our proposed network - named Robust lEarned Shrinkage-Thresholding (REST) - exhibits an additional normalization step compared to the Learned Iterative Shrinkage-Thresholding Algorithm (LISTA), leading to reliable recovery of the signal under sample-wise varying model mismatch. The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems wherein model mismatch is taken into consideration.
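
For context, LISTA unrolls the classical iterative shrinkage-thresholding algorithm (ISTA) into network layers. The sketch below is plain ISTA for min 0.5*||y - Ax||^2 + lam*||x||_1 with a fixed matrix and fixed step size. It is not LISTA and not REST: nothing is learned, and the normalization step described in the abstract is not included. The function names and the toy problem are assumptions made for illustration.

```python
import math


def soft_threshold(x, threshold):
    """Shrink each entry toward zero by `threshold`; small entries become zero."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in x]


def ista(A, y, lam, step, iterations):
    """Plain ISTA with a fixed step size.

    `step` should not exceed 1 / (largest eigenvalue of A^T A) for the
    iteration to converge.
    """
    rows, cols = len(A), len(A[0])
    x = [0.0] * cols
    for _ in range(iterations):
        # residual = y - A x
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(cols))
                    for i in range(rows)]
        # gradient step: x + step * A^T residual
        z = [x[j] + step * sum(A[i][j] * residual[i] for i in range(rows))
             for j in range(cols)]
        # proximal step for the l1 penalty
        x = soft_threshold(z, step * lam)
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(ista(identity, [3.0, 0.2], lam=0.5, step=1.0, iterations=10))
```

In an unrolled network, each loop iteration becomes one layer, and the matrices and thresholds are trained rather than fixed.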

[86]  arXiv:2110.10395 [pdf, other]
Title: 3DFaceFill: An Analysis-By-Synthesis Approach to Face Completion
Comments: Winter Conference on Applications of Computer Vision, WACV 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing face completion solutions are primarily driven by end-to-end models that directly generate 2D completions of 2D masked faces. By having to implicitly account for geometric and photometric variations in facial shape and appearance, such approaches result in unrealistic completions, especially under large variations in pose, shape, illumination and mask sizes. To alleviate these limitations, we introduce 3DFaceFill, an analysis-by-synthesis approach for face completion that explicitly considers the image formation process. It comprises three components: (1) an encoder that disentangles the face into its constituent 3D mesh, 3D pose, illumination and albedo factors, (2) an autoencoder that inpaints the UV representation of facial albedo, and (3) a renderer that resynthesizes the completed face. By operating on the UV representation, 3DFaceFill affords the power of correspondence and allows us to naturally enforce geometrical priors (e.g. facial symmetry) more effectively. Quantitatively, 3DFaceFill improves on the state of the art by up to 4 dB higher PSNR and 25% better LPIPS for large masks. Qualitatively, it leads to demonstrably more photorealistic face completions over a range of masks and occlusions while preserving consistency in global and component-wise shape, pose, illumination and eye-gaze.

[87]  arXiv:2110.10396 [pdf, other]
Title: UPPRESSO: Untraceable and Unlinkable Privacy-PREserving Single Sign-On Services
Authors: Chengqian Guo (1, 2 and 4), Jingqiang Lin (3), Quanwei Cai (1, 2 and 4), Fengjun Li (5), Qiongxiao Wang (1, 2 and 4), Jiwu Jing (4), Bin Zhao (6), Wei Wang (1, 2 and 4) ((1) State Key Laboratory of Information Security, Institute of Information Engineering, CAS, (2) Data Assurance and Communication Security Research Center, CAS, (3) School of Cyber Security, University of Science and Technology of China, (4) School of Cyber Security, University of Chinese Academy of Sciences (5) University of Kansas, (6) JD.com Silicon Valley R&D Center)
Subjects: Cryptography and Security (cs.CR)

Single sign-on (SSO) allows a user to maintain only the credential at the identity provider (IdP), instead of one credential for each relying party (RP), to log in to numerous RPs. However, SSO introduces extra privacy leakage threats, as (a) the IdP could track all the RPs which a user is visiting, and (b) collusive RPs could learn a user's online profile by linking his identities across these RPs. Several privacy-preserving SSO solutions have been proposed to defend against either the curious IdP or collusive RPs, but none of them addresses both of these privacy leakage threats at the same time. In this paper, we propose a privacy-preserving SSO system, called UPPRESSO, to protect a user's login traces against both the curious IdP and collusive RPs simultaneously. We analyze the identity dilemma between the SSO security requirements and these privacy concerns, and convert the SSO privacy problems into an identity-transformation challenge. To the best of our knowledge, this is the first practical SSO solution which solves the privacy problems caused by both the curious IdP and collusive RPs. We build the UPPRESSO prototype system for web applications, with standard functions of OpenID Connect, while the function of Core Sign-On is slightly modified to calculate the transformed identities. The prototype system is implemented on top of open-source MITREid Connect, and the extensive evaluation shows that UPPRESSO introduces reasonable overheads and fulfills the requirements of both security and privacy.

[88]  arXiv:2110.10401 [pdf, other]
Title: Monitoring Collective Communication Among GPUs
Comments: 12 pages, 3 figures, 3 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Communication among devices in multi-GPU systems plays an important role in terms of performance and scalability. In order to optimize an application, programmers need to know the type and amount of the communication happening among GPUs. Although there are prior works to gather this information in MPI applications on distributed systems and multi-threaded applications on shared memory systems, there is no tool that identifies communication among GPUs. Our prior work, ComScribe, presents a point-to-point (P2P) communication detection tool for GPUs sharing a common host. In this work, we extend ComScribe to identify communication among GPUs for collective and P2P communication primitives in NVIDIA's NCCL library. In addition to P2P communications, collective communications are commonly used in HPC and AI workloads; thus, it is important to monitor the data movement induced by collectives. Our tool extracts the size and the frequency of data transfers in an application and visualizes them as a communication matrix. To demonstrate the tool in action, we present communication matrices and some statistics for two applications from the machine translation and image classification domains.
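
To make the idea of a communication matrix concrete, the sketch below aggregates a list of transfer records into two GPU-by-GPU matrices, one for total bytes and one for transfer counts. The record format `(source_gpu, destination_gpu, size_in_bytes)` and the function name are hypothetical. This is not ComScribe's interface or output format, and it does not model how a collective is decomposed into individual transfers.

```python
def communication_matrices(transfers, num_gpus):
    """Aggregate (src, dst, size_bytes) records into byte and count matrices."""
    bytes_matrix = [[0] * num_gpus for _ in range(num_gpus)]
    count_matrix = [[0] * num_gpus for _ in range(num_gpus)]
    for src, dst, size_bytes in transfers:
        bytes_matrix[src][dst] += size_bytes  # total data moved src -> dst
        count_matrix[src][dst] += 1           # how often src sent to dst
    return bytes_matrix, count_matrix


if __name__ == "__main__":
    # Made-up trace: three transfers among three GPUs.
    trace = [(0, 1, 4096), (0, 1, 1024), (2, 0, 512)]
    total_bytes, counts = communication_matrices(trace, num_gpus=3)
    for row in total_bytes:
        print(row)
```

Row index is the sending GPU and column index is the receiving GPU, so a heavy off-diagonal entry points at a pair of devices that dominates data movement.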

[89]  arXiv:2110.10402 [pdf, other]
Title: An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR
Comments: Accepted to APSIPA 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has been shown to be effective in streaming ASR. However, in order to maintain high accuracy of alignment estimation based on CTC outputs, which is the key to its performance, it is inevitable that decoding should be performed with some future information input (i.e., with higher latency). It should be noted that in streaming ASR, it is desirable to be able to achieve high recognition accuracy while keeping the latency low. Therefore, the present study aims to achieve highly accurate streaming ASR with low latency by introducing Mask-CTC, which is capable of learning feature representations that anticipate future information (i.e., that can consider long-term contexts), to the encoder pre-training. Experimental comparisons conducted using WSJ data demonstrate that the proposed method achieves higher accuracy with lower latency than the conventional triggered attention-based streaming ASR system.

[90]  arXiv:2110.10404 [pdf, other]
Title: JavaBERT: Training a transformer-based model for the Java programming language
Comments: 6 pages, to appear in the Proceedings of the 9th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE'2021)
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)

Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing out on the potential benefits of their application. Natural language processing has shown the potential to process text data regarding a variety of tasks. We argue that such models can also show similar benefits for software code processing. In this paper, we investigate how models used for natural language processing can be trained upon software code. We introduce a data retrieval pipeline for software code and train a model upon Java software code. The resulting model, JavaBERT, shows a high accuracy on the masked language modeling task, showing its potential for software engineering tools.

[91]  arXiv:2110.10405 [pdf, other]
Title: ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent approaches for end-to-end text spotting have achieved promising results. However, most current spotters are plagued by the inconsistency problem between text detection and recognition. In this work, we introduce and prove the existence of the inconsistency problem and analyze it from two aspects: (1) inconsistency of text recognition features between training and testing, and (2) inconsistency of optimization targets between text detection and recognition. To solve the aforementioned issues, we propose a differentiable Auto-Rectification Module (ARM) together with a new training strategy to enable propagating recognition loss back into the detection branch, so that our detection branch can be jointly optimized by detection and recognition targets, which largely alleviates the inconsistency problem between text detection and recognition. Based on these designs, we present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS), to detect and recognize arbitrarily-shaped text in natural scenes. Extensive experiments demonstrate the superiority of our method. In particular, our ARTS-S achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS, which significantly outperforms previous methods in both accuracy and inference speed.

[92]  arXiv:2110.10407 [pdf, other]
Title: Development of an Ontology for an Integrated Image Analysis Platform to enable Global Sharing of Microscopy Imaging Data
Subjects: Digital Libraries (cs.DL); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)

Imaging data is one of the most important fundamentals in the current life sciences. We aimed to construct an ontology to describe imaging metadata as a data schema of the integrated database for optical and electron microscopy images combined with various bio-entities. To realise this, we applied Resource Description Framework (RDF) to an Open Microscopy Environment (OME) data model, which is the de facto standard to describe optical microscopy images and experimental data. We translated the XML-based OME metadata into the base concept of RDF schema as a trial of developing microscopy ontology. In this ontology, we propose 18 upper-level concepts including missing concepts in OME such as electron microscopy, phenotype data, biosample, and imaging conditions.

[93]  arXiv:2110.10412 [pdf, other]
Title: A Row-Wise Update Algorithm for Sparse Stochastic Matrix Factorization
Comments: 27 pages,6 figures
Subjects: Numerical Analysis (math.NA)

Nonnegative matrix factorization arises widely in machine learning and data analysis. In this paper, for a given factorization of rank r, we consider the sparse stochastic matrix factorization (SSMF) of decomposing a prescribed m-by-n stochastic matrix V into a product of an m-by-r stochastic matrix W and a sparse r-by-n stochastic matrix H. With the prescribed sparsity level, we reformulate the SSMF as an unconstrained nonconvex-nonsmooth minimization problem and introduce a row-wise update algorithm for solving the minimization problem. We show that the proposed algorithm converges globally and the generated sequence converges to a special critical point of the cost function, which is a global minimizer over the W-factor as a whole and is nearly a global minimizer over each row vector of the H-factor. Numerical experiments on both synthetic and real data sets are given to demonstrate the effectiveness of our proposed algorithm.

[94]  arXiv:2110.10413 [pdf, other]
Title: Newtonian Mechanics Based Transient Stability PART I: Machine Paradigms
Comments: This paper contains 15 pages and 25 figures
Subjects: Systems and Control (eess.SY)

Individual-machine, superimposed-machine and equivalent-machine can be seen as the three major perspectives of power system transient stability. In this paper, the machine paradigms are established according to the common thinking among the three different machines. The machine paradigms comprise three components, i.e., trajectory paradigm, modeling paradigm and energy paradigm. The trajectory paradigm is the reflection of the trajectory stability; the modeling paradigm is the two-machine-system modeling of the trajectory stability; and the energy paradigm is the stability evaluation of the two-machine system. Based on this, it is clarified that the machine paradigms can be expressed in the individual machine form or the equivalent machine form. Then, the relationship between the machine stability and the system stability is analyzed. Simulation results show that the effectiveness of both the individual-machine and the equivalent machine is fully based on strict adherence to the machine paradigms.

[95]  arXiv:2110.10415 [pdf, other]
Title: Depth360: Monocular Depth Estimation using Learnable Axisymmetric Camera Model for Spherical Camera Image
Comments: 8 pages, 6 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Self-supervised monocular depth estimation has been widely investigated to estimate depth images and relative poses from RGB images. This framework is attractive for researchers because the depth and pose networks can be trained from just time sequence images without the need for the ground truth depth and poses.
In this work, we estimate the depth around a robot (360-degree view) using time sequence spherical camera images, from a camera whose parameters are unknown. We propose a learnable axisymmetric camera model which accepts distorted spherical camera images with two fisheye camera images. In addition, we trained our models with a photo-realistic simulator to generate ground truth depth images to provide supervision. Moreover, we introduced loss functions to provide floor constraints to reduce artifacts that can result from reflective floor surfaces. We demonstrate the efficacy of our method using the spherical camera images from the GO Stanford dataset and pinhole camera images from the KITTI dataset to compare our method's performance with that of a baseline method in learning the camera parameters.

[96]  arXiv:2110.10417 [pdf, other]
Title: FoV Privacy-aware VR Streaming
Comments: 6 pages, 4 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2104.14170, arXiv:2104.09779
Subjects: Multimedia (cs.MM)

Proactive tile-based virtual reality (VR) video streaming can use the trace of FoV and eye movement to predict future requested tiles, then render and deliver the predicted tiles before playback. The quality of experience (QoE) depends on the combined effect of tile prediction and consumed resources. Recently, it has been found that with the FoV and eye movement data collected for a user, one can infer the identity and preference of the user. Existing works investigate the privacy protection for eye movement, but never address how to protect the privacy in terms of FoV and how the privacy protection affects the QoE. In this paper, we strive to characterize and satisfy the FoV privacy requirement. We consider "trading resources for privacy". We first add camouflaged tile requests around the real FoV and define spatial degree of privacy (SDoP) as a normalized number of camouflaged tile requests. By consuming more resources to ensure SDoP, the real FoVs can be hidden. Then, we proceed to analyze the impacts of SDoP on the QoE by jointly optimizing the durations for prediction, computing, and transmission that maximize the QoE given arbitrary predictor, configured resources, and SDoP. We find that a larger SDoP requires more resources but degrades the performance of tile prediction. Simulation with state-of-the-art predictors on a real dataset verifies the analysis and shows that a user requiring a larger SDoP can be served with better QoE.

[97]  arXiv:2110.10418 [pdf, other]
Title: Steganography of Complex Networks
Authors: Daewon Lee
Comments: 16 pages, 6 figures, 3 tables
Subjects: Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)

Steganography is one of the information hiding techniques, which conceals secret messages in cover media. Digital image and audio are the most studied cover media for steganography. However, so far, there is no research on steganography that utilizes complex networks as cover media. To investigate the possibility and feasibility of complex networks as cover media for steganography, we introduce steganography of complex networks through three algorithms: BIND, BYMOND, and BYNIS. BIND hides two bits of a secret message in an edge, while BYMOND encodes a byte in an edge, without changing the original network structures. Encoding simulation experiments for the networks of Open Graph Benchmark demonstrated that BIND and BYMOND can successfully hide random messages in the edge lists. BYNIS synthesizes edges by generating node identifiers from a given message. The degree distribution of the stego network synthesized by BYNIS was mostly close to a power law. Steganography of complex networks is expected to have applications such as watermarking to protect proprietary datasets, or sensitive information hiding for privacy preservation.

[98]  arXiv:2110.10422 [pdf, other]
Title: Encoding spatiotemporal priors with VAEs for small-area estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Gaussian processes (GPs), implemented through multivariate Gaussian distributions for a finite collection of data, are the most popular approach in small-area spatiotemporal statistical modelling. In this context they are used to encode correlation structures over space and time and can generalise well in interpolation tasks. Despite their flexibility, off-the-shelf GPs present serious computational challenges which limit their scalability and practical usefulness in applied settings. Here, we propose a novel, deep generative modelling approach to tackle this challenge: for a particular spatiotemporal setting, we approximate a class of GP priors through prior sampling and subsequent fitting of a variational autoencoder (VAE). Given a trained VAE, the resultant decoder allows spatiotemporal inference to become incredibly efficient due to the low dimensional, independently distributed latent Gaussian space representation of the VAE. Once trained, inference using the VAE decoder replaces the GP within a Bayesian sampling framework. This approach provides a tractable and easy-to-implement means of approximately encoding spatiotemporal priors and facilitates efficient statistical inference. We demonstrate the utility of our two-stage VAE approach on Bayesian, small-area estimation tasks.

[99]  arXiv:2110.10423 [pdf, other]
Title: ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-cost Proxies
Subjects: Machine Learning (cs.LG)

Designing neural architectures requires immense manual effort. This has promoted the development of neural architecture search (NAS) to automate this design. Previous NAS methods achieve promising results but run slowly, whereas zero-cost proxies run extremely fast but are less promising; recent work therefore considers utilizing zero-cost proxies via a simple warm-up. The existing method has two limitations, which are unforeseeable reliability and one-shot usage. To address the limitations, we present ProxyBO, an efficient Bayesian optimization framework that utilizes the zero-cost proxies to accelerate neural architecture search. We propose the generalization ability measurement to estimate the fitness of proxies on the task during each iteration and then combine BO with zero-cost proxies via dynamic influence combination. Extensive empirical studies show that ProxyBO consistently outperforms competitive baselines on five tasks from three public benchmarks. Concretely, ProxyBO achieves up to 5.41x and 3.83x speedups over the state-of-the-art approaches REA and BRP-NAS, respectively.

[100]  arXiv:2110.10425 [pdf, other]
Title: A generalised phase field model for fatigue crack growth in elastic-plastic solids with an efficient monolithic solver
Journal-ref: Computer Methods in Applied Mechanics and Engineering (2022)
Subjects: Computational Engineering, Finance, and Science (cs.CE)

We present a generalised phase field-based formulation for predicting fatigue crack growth in metals. The theoretical framework aims at covering a wide range of material behaviour. Different fatigue degradation functions are considered and their influence is benchmarked against experiments. The phase field constitutive theory accommodates the so-called AT1, AT2 and phase field-cohesive zone (PF-CZM) models. In regards to material deformation, both non-linear kinematic and isotropic hardening are considered, as well as the combination of the two. Moreover, a monolithic solution scheme based on quasi-Newton algorithms is presented and shown to significantly outperform staggered approaches. The potential of the computational framework is demonstrated by investigating several 2D and 3D boundary value problems of particular interest. Constitutive and numerical choices are compared and insight is gained into their differences and similarities. The framework enables predicting fatigue crack growth in arbitrary geometries and for materials exhibiting complex (cyclic) deformation and damage responses. The finite element code developed is made freely available at www.empaneda.com/codes.

[101]  arXiv:2110.10428 [pdf, other]
Title: Reconstruction of Fragmented Trajectories of Collective Motion using Hadamard Deep Autoencoders
Comments: 21 pages, 5 figures, submitted to Pattern Recognition
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Learning the dynamics of collectively moving agents such as fish or humans is an active field of research. Due to natural phenomena such as occlusion and change of illumination, the multi-object methods tracking such dynamics might lose track of the agents, which might result in fragmentation of the constructed trajectories. Here, we present an extended deep autoencoder (DA) that we train only on fully observed segments of the trajectories by defining its loss function as the Hadamard product of a binary indicator matrix with the absolute difference between the outputs and the labels. The trajectories of the agents practicing collective motion are low-rank due to mutual interactions and dependencies between the agents, which we utilize as the underlying pattern that our Hadamard deep autoencoder (HDA) codes during its training. The performance of our HDA is compared with that of a low-rank matrix completion scheme in the context of fragmented trajectory reconstruction.
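
As a rough illustration of the masked loss described above, the following sketch takes the Hadamard (elementwise) product of a binary indicator matrix with the absolute difference between outputs and labels, so that unobserved entries contribute nothing. The function name, the averaging over observed entries, and the toy data are assumptions for illustration, not the authors' implementation.

```python
def hadamard_masked_l1(outputs, labels, mask):
    """Mean absolute error over entries where mask == 1.

    outputs, labels and mask are equally shaped lists of rows; mask holds
    1 for observed entries and 0 for missing (fragmented) ones.
    """
    total = 0.0
    observed = 0
    for out_row, lab_row, mask_row in zip(outputs, labels, mask):
        for o, y, m in zip(out_row, lab_row, mask_row):
            total += m * abs(o - y)  # Hadamard product with |output - label|
            observed += m
    # Averaging over observed entries is an assumption; guard an all-zero mask.
    return total / observed if observed else 0.0


# Toy data: the entry at row 0, column 1 is treated as unobserved.
outputs = [[1.0, 2.0], [3.0, 4.0]]
labels = [[1.5, 0.0], [3.0, 5.0]]
mask = [[1, 0], [1, 1]]
loss = hadamard_masked_l1(outputs, labels, mask)
```

Here the large error at the masked position is ignored, and the loss reflects only the three observed entries.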

[102]  arXiv:2110.10429 [pdf, other]
Title: Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach
Comments: 4 pages + 1 page for citations + 2 pages for appendix
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The remarkable performance of the pre-trained language model (LM) using self-supervised learning has led to a major paradigm shift in the study of natural language processing. In line with these changes, leveraging the performance of speech recognition systems with massive deep learning-based LMs is a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, in this paper, we focus on a cross-modal knowledge distillation method that transfers knowledge between two types of deep neural networks with different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained in different units (senones, monophones, and subwords) and reveal the effectiveness of the hierarchical distillation method through an ablation study.

[103]  arXiv:2110.10431 [pdf, other]
Title: Discontinuous Grammar as a Foreign Language
Comments: 22 pages
Subjects: Computation and Language (cs.CL)

In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank.

[104]  arXiv:2110.10433 [pdf]
Title: New Result on Interception of Stationary Targets at Arbitrary Time-Varying Velocity
Subjects: Systems and Control (eess.SY)

In this paper, some new results on a time-varying speed missile against a stationary target using pure proportional navigation (PPN) are developed for the planar interception problem. First, the relative motion equation is established in the arc-length domain based on differential geometry theory, which eliminates the influence of the time-varying missile speed. Then, the closed-form solution of a time-varying speed missile intercepting a stationary target with PPN is deduced, and the interception performance is analyzed. Additionally, considering the missile maneuvering acceleration limit, the capture region of the time-varying speed missile is analyzed. Finally, the results derived in this paper are verified by numerical simulation analysis for various scenarios.

[105]  arXiv:2110.10436 [pdf, other]
Title: A Survey on Deep-Learning Approaches for Vehicle Trajectory Prediction in Autonomous Driving
Comments: Accepted by ROBIO2021
Subjects: Robotics (cs.RO)

With the rapid development of machine learning, autonomous driving has become a hot issue, making urgent demands for more intelligent perception and planning systems. Self-driving cars can avoid traffic crashes with precisely predicted future trajectories of surrounding vehicles. In this work, we review and categorize existing learning-based trajectory forecasting methods from perspectives of representation, modeling, and learning. Moreover, we make our implementation of Target-driveN Trajectory Prediction publicly available at https://github.com/Henry1iu/TNT-Trajectory-Predition, demonstrating its outstanding performance while its original code is withheld. We expect this to benefit researchers seeking to improve trajectory prediction performance by building on our work.

[106]  arXiv:2110.10437 [pdf, other]
Title: A unifying framework for $n$-dimensional quasi-conformal mappings
Subjects: Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Differential Geometry (math.DG); Numerical Analysis (math.NA)

With the advancement of computer technology, there is a surge of interest in effective mapping methods for objects in higher-dimensional spaces. To establish a one-to-one correspondence between objects, higher-dimensional quasi-conformal theory can be utilized for ensuring the bijectivity of the mappings. In addition, it is often desirable for the mappings to satisfy certain prescribed geometric constraints and possess low distortion in conformality or volume. In this work, we develop a unifying framework for computing $n$-dimensional quasi-conformal mappings. More specifically, we propose a variational model that integrates quasi-conformal distortion, volumetric distortion, landmark correspondence, intensity mismatch and volume prior information to handle a large variety of deformation problems. We further prove the existence of a minimizer for the proposed model and devise efficient numerical methods to solve the optimization problem. We demonstrate the effectiveness of the proposed framework using various experiments in two and three dimensions, with applications to medical image registration, adaptive remeshing and shape modeling.

[107]  arXiv:2110.10444 [pdf, other]
Title: Moiré Attack (MA): A New Potential Risk of Screen Photos
Comments: NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Images captured by a camera play a critical role in training Deep Neural Networks (DNNs). Usually, we assume the images acquired by cameras are consistent with the ones perceived by human eyes. However, due to the different physical mechanisms between human-vision and computer-vision systems, the final perceived images could be very different in some cases, for example when shooting on digital monitors. In this paper, we find a special phenomenon in digital image processing, the moiré effect, that could cause unnoticed security threats to DNNs. Based on it, we propose a Moiré Attack (MA) that generates the physical-world moiré pattern and adds it to the images by mimicking the shooting process of digital devices. Extensive experiments demonstrate that our proposed digital Moiré Attack (MA) is a perfect camouflage for attackers to tamper with DNNs with a high success rate ($100.0\%$ for untargeted and $97.0\%$ for targeted attack with the noise budget $\epsilon=4$), high transferability rate across different models, and high robustness under various defenses. Furthermore, MA is highly stealthy because the moiré effect is unavoidable due to the camera's inner physical structure, and therefore hardly attracts the awareness of humans. Our code is available at https://github.com/Dantong88/Moire_Attack.

[108]  arXiv:2110.10446 [pdf, other]
Title: Interactive simulation for easy decision-making in fluid dynamics
Journal-ref: H. Theisel and M. Wimmer, editors, Eurographics 2021 - Short Papers. The Eurographics Association, 2021
Subjects: Computational Engineering, Finance, and Science (cs.CE)

A conventional study of fluid simulation involves different stages including conception, simulation, visualization, and analysis tasks. It is, therefore, necessary to switch between different software and interactive contexts, which implies costly data manipulation and increases the time needed for decision making. Our interactive simulation approach was designed to shorten this loop, allowing users to visualize and steer a simulation in progress without waiting for the end of the simulation. The methodology allows the users to control, start, pause, or stop a simulation in progress, to change global physical parameters, and to interact with its 3D environment by editing boundary conditions such as walls or obstacles. This approach is made possible by using a methodology such as the Lattice Boltzmann Method (LBM) to achieve interactive time while remaining physically relevant. In this work, we present our platform dedicated to interactive fluid simulation based on LBM. The contribution of our interactive simulation approach to decision making will be evaluated in a study based on a simple but realistic use case.

[109]  arXiv:2110.10447 [pdf, other]
Title: Easy and structured approach for software and firmware co-simulation for bus centric designs
Subjects: Other Computer Science (cs.OH)

Although software and firmware co-simulation is gaining popularity, it is still not widely used in FPGA designs. This work presents an easy and structured approach for software and firmware co-simulation for bus centric designs. The proposed approach is very modular and software language agnostic. The only requirement is that the firmware design is accessible via some kind of system bus. The concept has been used for testing a DAQ system being developed for a high energy physics experiment.

[110]  arXiv:2110.10450 [pdf, other]
Title: KabOOM: Unsupervised Crash Categorization through Timeseries Fingerprinting
Comments: Submitted to ICSE-SEIP 2022
Subjects: Software Engineering (cs.SE)

Modern mobile applications include instrumentation that samples internal application metrics at regular intervals. Following a crash, sample metrics are collected and can potentially be valuable for root-causing difficult-to-diagnose crashes. However, the fine-grained nature and overwhelming wealth of available application metrics, coupled with frequent application updates, render their use for root-causing crashes extremely difficult.
We propose KabOOM, a method to automatically cluster telemetry reports into intuitive, distinct crash categories. Uniquely, KabOOM relies on multivariate timeseries fingerprinting; an auto-encoder coupled with a cluster centroid optimization technique learns embeddings of each crash report, which are then used to cluster metric timeseries-based crash reports. We demonstrate the effectiveness of KabOOM on both reducing the dimensionality of the incoming crash reports and producing crash categories that are intuitive to developers.

[111]  arXiv:2110.10452 [pdf]
Title: Different Applications and Technologies of Internet of Things (IoT)
Comments: Paper is submitted to ICICT 2022 Conference for its acceptance
Subjects: Computers and Society (cs.CY)

The Internet of things (IoT) has significantly transformed the traditional lifestyle into that of a highly technologically advanced society. Some of the significant transformations that have been achieved through IoT are smart homes, smart transportation, smart cities, and control of pollution. A considerable number of studies have been conducted and continue to be done to increase the use of technology through IoT. However, research on IoT has not yet fully explored how to improve the application of technology through IoT. Besides, IoT experiences several problems that need to be considered in order to get the full capability of IoT in changing society. This research paper addresses the key applications of IoT, the architecture of IoT, and the key issues affecting IoT. In addition, the paper highlights how big data analytics is essential in improving the effectiveness of IoT in various applications within society.

[112]  arXiv:2110.10456 [pdf, other]
Title: Noisy Annotation Refinement for Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Supervised training of object detectors requires well-annotated large-scale datasets, whose production is costly. Therefore, some efforts have been made to obtain annotations in economical ways, such as crowdsourcing. However, datasets obtained by these methods tend to contain noisy annotations such as inaccurate bounding boxes and incorrect class labels. In this study, we propose a new problem setting of training object detectors on datasets with entangled noises of annotations of class labels and bounding boxes. Our proposed method efficiently decouples the entangled noises, corrects the noisy annotations, and subsequently trains the detector using the corrected annotations. We verified the effectiveness of our proposed method and compared it with the baseline on noisy datasets with different noise levels. The experimental results show that our proposed method significantly outperforms the baseline.

[113]  arXiv:2110.10457 [pdf, other]
Title: Knowledge Graph informed Fake News Classification via Heterogeneous Representation Ensembles
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Increasing amounts of freely available data, both in textual and relational form, offer exploration of richer document representations, potentially improving model performance and robustness. An emerging problem in the modern era is fake news detection -- many easily available pieces of information are not necessarily factually correct, and can lead to wrong conclusions or be used for manipulation. In this work we explore how different document representations, ranging from simple symbolic bag-of-words to contextual, neural language model-based ones, can be used for efficient fake news identification. One of the key contributions is a set of novel document representation learning methods based solely on knowledge graphs, i.e. extensive collections of (grounded) subject-predicate-object triplets. We demonstrate that knowledge graph-based representations already achieve performance competitive with conventionally accepted representation learners. Furthermore, when combined with existing, contextual representations, knowledge graph-based document representations can achieve state-of-the-art performance. To our knowledge this is the first larger-scale evaluation of how knowledge graph-based representations can be systematically incorporated into the process of fake news classification.

[114]  arXiv:2110.10460 [pdf, other]
Title: Zeros of quasi-paraorthogonal polynomials and positive quadrature
Subjects: Numerical Analysis (math.NA)

In this paper we illustrate that paraorthogonality on the unit circle $\mathbb{T}$ is the counterpart to orthogonality on $\mathbb{R}$ when we are interested in the spectral properties. We characterize quasi-paraorthogonal polynomials on the unit circle as the analogues of the quasi-orthogonal polynomials on $\mathbb{R}$. We analyze the possibilities of preselecting some of their zeros, in order to build positive quadrature formulas with prefixed nodes and maximal domain of validity. These quadrature formulas on the unit circle are illustrated numerically.

[115]  arXiv:2110.10461 [pdf, other]
Title: Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation
Comments: 34 pages, 18 figures, 13 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.

[116]  arXiv:2110.10470 [pdf, other]
Title: Interpreting Deep Learning Models in Natural Language Processing: A Review
Subjects: Computation and Language (cs.CL)

Neural network models have achieved state-of-the-art performances in a wide range of natural language processing (NLP) tasks. However, a long-standing criticism against neural network models is the lack of interpretability, which not only reduces the reliability of neural NLP systems but also limits the scope of their applications in areas where interpretability is essential (e.g., health care applications). In response, the increasing interest in interpreting neural NLP models has spurred a diverse array of interpretation methods over recent years. In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP. We first sketch out a high-level taxonomy for interpretation methods in NLP, i.e., training-based approaches, test-based approaches, and hybrid approaches. Next, we describe sub-categories in each category in detail, e.g., influence-function based methods, KNN-based methods, attention-based models, saliency-based methods, perturbation-based methods, etc. We point out deficiencies of current methods and suggest some avenues for future research.

[117]  arXiv:2110.10472 [pdf, other]
Title: Multilingual Unsupervised Neural Machine Translation with Denoising Adapters
Comments: Accepted as a long paper to EMNLP 2021
Subjects: Computation and Language (cs.CL)

We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data by using auxiliary parallel language pairs. For this problem the standard procedure so far to leverage the monolingual data is back-translation, which is computationally costly and hard to tune.
In this paper we propose instead to use denoising adapters, adapter layers with a denoising objective, on top of pre-trained mBART-50. In addition to the modularity and flexibility of such an approach, we show that the resulting translations are on par with back-translation as measured by BLEU, and furthermore it allows adding unseen languages incrementally.

[118]  arXiv:2110.10474 [pdf, other]
Title: R4: A Framework for Route Representation and Route Recommendation
Subjects: Artificial Intelligence (cs.AI)

Route recommendation is significant in navigation services. Two major challenges for route recommendation are route representation and user representation. Unlike items that can be identified by unique IDs in traditional recommendation, routes are combinations of links (i.e., a road segment and its following action like turning left) and the number of combinations could be close to infinite. Besides, the representation of a route changes under different scenarios. These facts result in severe sparsity of routes, which increases the difficulty of route representation. Moreover, link attribute deficiencies and errors affect the precision of route representation. Because of the sparsity of routes, the interaction data between users and routes are also sparse. This makes it difficult to acquire user representation from historical user-item interactions as traditional recommendations do. To address these issues, we propose a novel learning framework R4. In R4, we design a sparse & dense network to obtain representations of routes. The sparse unit learns link ID embeddings and aggregates them to represent a route, which captures implicit route characteristics and subsequently alleviates problems caused by link attribute deficiencies and errors. The dense unit extracts implicit local features of routes from link attributes. For user representation, we utilize a series of historical navigations to extract user preference. R4 achieves remarkable performance in both offline and online experiments.

[119]  arXiv:2110.10478 [pdf, other]
Title: Continual Learning in Multilingual NMT via Language-Specific Embeddings
Authors: Alexandre Berard
Comments: Accepted as a research paper to WMT 2021
Subjects: Computation and Language (cs.CL)

This paper proposes a technique for adding a new source or target language to an existing multilingual NMT model without re-training it on the initial set of languages. It consists in replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data. Some additional language-specific components may be trained to improve performance (e.g., Transformer layers or adapter modules). Because the parameters of the original model are not modified, its performance on the initial languages does not degrade. We show on two sets of experiments (small-scale on TED Talks, and large-scale on ParaCrawl) that this approach performs as well as or better than the more costly alternatives; and that it has excellent zero-shot performance: training on English-centric data is enough to translate between the new language and any of the initial languages.

[120]  arXiv:2110.10481 [pdf, other]
Title: Unified Style Transfer
Comments: 9 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Currently, it is hard to compare and evaluate different style transfer algorithms due to chaotic definitions of style and the absence of agreed objective validation methods in the study of style transfer. In this paper, a novel approach, the Unified Style Transfer (UST) model, is proposed. With the introduction of a generative model for internal style representation, UST can transfer images in two approaches, i.e., Domain-based and Image-based, simultaneously. At the same time, a new philosophy based on the human sense of art and style distributions for evaluating the transfer model is presented and demonstrated, called Statistical Style Analysis. It provides a new path to validate style transfer models' feasibility by validating the general consistency between internal style representation and art facts. Besides, the translation-invariance of AdaIN features is also discussed.

[121]  arXiv:2110.10482 [pdf, other]
Title: Surrogate Representation Learning with Isometric Mapping for Gray-box Graph Adversarial Attacks
Subjects: Artificial Intelligence (cs.AI)

Gray-box graph attacks aim at disrupting the performance of the victim model by using inconspicuous attacks with limited knowledge of the victim model. The parameters of the victim model and the labels of the test nodes are invisible to the attacker. To obtain the gradient on the node attributes or graph structure, the attacker constructs an imaginary surrogate model trained under supervision. However, there is a lack of discussion on the training of surrogate models and the robustness of the provided gradient information. The general node classification model loses the topology of the nodes on the graph, which is, in fact, an exploitable prior for the attacker. This paper investigates the effect of representation learning of surrogate models on the transferability of gray-box graph adversarial attacks. To preserve the topology in the surrogate embedding, we propose Surrogate Representation Learning with Isometric Mapping (SRLIM). By using an isometric mapping method, SRLIM can constrain the topological structure of nodes from the input layer to the embedding space, that is, maintain the similarity of nodes in the propagation process. Experiments demonstrate the effectiveness of our approach through the improvement in the performance of the adversarial attacks generated by the gradient-based attacker in untargeted poisoning gray-box setups.

[122]  arXiv:2110.10486 [pdf, other]
Title: A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the last few years, research and development on Deep Learning models and techniques for ultra-low-power devices (in a word, TinyML) has mainly focused on a train-then-deploy assumption, with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning. Latent Replay-based Continual Learning (CL) techniques[1] enable online, serverless adaptation in principle, but so far they have still been too computation- and memory-hungry for ultra-low-power TinyML devices, which are typically based on microcontrollers. In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor. We rethink the baseline Latent Replay CL algorithm, leveraging quantization of the frozen stage of the model and Latent Replays (LRs) to reduce their memory cost with minimal impact on accuracy. In particular, 8-bit compression of the LR memory proves to be almost lossless (-0.26% with 3000LR) compared to the full-precision baseline implementation, but requires 4x less memory, while 7-bit can also be used with an additional minimal accuracy degradation (up to 5%). We also introduce optimized primitives for forward and backward propagation on the PULP processor. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory, an amount compatible with embedding in TinyML devices. On an advanced 22nm prototype of our platform, called VEGA, the proposed solution performs on average 65x faster than a low-power STM32 L4 microcontroller, being 37x more energy efficient, enough for a lifetime of 535h when learning a new mini-batch of data once every minute.
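The 4x memory figure follows from storing each latent value in 8 bits instead of a 32-bit float. The sketch below is only an illustration of that arithmetic using plain uniform affine quantization; the abstract does not specify the quantizer, so the function names, the min/max calibration, and the synthetic activations are all assumptions rather than the authors' implementation.

```python
import random


def quantize(values, bits=8):
    """Uniform affine quantization of floats to integer codes in [0, 2**bits - 1].

    Hypothetical helper: calibrates on the min and max of the given values.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


# Synthetic stand-in for latent activations stored in the replay memory.
random.seed(0)
latents = [random.uniform(0.0, 6.0) for _ in range(1000)]

codes, lo, scale = quantize(latents, bits=8)
restored = dequantize(codes, lo, scale)

# Rounding to the nearest level bounds the error by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(latents, restored))

fp32_bytes = 4 * len(latents)  # 32-bit float per value
int8_bytes = 1 * len(latents)  # one byte per 8-bit code
print(f"max error {max_err:.5f}, step {scale:.5f}")
print(f"memory ratio {fp32_bytes // int8_bytes}x")
```

A 7-bit variant would need the codes packed across byte boundaries to realize any further saving, which this sketch does not attempt.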

[123]  arXiv:2110.10490 [pdf, ps, other]
Title: Transferring Reinforcement Learning for DC-DC Buck Converter Control via Duty Ratio Mapping: From Simulation to Implementation
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Reinforcement learning (RL) control for power electronics systems has become an emerging topic, while the sim-to-real issue remains a challenging problem because very few results are available in the literature. Indeed, due to the inevitable mismatch between simulation models and real-life systems, offline-trained RL control strategies may encounter unexpected hurdles when transferred to a practical implementation. As the main contribution of this paper, a transfer methodology via a carefully designed duty ratio mapping (DRM) is proposed for a DC-DC buck converter. Then, a detailed sim-to-real process is presented to enable the implementation of a model-free deep reinforcement learning (DRL) controller. The feasibility and effectiveness of the proposed methodology are demonstrated by comparative experimental studies.

[124]  arXiv:2110.10491 [pdf, ps, other]
Title: A Study On Data Augmentation In Voice Anti-Spoofing
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)

In this paper, we perform an in-depth study of how data augmentation techniques improve synthetic or spoofed audio detection. Specifically, we propose methods to deal with channel variability, different audio compressions, different band-widths, and unseen spoofing attacks, which have all been shown to significantly degrade the performance of audio-based systems and Anti-Spoofing systems. Our results are based on the ASVspoof 2021 challenge, in the Logical Access (LA) and Deep Fake (DF) categories. Our study is Data-Centric, meaning that the models are fixed and we significantly improve the results by making changes in the data. We introduce two forms of data augmentation - compression augmentation for the DF part, compression & channel augmentation for the LA part. In addition, a new type of online data augmentation, SpecAverage, is introduced in which the audio features are masked with their average value in order to improve generalization. Furthermore, we introduce a Log spectrogram feature design that improved the results. Our best single system and fusion scheme both achieve state-of-the-art performance in the DF category, with an EER of 15.46% and 14.46% respectively. Our best system for the LA task reduced the best baseline EER by 50% and the min t-DCF by 16%. Our techniques to deal with spoofed data from a wide variety of distributions can be replicated and can help anti-spoofing and speech-based systems enhance their results.
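The abstract describes SpecAverage only as masking audio features with their average value. A minimal sketch of that idea follows, assuming the features form a frequency-by-time matrix, that one random time band and one random frequency band are masked, and that the fill value is the mean over the whole matrix. The function name, band widths, and masking policy are hypothetical and may differ from the paper.

```python
import random


def spec_average(spec, max_t=3, max_f=2, rng=random):
    """Return a copy of `spec` with one time band and one frequency band
    replaced by the global mean of the features.

    `spec` is a list of frequency rows, each a list of values over time.
    Band widths are drawn uniformly from [0, max_t] and [0, max_f].
    """
    n_f, n_t = len(spec), len(spec[0])
    mean = sum(sum(row) for row in spec) / (n_f * n_t)
    out = [row[:] for row in spec]  # leave the input untouched

    # Mask a contiguous block of time frames across all frequencies.
    t_w = rng.randint(0, max_t)
    t0 = rng.randint(0, n_t - t_w)
    for row in out:
        for t in range(t0, t0 + t_w):
            row[t] = mean

    # Mask a contiguous block of frequency bins across all frames.
    f_w = rng.randint(0, max_f)
    f0 = rng.randint(0, n_f - f_w)
    for f in range(f0, f0 + f_w):
        out[f] = [mean] * n_t
    return out


# Toy 4 x 8 "spectrogram" for illustration.
toy = [[float(f * 8 + t) for t in range(8)] for f in range(4)]
for row in spec_average(toy, rng=random.Random(0)):
    print(row)
```

Filling with the mean rather than zero keeps the masked region at a typical feature level, which is the only property this sketch tries to show.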

[125]  arXiv:2110.10493 [pdf, ps, other]
Title: On the Effectiveness of Clone Detection for Detecting IoT-related Vulnerable Clones
Comments: 7 pages, 4 figures
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Since IoT systems provide services over the Internet, they must continue to operate safely even if malicious users attack them. Since the computational resources of edge devices connected to the IoT are limited, lightweight platforms and network protocols are often used. Lightweight platforms and network protocols are less resistant to attacks, increasing the risk that developers will embed vulnerabilities. The code clone research community has been developing approaches to fix buggy (e.g., vulnerable) clones simultaneously. However, there has been little research on IoT-related vulnerable clones. It is unclear whether existing code clone detection techniques can perform simultaneous fixes of the vulnerable clones. In this study, we first created two datasets of IoT-related vulnerable code. We then conducted a preliminary investigation to show whether existing code clone detection tools (e.g., NiCaD, CCFinderSW) are capable of detecting IoT-related vulnerable clones by applying them to the created datasets. The preliminary result shows that the existing tools can detect them partially.

[126]  arXiv:2110.10494 [pdf, other]
Title: Deep Point Cloud Normal Estimation via Triplet Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Graphics (cs.GR)

Normal estimation on 3D point clouds is a fundamental problem in 3D vision and graphics. Current methods often show limited accuracy in predicting normals at sharp features (e.g., edges and corners) and limited robustness to noise. In this paper, we propose a novel normal estimation method for point clouds. It consists of two phases: (a) feature encoding, which learns representations of local patches, and (b) normal estimation, which takes the learned representation as input and regresses the normal vector. Our motivation is that local patches on isotropic and anisotropic surfaces have similar or distinct normals, and that separable features or representations can be learned to facilitate normal estimation. To realise this, we first construct triplets of local patches on 3D point cloud data, and design a triplet network with a triplet loss for feature encoding. We then design a simple network with several MLPs and a loss function to regress the normal vector. Despite having a smaller network size than most other methods, our method preserves sharp features and achieves better normal estimation results on CAD-like shapes in our experiments.
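For readers unfamiliar with the triplet loss mentioned above, here is a minimal sketch of its standard margin form on plain feature vectors. How the paper builds its triplets of patches and which distance and margin it uses are not stated in the abstract, so the Euclidean distance, the margin of 1.0, and the toy vectors are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin triplet loss.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy features: the positive is close to the anchor, the negative is far.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]
print(triplet_loss(anchor, positive, negative))  # well-separated triplet
print(triplet_loss(anchor, negative, positive))  # roles swapped, loss is large
```

Minimizing this quantity pulls features of patches with similar normals together and pushes features of patches with distinct normals apart, which is the separability the abstract refers to.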

[127]  arXiv:2110.10501 [pdf, other]
Title: STALP: Style Transfer with Auxiliary Limited Pairing
Comments: Eurographics 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We present an approach to example-based stylization of images that uses a single pair of a source image and its stylized counterpart. We demonstrate how to train an image translation network that can perform real-time semantically meaningful style transfer to a set of target images with similar content as the source image. A key added value of our approach is that it considers also consistency of target images during training. Although those have no stylized counterparts, we constrain the translation to keep the statistics of neural responses compatible with those extracted from the stylized source. In contrast to concurrent techniques that use a similar input, our approach better preserves important visual characteristics of the source style and can deliver temporally stable results without the need to explicitly handle temporal consistency. We demonstrate its practical utility on various applications including video stylization, style transfer to panoramas, faces, and 3D models.

[128]  arXiv:2110.10505 [pdf, other]
Title: Event Guided Depth Sensing
Journal-ref: International Conference on 3D Vision (3DV), Online, 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Active depth sensors like structured light, lidar, and time-of-flight systems sample the depth of the entire scene uniformly at a fixed scan rate. This leads to limited spatio-temporal resolution where redundant static information is over-sampled and precious motion information might be under-sampled. In this paper, we present an efficient bio-inspired event-camera-driven depth estimation algorithm. In our approach, we dynamically illuminate areas of interest densely, depending on the scene activity detected by the event camera, and sparsely illuminate areas in the field of view with no motion. The depth estimation is achieved by an event-based structured light system consisting of a laser point projector coupled with a second event-based sensor tuned to detect the reflection of the laser from the scene. We show the feasibility of our approach in a simulated autonomous driving scenario and real indoor sequences using our prototype. We show that, in natural scenes like autonomous driving and indoor environments, moving edges correspond to less than 10% of the scene on average. Thus our setup requires the sensor to scan only 10% of the scene, which could lead to almost 90% less power consumption by the illumination source. While we present the evaluation and proof-of-concept for an event-based structured-light system, the ideas presented here are applicable for a wide range of depth-sensing modalities like LIDAR, time-of-flight, and standard stereo.

[129]  arXiv:2110.10507 [pdf, other]
Title: Development and analysis of entropy stable no-slip wall boundary conditions for the Eulerian model for viscous and heat conducting compressible flows
Comments: The datasets generated during and/or analysed during the current study are available in the Zenodo repository, this http URL
Journal-ref: Partial Differ. Equ. Appl. 2, 77 (2021)
Subjects: Numerical Analysis (math.NA)

Nonlinear entropy stability analysis is used to derive entropy stable no-slip wall boundary conditions for the Eulerian model proposed by Svärd (Physica A: Statistical Mechanics and its Applications, 2018) and its spatial discretization based on entropy stable collocated discontinuous Galerkin operators with the summation-by-parts property for unstructured grids. A set of viscous test cases of increasing complexity is simulated using both the Eulerian and the classic compressible Navier-Stokes models. The numerical results obtained with the two models are compared, and differences and similarities are then highlighted.

[130]  arXiv:2110.10510 [pdf, other]
Title: Periodic DMP formulation for Quaternion Trajectories
Comments: 2021 20th International Conference on Advanced Robotics (ICAR)
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Imitation learning techniques have been used as a way to transfer skills to robots. Among them, dynamic movement primitives (DMPs) have been widely exploited as an effective and efficient technique to learn and reproduce complex discrete and periodic skills. While DMPs have been properly formulated for learning point-to-point movements for both translation and orientation, periodic ones are missing a formulation to learn the orientation. To address this gap, we propose a novel DMP formulation that enables encoding of periodic orientation trajectories. Within this formulation we develop two approaches: a Riemannian metric-based projection approach and a unit quaternion-based periodic DMP. Both formulations exploit unit quaternions to represent the orientation. However, the first exploits the properties of Riemannian manifolds to work in the tangent space of the unit sphere. The second directly encodes the unit quaternion trajectory while guaranteeing the unitary norm of the generated quaternions. We validated the technical aspects of the proposed methods in simulation. Then we performed experiments on a real robot to execute daily tasks that involve periodic orientation changes (i.e., surface polishing/wiping and liquid mixing by shaking).

[131]  arXiv:2110.10517 [pdf, ps, other]
Title: A $C^{0}$ interior penalty method for $m$th-Laplace equation
Subjects: Numerical Analysis (math.NA)

In this paper, we propose a $C^{0}$ interior penalty method for $m$th-Laplace equation on bounded Lipschitz polyhedral domain in $\mathbb{R}^{d}$, where $m$ and $d$ can be any positive integers. The standard $H^{1}$-conforming piecewise $r$-th order polynomial space is used to approximate the exact solution $u$, where $r$ can be any integer greater than or equal to $m$. Unlike the interior penalty method in [T.~Gudi and M.~Neilan, {\em An interior penalty method for a sixth-order elliptic equation}, IMA J. Numer. Anal., \textbf{31(4)} (2011), pp. 1734--1753], we avoid computing $D^{m}$ of numerical solution on each element and high order normal derivatives of numerical solution along mesh interfaces. Therefore our method can be easily implemented. After proving discrete $H^{m}$-norm bounded by the natural energy semi-norm associated with our method, we manage to obtain stability and optimal convergence with respect to discrete $H^{m}$-norm. Numerical experiments validate our theoretical estimate.

[132]  arXiv:2110.10522 [pdf, other]
Title: CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

As an algorithm based on deep reinforcement learning, Proximal Policy Optimization (PPO) performs well in many complex tasks and has become one of the most popular RL algorithms in recent years. According to the penalty mechanism in the surrogate objective, PPO can be divided into PPO with KL divergence (KL-PPO) and PPO with a clip function (Clip-PPO). Clip-PPO is widely used in a variety of practical scenarios and has attracted the attention of many researchers, so many variations have been created that further improve the algorithm. However, as a more theoretical algorithm, KL-PPO has been neglected because its performance is not as good as that of Clip-PPO. In this article, we analyze the asymmetry effect of KL divergence on PPO's objective function and give an inequality that indicates when the asymmetry will affect the efficiency of KL-PPO. We propose PPO with Correntropy Induced Metric (CIM-PPO), which applies the theory of correntropy (a symmetric metric that has been widely used in M-estimation to evaluate the difference between two distributions) to PPO. We then design experiments based on OpenAI Gym to test the effectiveness of the new algorithm and compare it with KL-PPO and Clip-PPO.
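The contrast between an asymmetric KL penalty and a symmetric correntropy-based one can be seen on two small probability vectors. The sketch below is illustrative only: it uses the common definition of the correntropy induced metric, sqrt(kappa(0) - V(x, y)), with an unnormalized Gaussian kernel so that kappa(0) = 1, and it treats the two probability vectors as paired samples. The kernel width and the way CIM-PPO actually plugs the metric into the objective are not given in the abstract and are assumptions here.

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cim(x, y, sigma=1.0):
    """Correntropy induced metric with a Gaussian kernel scaled so kappa(0) = 1.

    V is the sample mean of the kernel evaluated on the paired differences.
    """
    v = sum(math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for a, b in zip(x, y)) / len(x)
    return math.sqrt(max(0.0, 1.0 - v))


# Two hypothetical action distributions (old and new policy at one state).
p = [0.8, 0.1, 0.1]
q = [0.4, 0.3, 0.3]

print("KL(p||q) =", kl(p, q))
print("KL(q||p) =", kl(q, p))   # differs from KL(p||q)
print("CIM(p,q) =", cim(p, q))
print("CIM(q,p) =", cim(q, p))  # identical to CIM(p,q)
```

Because the kernel depends only on the squared difference of paired entries, swapping the arguments cannot change the CIM value, whereas the two KL directions weight the log-ratio by different distributions.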

[133]  arXiv:2110.10523 [pdf, other]
Title: Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

For autonomous driving, an essential task is to detect surrounding objects accurately. To this end, most existing systems use optical devices, including cameras and light detection and ranging (LiDAR) sensors, to collect environment data in real time. In recent years, many researchers have developed advanced machine learning models to detect surrounding objects. Nevertheless, the aforementioned optical devices are vulnerable to optical signal attacks, which could compromise the accuracy of object detection. To address this critical issue, we propose a framework to detect and identify sensors that are under attack. Specifically, we first develop a new technique to detect attacks on a system that consists of three sensors. Our main idea is to: 1) use data from three sensors to obtain two versions of depth maps (i.e., disparity) and 2) detect attacks by analyzing the distribution of disparity errors. In our study, we use real data sets and the state-of-the-art machine learning model to evaluate our attack detection scheme and the results confirm the effectiveness of our detection method. Based on the detection scheme, we further develop an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. We prove the correctness of our identification scheme and conduct experiments to show the accuracy of our identification method. Finally, we investigate the overall sensitivity of our framework.

[134]  arXiv:2110.10524 [pdf, other]
Title: Statistical and Topological Properties of Gaussian Smoothed Sliced Probability Divergences
Authors: Alain Rakotomamonjy, Mokhtar Z. Alaya (LMAC), Maxime Berar (DocApp - LITIS), Gilles Gasso (DocApp - LITIS)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Gaussian smoothed sliced Wasserstein distance has been recently introduced for comparing probability distributions while preserving privacy of the data. It has been shown, in applications such as domain adaptation, to provide performance similar to its non-private (non-smoothed) counterpart. However, the computational and statistical properties of such a metric have not yet been well established. In this paper, we analyze the theoretical properties of this distance as well as those of generalized versions denoted as Gaussian smoothed sliced divergences. We show that smoothing and slicing preserve the metric property and the weak topology. We also provide results on the sample complexity of such divergences. Since the privacy level depends on the amount of Gaussian smoothing, we analyze the impact of this parameter on the divergence. We support our theoretical findings with empirical studies of Gaussian smoothed and sliced versions of the Wasserstein distance, Sinkhorn divergence and maximum mean discrepancy (MMD). In the context of privacy-preserving domain adaptation, we confirm that those Gaussian smoothed sliced Wasserstein and MMD divergences perform very well while ensuring data privacy.
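To make the construction concrete, the following is a rough Monte Carlo sketch of a Gaussian smoothed sliced 2-Wasserstein estimate between two equal-size point clouds: project onto random unit directions, add Gaussian noise to the projections, and compare the sorted one-dimensional samples. The estimator, function names, and default parameters are assumptions chosen for illustration and are not taken from the paper.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere via a normalized Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def smoothed_sliced_w2(xs, ys, sigma=0.5, n_proj=50, seed=0):
    """Monte Carlo estimate for two samples of equal size.

    Each slice adds N(0, sigma^2) noise to the projected points, then uses the
    closed form of the 1D 2-Wasserstein distance between sorted samples.
    """
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_proj):
        theta = random_direction(dim, rng)
        px = sorted(dot(x, theta) + rng.gauss(0.0, sigma) for x in xs)
        py = sorted(dot(y, theta) + rng.gauss(0.0, sigma) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_proj)


# Two toy 2D clouds: one centered at the origin, one shifted by (5, 5).
data_rng = random.Random(1)
cloud_a = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
cloud_b = [[a + 5.0, b + 5.0] for a, b in cloud_a]
print("shifted clouds:", smoothed_sliced_w2(cloud_a, cloud_b))
print("same cloud:    ", smoothed_sliced_w2(cloud_a, cloud_a))
```

Note that the estimate for identical clouds is small but not zero, because independent noise is added to each side; a larger `sigma` gives more privacy at the cost of a noisier comparison.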

[135]  arXiv:2110.10527 [pdf, other]
Title: Sampling from Arbitrary Functions via PSD Models
Authors: Ulysse Marteau-Ferey (SIERRA, PSL), Alessandro Rudi (PSL, SIERRA), Francis Bach (PSL, SIERRA)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)

In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implementations. Instead, we take a two-step approach by first modeling the probability distribution and then sampling from that model. We use the recently introduced class of positive semi-definite (PSD) models, which have been shown to be efficient for approximating probability densities. We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models. We also present preliminary empirical results to illustrate our assertions.

[136]  arXiv:2110.10533 [pdf, other]
Title: AniFormer: Data-driven 3D Animation with Transformer
Comments: BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a novel task, i.e., animating a target 3D object through the motion of a raw driving sequence. In previous works, extra auxiliary correlations between source and target meshes or intermediate factors are required to capture the motions in the driving sequences. Instead, we introduce AniFormer, a novel Transformer-based architecture that generates animated 3D sequences by directly taking the raw driving sequences and arbitrary same-type target meshes as inputs. Specifically, we customize the Transformer architecture for 3D animation that generates mesh sequences by integrating styles from target meshes and motions from the driving meshes. Besides, instead of the conventional single regression head in the vanilla Transformer, AniFormer generates multiple frames as outputs to preserve the sequential consistency of the generated meshes. To achieve this, we carefully design a pair of regression constraints, i.e., motion and appearance constraints, that can provide strong regularization on the generated mesh sequences. Our AniFormer achieves high-fidelity, realistic, temporally coherent animated results and outperforms the compared state-of-the-art methods on benchmarks of diverse categories. Code is available: https://github.com/mikecheninoulu/AniFormer.

[137]  arXiv:2110.10534 [pdf, other]
Title: FairNet: A Measurement Framework for Traffic Discrimination Detection on the Internet
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

Network neutrality is related to the non-discriminatory treatment of packets on the Internet. Any deliberate discrimination of traffic of one application while favoring others violates the principle of neutrality. Many countries have enforced laws against such discrimination. To enforce such laws, one requires tools to detect any net neutrality violations. However, detecting such violations is challenging as it is hard to separate any degradation in quality due to natural network effects and selective degradation. Also, legitimate traffic management and deliberate discrimination methods can be technically the same, making it further challenging to distinguish them.
We developed an end-to-end measurement framework named FairNet to detect discrimination of traffic. It compares the performance of similar services. Our focus is on HTTPS streaming services, which constitute a predominant portion of Internet traffic. The effect of confounding factors (congestion, traffic management policy, dynamic rate adaptation) is made "similar" on the test services to ensure a fair comparison. The FairNet framework uses a "replay server" and a user client that exchange correctly identifiable traffic streams over the Internet. The Server Name Indication (SNI) field in the TLS handshake, which is sent in plaintext, ensures that the traffic from the replay server appears to network middle-boxes as that coming from its actual server. We validated that appropriate SNIs result in the correct classification of services using a commercial traffic shaper. FairNet uses two novel algorithms based on application-level throughput and connection status to detect traffic discrimination. We also validated the methodology's effectiveness by collecting network logs through mobile apps over the live Internet and analyzing them.

[138]  arXiv:2110.10535 [pdf, ps, other]
Title: Investigating Reversibility of Steps in Petri Nets
Comments: special issue of PN 2019
Subjects: Formal Languages and Automata Theory (cs.FL); Software Engineering (cs.SE)

In reversible computations one is interested in the development of mechanisms allowing to undo the effects of executed actions. The past research has been concerned mainly with reversing single actions. In this paper, we consider the problem of reversing the effect of the execution of groups of actions (steps).
Using Petri nets as a system model, we introduce concepts related to this new scenario, generalising notions used in the single action case. We then present properties arising when reverse actions are allowed in place/transition nets (pt-nets). We obtain both positive and negative results, showing that allowing steps makes reversibility more problematic than in the interleaving/sequential case. In particular, we demonstrate that there is a crucial difference between reversing steps which are sets and those which are true multisets. Moreover, in contrast to sequential semantics, splitting reverses does not lead to a general method for reversing bounded pt-nets. We then show that a suitable solution can be obtained by combining split reverses with weighted read arcs.

[139]  arXiv:2110.10536 [pdf, other]
Title: Improving Model Generalization by Agreement of Learned Representations from Data Augmentation
Authors: Rowel Atienza
Comments: Accepted at WACV2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Data augmentation reduces the generalization error by forcing a model to learn invariant representations given different transformations of the input image. In computer vision, on top of the standard image processing functions, data augmentation techniques based on regional dropout such as CutOut, MixUp, and CutMix and policy-based selection such as AutoAugment have demonstrated state-of-the-art (SOTA) results. With an increasing number of data augmentation algorithms being proposed, the focus is always on optimizing the input-output mapping, without realizing that there might be untapped value in the transformed images with the same label. We hypothesize that by forcing the representations of two transformations to agree, we can further reduce the model generalization error. We call our proposed method Agreement Maximization or simply AgMax. With this simple constraint applied during training, empirical results show that data augmentation algorithms can further improve the classification accuracy of ResNet50 on ImageNet by up to 1.5%, WideResNet40-2 on CIFAR10 by up to 0.7%, WideResNet40-2 on CIFAR100 by up to 1.6%, and LeNet5 on Speech Commands Dataset by up to 1.4%. Experimental results further show that unlike other regularization terms such as label smoothing, AgMax can take advantage of the data augmentation to consistently improve model generalization by a significant margin. On downstream tasks such as object detection and segmentation on PascalVOC and COCO, AgMax pre-trained models outperform other data augmentation methods by as much as 1.0mAP (box) and 0.5mAP (mask). Code is available at https://github.com/roatienza/agmax.

[140]  arXiv:2110.10538 [pdf, other]
Title: Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
Comments: NeurIPS'21 Spotlight paper. code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit and dive deeper into PointNet++, one of the most influential yet under-explored networks, and develop faster and more accurate variants of the model. We first present a novel Separable Set Abstraction (SA) module that disentangles the vanilla SA module used in PointNet++ into two separate learning stages: (1) learning channel correlation and (2) learning spatial correlation. The Separable SA module is significantly faster than the vanilla version, yet it achieves comparable performance. We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy. We later replace the vanilla SA modules in PointNet++ with the proposed ASSA module, and denote the modified network as ASSANet. Extensive experiments on point cloud classification, semantic segmentation, and part segmentation show that ASSANet outperforms PointNet++ and other methods, achieving much higher accuracy and faster speeds. In particular, ASSANet outperforms PointNet++ by $7.4$ mIoU on S3DIS Area 5, while maintaining $1.6 \times $ faster inference speed on a single NVIDIA 2080Ti GPU. Our scaled ASSANet variant achieves $66.8$ mIoU and outperforms KPConv, while being more than $54 \times$ faster.

[141]  arXiv:2110.10540 [pdf, other]
Title: On the Integration of Course of Action Playbooks into Shareable Cyber Threat Intelligence
Subjects: Cryptography and Security (cs.CR)

Motivated by the introduction of CACAO, the first open standard that harmonizes the way we document course of action playbooks in a machine-readable format for interoperability, and by the benefits for cybersecurity operations derived from utilizing, coupling, and sharing security playbooks as part of cyber threat intelligence, we introduce a uniform metadata template that supports the management and integration of security playbooks into knowledge representation and knowledge management systems. To demonstrate the applicability of our approach, we provide two use-case implementations where our uniform non-proprietary metadata template is used to introduce security playbooks like CACAO into the MISP threat intelligence platform and the Threat Actor Context ontology.

[142]  arXiv:2110.10545 [pdf, other]
Title: Ranking and Tuning Pre-trained Models: A New Paradigm of Exploiting Model Hubs
Comments: 45 pages
Subjects: Machine Learning (cs.LG)

Pre-trained model hubs with many pre-trained models (PTMs) have been a cornerstone in deep learning. Although built at a high cost, they are in fact \emph{under-exploited}: practitioners usually pick one PTM from the provided model hub by popularity, and then fine-tune the PTM to solve the target task. This naive but common practice poses two obstacles to sufficiently exploiting pre-trained model hubs: (1) the PTM selection procedure has no optimality guarantee; (2) only one PTM is used while the remaining PTMs are overlooked. Ideally, to maximally exploit pre-trained model hubs, one would try all combinations of PTMs and extensively fine-tune each combination, which incurs an exponential number of combinations and an unaffordable computational budget. In this paper, we propose a new paradigm of exploiting model hubs by ranking and tuning pre-trained models: (1) Our conference work~\citep{you_logme:_2021} proposed LogME to estimate the maximum value of label evidence given features extracted by pre-trained models, which can rank all the PTMs in a model hub for various types of PTMs and tasks \emph{before fine-tuning}. (2) The best ranked PTM can be fine-tuned and deployed if we have no preference for the model's architecture, or the target PTM can be tuned by top-K ranked PTMs via the proposed B-Tuning algorithm. The ranking part is based on the conference paper, and we complete its theoretical analysis (convergence proof of the heuristic evidence maximization procedure, and the influence of feature dimension) in this paper. The tuning part introduces a novel Bayesian Tuning (B-Tuning) method for tuning with multiple PTMs, which surpasses dedicated methods designed for homogeneous PTMs tuning and sets a new state of the art for heterogeneous PTMs tuning. We believe the new paradigm of exploiting PTM hubs can interest a large audience of the community.

[143]  arXiv:2110.10546 [pdf, other]
Title: Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation
Comments: Accepted to NeurIPS2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Single image reflection separation (SIRS), as a representative blind source separation task, aims to recover two layers, i.e., transmission and reflection, from one mixed observation, which is challenging due to its highly ill-posed nature. Existing deep learning based solutions typically restore the target layers individually, or with some concerns at the end of the output, barely taking into account the interaction across the two streams/branches. In order to utilize information more efficiently, this work presents a general yet simple interactive strategy, namely "your trash is my treasure" (YTMT), for constructing dual-stream decomposition networks. To be specific, we explicitly enforce the two streams to communicate with each other block-wise. Inspired by the additive property between the two components, the interactive path can be easily built by transferring, instead of discarding, the information deactivated by the ReLU rectifier from one stream to the other. Both ablation studies and experiments on widely-used SIRS datasets are conducted to demonstrate the efficacy of YTMT and reveal its superiority over other state-of-the-art alternatives. The implementation is quite simple and our code is publicly available at https://github.com/mingcv/YTMT-Strategy.
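A minimal sketch of the exchange described above, assuming that the "deactivated information" of a stream is the negative part that ReLU zeroes out, i.e. x - ReLU(x) = min(x, 0). The function names and the plain-list representation are illustrative only and are not taken from the authors' code.

```python
def relu(xs):
    """Keep the positive part of each activation."""
    return [max(x, 0.0) for x in xs]


def deactivated(xs):
    """The part ReLU would discard: x - relu(x) = min(x, 0)."""
    return [min(x, 0.0) for x in xs]


def ytmt_exchange(stream_a, stream_b):
    """Hypothetical block-wise exchange: each stream keeps its own
    activated part and receives the other stream's deactivated part."""
    out_a = [p + n for p, n in zip(relu(stream_a), deactivated(stream_b))]
    out_b = [p + n for p, n in zip(relu(stream_b), deactivated(stream_a))]
    return out_a, out_b


a, b = [1.0, -2.0], [-3.0, 4.0]
out_a, out_b = ytmt_exchange(a, b)
# Nothing is thrown away: the element-wise sum of the two outputs
# equals the element-wise sum of the two inputs.
```

Under this assumption the exchange is lossless with respect to the sum of the two streams, which is one way to read the "additive property" the abstract refers to.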

[144]  arXiv:2110.10548 [pdf, other]
Title: Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
Comments: Submitted to the 5th MLSys Conference
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

We present a novel characterization of the mapping of multiple parallelism forms (e.g. data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping. We experimentally verify the substantial effect of these mappings on all-reduce performance (up to 448x). We offer a novel syntax-guided program synthesis framework that is able to decompose reductions over one or more parallelism axes to sequences of collectives in a hierarchy- and mapping-aware way. For 69% of parallelism placements and user requested reductions, our framework synthesizes programs that outperform the default all-reduce implementation when evaluated on different GPU hierarchies (max 2.04x, average 1.27x). We complement our synthesis tool with a simulator exceeding 90% top-10 accuracy, which therefore reduces the need for massive evaluations of synthesis results to determine a small set of optimal programs and mappings.
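As a rough illustration of decomposing a reduction into a sequence of hierarchy-aware steps, the sketch below simulates a flat all-reduce and a two-level variant on a list of per-device values. It is a toy model under assumed names (`flat_all_reduce`, `hierarchical_all_reduce`, `group_size`), not the synthesis framework or any real collective library.

```python
def flat_all_reduce(values):
    """Every device ends up with the sum over all devices."""
    total = sum(values)
    return [total] * len(values)


def hierarchical_all_reduce(values, group_size):
    """Two-level variant: reduce inside each group (e.g. one node),
    all-reduce the per-group sums, then broadcast inside each group."""
    if len(values) % group_size != 0:
        raise ValueError("number of devices must be a multiple of group_size")
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    local_sums = [sum(group) for group in groups]  # step 1: intra-group reduce
    total = sum(local_sums)                        # step 2: inter-group all-reduce
    return [total] * len(values)                   # step 3: intra-group broadcast


devices = [1, 2, 3, 4, 5, 6, 7, 8]
result = hierarchical_all_reduce(devices, group_size=4)
```

Both functions return the same values; on real hardware the difference lies in which links carry the traffic at each step, which is what a placement-aware decomposition would exploit.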

[145]  arXiv:2110.10549 [pdf, ps, other]
Title: Statistical Physics Meets Wireless Communications: A Resource Allocation Solution for Large Networks
Subjects: Information Theory (cs.IT)

The ever-increasing number of nodes in current and future wireless communication networks brings unprecedented challenges for the allocation of the available communication resources. This is caused by the combinatorial nature of the resource allocation problems, which limits the performance of state-of-the-art techniques when the network size increases. In this paper, we take a new direction and investigate how methods from statistical physics can be used to address resource allocation problems in large networks. To this end, we propose a novel model of the wireless network based on a type of disordered physical system called spin glasses, and on the contributions of the recent Nobel laureate G. Parisi. We show that resource allocation problems, e.g., time, code or frequency assignment, have the same structure as the problem of finding specific configurations in spin glasses. Based on this parallel, we investigate the use of the Survey Propagation method from statistical physics in the solution of resource allocation problems in wireless networks. Through numerical simulations we show that the proposed statistical-physics-based resource allocation algorithm is a promising tool for the efficient allocation of communication resources in large wireless communications networks. Given a fixed number of resources, we are able to serve a larger number of nodes, compared to state-of-the-art reference schemes, without introducing more interference into the system.

[146]  arXiv:2110.10551 [pdf]
Title: Hosting Capacity Approach Implications
Subjects: Systems and Control (eess.SY)

This paper revisits the generation hosting capacity (HC) calculation approach to account for grid operational flexibility--the ability to reconfigure the system safely. In essence, the generation hosting capacity is determined against the set of limiting factors--voltage, thermal (conductor loading), reverse flow (at the feeder head, station transformer, or substation), and change in the voltage (due to a sudden change in generation output). Not that long ago, California Investor-Owned Utilities (IOUs) added a new criterion that does not allow reverse flow at the supervisory control and data acquisition (SCADA) points that can change the system configuration, aiming to prevent the potential transfer of reverse flow to an adjacent feeder. This new criterion, known as hosting capacity with operational flexibility (OpFlex), was intended to capture operational constraints as part of hosting capacity. This paper explores the shortfalls of such an approach and proposes performing actual transfer analysis when determining hosting capacity rather than implementing the OpFlex approach. Furthermore, we discuss the need for a transition to determining a hosting capacity profile (all intervals) rather than a flat-line (one, worst-performing interval) hosting capacity. A hosting capacity profile would inform developers of interval-by-interval limits and opportunities, creating new opportunities to reach higher penetration of DERs at a lower cost. With technological and computational advancements, such an approach is neither out of implementation reach nor that computationally expensive. In return, far more DER can be interconnected once programmed not to violate certain generation profiles as part of the interconnection requirement, and utilities would be better informed of their actual operational flexibility, benefiting society overall.

[147]  arXiv:2110.10552 [pdf, other]
Title: Few-Shot Temporal Action Localization with Query Adaptive Transformer
Comments: BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural, since actions are typically captured in untrimmed videos, but also ignores background video segments containing vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting that uses untrimmed training videos. Further, a novel FS-TAL model is proposed which maximizes the knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer in the model. Extensive experiments on two action localization benchmarks demonstrate that our method can significantly outperform all the state-of-the-art alternatives in both single-domain and cross-domain scenarios. The source code can be found at https://github.com/sauradip/fewshotQAT.

[148]  arXiv:2110.10554 [pdf, other]
Title: Receding Horizon Control in Deep Structured Teams: A Provably Tractable Large-Scale Approach with Application to Swarm Robotics
Subjects: Systems and Control (eess.SY)

In this paper, a deep structured tracking problem is introduced for a large number of decision-makers. The problem is formulated as a linear quadratic deep structured team, where the decision-makers wish to track a global target cooperatively while considering their local targets. For the unconstrained setup, the gauge transformation technique is used to decompose the resultant optimization problem in order to obtain a low-dimensional optimal control strategy in terms of the local and global Riccati equations. For the constrained case, however, the feasible set is not necessarily decomposable by the gauge transformation. To overcome this hurdle, we propose a family of local and global receding horizon control problems, where a carefully constructed linear combination of their solutions provides a feasible solution for the original constrained problem. The salient property of the above solutions is that they are tractable with respect to the number of decision-makers and can be implemented in a distributed manner. In addition, the main results are generalized to cases with multiple sub-populations and multiple features, including leader-follower setup, cohesive cost function and soft structural constraint. Furthermore, a class of cyber-physical attacks is proposed in terms of perturbed influence factors. A numerical example is presented to demonstrate the efficacy of the results.

[149]  arXiv:2110.10555 [pdf, other]
Title: Why Settle for Just One? Extending EL++ Ontology Embeddings with Many-to-Many Relationships
Comments: Accepted at the SemREC challenge at ISWC 2021
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Knowledge Graph (KG) embeddings provide a low-dimensional representation of entities and relations of a Knowledge Graph and are used successfully for various applications such as question answering and search, reasoning, inference, and missing link prediction. However, most of the existing KG embeddings only consider the network structure of the graph and ignore the semantics and the characteristics of the underlying ontology that provides crucial information about relationships between entities in the KG. Recent efforts in this direction involve learning embeddings for a Description Logic (logical underpinning for ontologies) named EL++. However, such methods consider all the relations defined in the ontology to be one-to-one which severely limits their performance and applications. We provide a simple and effective solution to overcome this shortcoming that allows such methods to consider many-to-many relationships while learning embedding representations. Experiments conducted using three different EL++ ontologies show substantial performance improvement over five baselines. Our proposed solution also paves the way for learning embedding representations for even more expressive description logics such as SROIQ.

[150]  arXiv:2110.10563 [pdf, other]
Title: Robust Monocular Localization in Sparse HD Maps Leveraging Multi-Task Uncertainty Estimation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Robust localization in dense urban scenarios using a low-cost sensor setup and sparse HD maps is highly relevant for the current advances in autonomous driving, but remains a challenging topic in research. We present a novel monocular localization approach based on a sliding-window pose graph that leverages predicted uncertainties for increased precision and robustness against challenging scenarios and per-frame failures. To this end, we propose an efficient multi-task uncertainty-aware perception module, which covers semantic segmentation as well as bounding box detection, to enable the localization of vehicles in sparse maps containing only lane borders and traffic lights. Further, we design differentiable cost maps that are directly generated from the estimated uncertainties. This opens up the possibility to minimize the reprojection loss of amorphous map elements in an association-free and uncertainty-aware manner. Extensive evaluation on the Lyft 5 dataset shows that, despite the sparsity of the map, our approach enables robust and accurate 6D localization in challenging urban scenarios.

[151]  arXiv:2110.10566 [pdf]
Title: Exploring the Relationship Between "Positive Risk Balance" and "Absence of Unreasonable Risk"
Authors: Francesca Favaro
Subjects: Computers and Society (cs.CY)

International discussions on the overarching topic of how to define and quantify what a "safe enough" Automated Driving System (ADS) is are currently hinged on the question of determining the relationship between "positive risk balance" (PRB) and "absence of unreasonable risk" (AUR). In order to advance the conversation on these important safety topics at the international level, it is first important to start from a shared common understanding, grounded in clear definitions and terminology. To that end, this paper will start with an overview of the notions of PRB and AUR; it will then summarize different positions of the present debate; finally, it will conclude that two possible interpretations exist for PRB, and that failure to distinguish them can lead to misunderstanding different parties' positions. The argumentation in this paper is aimed at showing that the two interpretations for PRB can actually complement each other, but can be considered independently, and can both be subsumed within non-prescriptive guidelines toward ADS safety assurance.

[152]  arXiv:2110.10567 [pdf, other]
Title: Fingerprint recognition with embedded presentation attacks detection: are we ready?
Journal-ref: IEEE Transactions on Information Forensics and Security (2021)
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

The diffusion of fingerprint verification systems for security applications makes it urgent to investigate the embedding of software-based presentation attack detection (PAD) algorithms into such systems. Companies and institutions need to know whether such integration would make the system more "secure" and whether the technology available is ready, and, if so, under what operational working conditions. Despite significant improvements, especially by adopting deep learning approaches to fingerprint PAD, current research has said little about their effectiveness when embedded in fingerprint verification systems. We believe that the lack of works is explained by the lack of instruments to investigate the problem, that is, modeling the cause-effect relationships when two non-zero error-free systems work together. Accordingly, this paper explores the fusion of PAD into verification systems by proposing a novel investigation instrument: a performance simulator based on the probabilistic modeling of the relationships among the Receiver Operating Characteristics (ROC) of the two individual systems when PAD and verification stages are implemented sequentially. As a matter of fact, this is the most straightforward, flexible, and widespread approach. We carry out simulations on the PAD algorithms' ROCs submitted to the most recent editions of LivDet (2017-2019), the state-of-the-art NIST Bozorth3, and the top-level VeriFinger 12 matchers. Reported experiments explore significant scenarios to identify the conditions under which fingerprint matching with embedded PAD can improve, rather than degrade, the overall personal verification performance.

[153]  arXiv:2110.10568 [pdf, other]
Title: Inference Graphs for CNN Interpretation
Journal-ref: European Conference on Computer Vision (ECCV), pp. 69-84. Springer, Cham, 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural networks (CNNs) have achieved superior accuracy in many visual related tasks. However, the inference process through intermediate layers is opaque, making it difficult to interpret such networks or develop trust in their operation. We propose to model the network hidden layers activity using probabilistic models. The activity patterns in layers of interest are modeled as Gaussian mixture models, and transition probabilities between clusters in consecutive modeled layers are estimated. Based on maximum-likelihood considerations, nodes and paths relevant for network prediction are chosen, connected, and visualized as an inference graph. We show that such graphs are useful for understanding the general inference process of a class, as well as explaining decisions the network makes regarding specific images.

[154]  arXiv:2110.10570 [pdf, other]
Title: Behavioral Experiments for Understanding Catastrophic Forgetting
Subjects: Machine Learning (cs.LG)

In this paper we explore whether the fundamental tool of experimental psychology, the behavioral experiment, has the power to generate insight not only into humans and animals, but artificial systems too. We apply the techniques of experimental psychology to investigating catastrophic forgetting in neural networks. We present a series of controlled experiments with two-layer ReLU networks, and exploratory results revealing a new understanding of the behavior of catastrophic forgetting. Alongside our empirical findings, we demonstrate an alternative, behavior-first approach to investigating neural network phenomena.

[155]  arXiv:2110.10571 [pdf, other]
Title: CobotAR: Interaction with Robots using Omnidirectionally Projected Image and DNN-based Gesture Recognition
Comments: Accepted paper in SMC conference 2021, IEEE copyright
Subjects: Robotics (cs.RO)

Several technological solutions have supported the creation of interfaces for Augmented Reality (AR) multi-user collaboration in recent years. However, these technologies require the use of wearable devices. We present CobotAR - a new AR technology to achieve Human-Robot Interaction (HRI) by gesture recognition based on a Deep Neural Network (DNN) - without an extra wearable device for the user. The system allows users to have a more intuitive experience with robotic applications using just their hands. The CobotAR system assumes an AR spatial display created by a mobile projector mounted on a 6 DoF robot. The proposed technology suggests a novel way of interacting with machines to achieve safe, intuitive, and immersive control mediated by a robotic projection system and a DNN-based algorithm. We conducted an experiment assessing several parameters, which allowed users to identify the positives and negatives of the new approach. The mental demand of the CobotAR system is half that of the Wireless Gamepad and 16% lower than that of the Teach Pendant.

[156]  arXiv:2110.10572 [pdf]
Title: Random-Fuzzy Dual Interpretation of Unknown Quantity for Estimation & Recognition: with Demonstration of IMM Filter
Authors: Wei Mei
Comments: 15 pages, 11 figures, code available
Subjects: Systems and Control (eess.SY); Applications (stat.AP)

This paper considers the problems of estimation and recognition from the perspective of sigma-max inference (probability-possibility inference), with a focus on discovering whether some of the unknown quantities involved could be more faithfully modeled as fuzzy uncertainty. Two related key issues are addressed: 1) the random-fuzzy dual interpretation of the unknown quantity being estimated; 2) the principle of selecting the sigma-max operator for practical problems, such as estimation and recognition. Our perspective, conceived from definitions of randomness and fuzziness, is that a continuous unknown quantity involved in estimation with an inaccurate prior should be more appropriately modeled as randomness and handled by sigma inference, whereas a discrete unknown quantity involved in recognition with an insufficient (and inaccurate) prior could be better modeled as fuzziness and handled by max inference. The philosophy is demonstrated by an updated version of the well-known interacting multiple model (IMM) filter, for which the jump Markovian system is reformulated as a hybrid uncertainty system, with continuous state evolution modeled as usual as a model-conditioned stochastic system and discrete mode transitions modeled as a fuzzy system by a possibility (instead of probability) transition matrix, and hypothesis mixing is conducted by using the operation of "max" instead of "sigma". For our example of maneuvering target tracking using simulated data from both a short-range fire control radar and a long-range surveillance radar, the updated IMM filter shows significant improvement over the classic IMM filter, due to its peculiarity of hard decision of system model and a faster response to the transition of discrete mode.
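To make the "sigma" versus "max" distinction concrete, here is a small sketch of a one-step mode prediction with a probability transition matrix (sum of products) and with a possibility transition matrix (max of products). The product-based composition and all function names are assumptions made for illustration; the paper may use a different operator or normalization.

```python
def sigma_predict(prior, transition):
    """Probabilistic mode prediction: mu_j = sum_i transition[i][j] * prior[i]."""
    n = len(prior)
    return [sum(transition[i][j] * prior[i] for i in range(n)) for j in range(n)]


def max_predict(prior, transition):
    """Possibilistic counterpart (assumed max-product composition):
    pi_j = max_i transition[i][j] * prior[i]."""
    n = len(prior)
    return [max(transition[i][j] * prior[i] for i in range(n)) for j in range(n)]


# Probabilities sum to 1 over modes; possibilities are scaled so the maximum is 1.
prob_next = sigma_predict([0.5, 0.5], [[0.75, 0.25], [0.25, 0.75]])
poss_next = max_predict([1.0, 0.5], [[1.0, 0.5], [0.5, 1.0]])
```

The max operator keeps only the single most plausible path into each mode, which is consistent with the "hard decision" behavior the abstract attributes to the updated filter.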

[157]  arXiv:2110.10575 [pdf, other]
Title: SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining
Comments: Demo paper accepted for publication on RANLP 2021; 8 pages, 5 figures, 1 table
Journal-ref: RANLP-2021
Subjects: Computation and Language (cs.CL)

Recent research in opinion mining proposed word embedding-based topic modeling methods that provide superior coherence compared to traditional topic modeling. In this paper, we demonstrate how these methods can be used to display correlated topic models on social media texts using SocialVisTUM, our proposed interactive visualization toolkit. It displays a graph with topics as nodes and their correlations as edges. Further details are displayed interactively to support the exploration of large text collections, e.g., representative words and sentences of topics, topic and sentiment distributions, hierarchical topic clustering, and customizable, predefined topic labels. The toolkit optimizes automatically on custom data for optimal coherence. We show a working instance of the toolkit on data crawled from English social media discussions about organic food consumption. The visualization confirms findings of a qualitative consumer research study. SocialVisTUM and its training procedures are accessible online.

[158]  arXiv:2110.10577 [pdf, other]
Title: Overview of the 2021 Key Point Analysis Shared Task
Subjects: Computation and Language (cs.CL)

We describe the 2021 Key Point Analysis (KPA-2021) shared task on key point analysis that we organized as a part of the 8th Workshop on Argument Mining (ArgMining 2021) at EMNLP 2021. We outline various approaches and discuss the results of the shared task. We expect the task and the findings reported in this paper to be relevant for researchers working on text summarization and argument mining.

[159]  arXiv:2110.10580 [pdf, other]
Title: A Learning Framework for Diffeomorphic Image Registration based on Quasi-conformal Geometry
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image registration, the process of defining meaningful correspondences between images, is essential for various image analysis tasks, especially medical imaging. Numerous learning-based methods, notably convolutional neural networks (CNNs), for deformable image registration proposed in recent years have demonstrated the feasibility and superiority of deep learning techniques for registration problems. Besides, compared to traditional algorithms' optimization scheme of the objective function for each image pair, learning-based algorithms are several orders of magnitude faster. However, these data-driven methods without proper constraint on the deformation field will easily lead to topological foldings.
To tackle this problem, we propose the quasi-conformal registration network (QCRegNet), an unsupervised learning framework, to obtain diffeomorphic 2D image registrations with large deformations based on the quasi-conformal (QC) map, an orientation-preserving homeomorphism between two manifolds.
The basic idea is to design a CNN mapping image pairs to deformation fields. QCRegNet consists of the estimator network and the Beltrami solver network (BSNet). The estimator network takes an image pair as input and outputs the Beltrami coefficient (BC). The BC, which captures the conformal distortion of a QC map and guarantees bijectivity, is then input to the BSNet, a task-independent network which reconstructs the desired QC map.
Furthermore, we reduce the number of network parameters and computational complexity by utilizing Fourier approximation to compress BC. Experiments have been carried out on different data such as underwater and medical images. Registration results show that the registration accuracy is comparable to state-of-the-art methods and diffeomorphism is to a great extent guaranteed compared to other diffeomorphic registration algorithms.

[160]  arXiv:2110.10582 [pdf, other]
Title: Distributionally Robust Semi-Supervised Learning Over Graphs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications. To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently. By succinctly encoding local graph structures and features of nodes, state-of-the-art GNNs can scale linearly with the size of the graph. Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes. Specifically, whenever mismatches between the training and testing data distributions exist, these models fail in practice. Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements. In this context, a distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations. The data distribution is considered unknown, but lies within a Wasserstein ball centered around the empirical data distribution. A robust model is obtained by minimizing the worst expected loss over this ball. However, solving the emerging functional optimization problem is challenging, if not impossible. Advocating a strong duality condition, we develop a principled method that renders the problem tractable and efficiently solvable. Experiments assess the performance of the proposed method.

[161]  arXiv:2110.10583 [pdf, ps, other]
Title: Rapid computation of special values of Dirichlet $L$-functions
Authors: Fredrik Johansson (LFANT)
Subjects: Numerical Analysis (math.NA); Classical Analysis and ODEs (math.CA); Number Theory (math.NT)

We consider computing the Riemann zeta function $\zeta(s)$ and Dirichlet $L$-functions $L(s,\chi)$ to $p$-bit accuracy for large $p$. Using the approximate functional equation together with asymptotically fast computation of the incomplete gamma function, we observe that $p^{3/2+o(1)}$ bit complexity can be achieved if $s$ is an algebraic number of fixed degree and with algebraic height bounded by $O(p)$. This is an improvement over the $p^{2+o(1)}$ complexity of previously published algorithms and yields, among other things, $p^{3/2+o(1)}$ complexity algorithms for Stieltjes constants and $n^{3/2+o(1)}$ complexity algorithms for computing the $n$th Bernoulli number or the $n$th Euler number exactly.

[162]  arXiv:2110.10586 [pdf, other]
Title: Pattern Division Random Access (PDRA) for M2M Communications with Massive MIMO Systems
Subjects: Information Theory (cs.IT); Symbolic Computation (cs.SC)

In this work, we introduce the pattern-domain pilot design paradigm based on a "superposition of orthogonal-building-blocks" with significantly larger contention space to enhance the massive machine-type communications (mMTC) random access (RA) performance in massive multiple-input multiple-output (MIMO) systems. Specifically, the pattern-domain pilot is constructed based on the superposition of $L$ cyclically-shifted Zadoff-Chu (ZC) sequences. The pattern-domain pilots exhibit zero correlation values between non-colliding patterns from the same root and low correlation values between patterns from different roots. The increased contention space, i.e., from $N$ to $\binom{N}{L}$, where $\binom{N}{L}$ denotes the number of all $L$-combinations of a set of $N$ elements, and low correlation values lead to a significantly lower pilot collision probability without compromising excessively on channel estimation performance for mMTC RA in massive MIMO systems. We present the framework and analysis of the RA success probability of the pattern-domain based scheme with massive MIMO systems. Numerical results demonstrate that the proposed pattern division random access (PDRA) scheme achieves an appreciable performance gain over the conventional one, while preserving the existing physical layer virtually unchanged. The extension of the "superposition of orthogonal-building-blocks" scheme to "superposition of quasi-orthogonal-building-blocks" is straightforward.
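The growth of the contention space from N to the binomial coefficient C(N, L) can be checked numerically. The sketch below uses illustrative values for N and L that are not taken from the paper, and the function name is hypothetical.

```python
import math


def contention_space(num_sequences, num_superposed):
    """Number of distinct patterns formed by choosing `num_superposed`
    out of `num_sequences` orthogonal building blocks: C(N, L)."""
    return math.comb(num_sequences, num_superposed)


# Illustrative values only: N = 64 building blocks.
conventional = contention_space(64, 1)  # one sequence per pilot
pattern_l2 = contention_space(64, 2)    # superposition of L = 2 sequences
```

With these example values the number of candidate pilots grows from 64 to 2016, which is the kind of enlargement that lowers the chance of two devices picking the same pilot.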

[163]  arXiv:2110.10593 [pdf]
Title: Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Single-channel speech separation is required for multi-speaker speech recognition. Recent deep learning-based approaches focused on time-domain audio separation net (TasNet) because it has superior performance and lower latency compared to the conventional time-frequency-based (T-F-based) approaches. Most of these works rely on the masking-based method that estimates a linear mapping function (mask) for each speaker. However, the other commonly used method, the mapping-based method that is less sensitive to SNR variations, is inadequately studied in the time domain. We explore the potential of the mapping-based method by introducing attention augmented DPRNN (AttnAugDPRNN) which directly approximates the clean sources from the mixture for speech separation. Permutation Invariant Training (PIT) has been a paradigm to solve the label ambiguity problem for speech separation but usually leads to suboptimal performance. To solve this problem, we propose an efficient training strategy called Hierarchical Constraint Training (HCT) to regularize the training, which could effectively improve the model performance. When using PIT, our results showed that mapping-based AttnAugDPRNN outperformed masking-based AttnAugDPRNN when the training corpus is large. Mapping-based AttnAugDPRNN with HCT significantly improved the SI-SDR by 10.1% compared to the masking-based AttnAugDPRNN without HCT.
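For readers unfamiliar with Permutation Invariant Training, the following sketch shows the core idea: the loss is evaluated for every assignment of estimated sources to reference sources and the smallest value is kept. Mean squared error stands in for the separation objective here (the abstract reports SI-SDR), and the function names are illustrative.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Return the lowest average loss over all speaker permutations,
    together with the permutation that achieves it."""
    n = len(references)
    best_loss, best_perm = None, None
    for perm in permutations(range(n)):
        # perm[i] is the index of the estimate assigned to reference i.
        loss = sum(mse(estimates[perm[i]], references[i]) for i in range(n)) / n
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


references = [[1.0, 1.0], [0.0, 0.0]]
estimates = [[0.0, 0.0], [1.0, 1.0]]  # correct sources, swapped order
loss, perm = pit_loss(estimates, references)
```

The exhaustive search grows factorially with the number of speakers, so this form is practical only for the small speaker counts typical of separation benchmarks.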

[164]  arXiv:2110.10596 [pdf, other]
Title: Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
Comments: Accepted at NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We introduce the task of spatially localizing narrated interactions in videos. Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations. To achieve this goal, we propose a multilayer cross-modal attention network that enables effective optimization of a contrastive loss during training. We introduce a divided strategy that alternates between computing inter- and intra-modal attention across the visual and natural language modalities, which allows effective training via directly contrasting the two modalities' representations. We demonstrate the effectiveness of our approach by self-training on the HowTo100M instructional video dataset and evaluating on a newly collected dataset of localized described interactions in the YouCook2 dataset. We show that our approach outperforms alternative baselines, including shallow co-attention and full cross-modal attention. We also apply our approach to grounding phrases in images with weak supervision on Flickr30K and show that stacking multiple attention layers is effective and, when combined with a word-to-region loss, achieves state of the art on recall-at-one and pointing hand accuracies.

[165]  arXiv:2110.10599 [pdf, other]
Title: Video Instance Segmentation by Instance Flow Assembly
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes. While two-stage box-based methods achieve top performance in the image domain, they cannot easily extend their superiority into the video domain. This is because they usually deal with features or images cropped from the detected bounding boxes without alignment, failing to capture pixel-level temporal consistency. We embrace the observation that bottom-up methods dealing with box-free features could offer accurate spatial correlations across frames, which can be fully utilized for object- and pixel-level tracking. We first propose our bottom-up framework equipped with a temporal context fusion module to better encode inter-frame correlations. Intra-frame cues for semantic segmentation and object localization are simultaneously extracted and reconstructed by corresponding decoders after a shared backbone. For efficient and robust tracking among instances, we introduce an instance-level correspondence across adjacent frames, which is represented by a center-to-center flow, termed instance flow, to assemble messy dense temporal correspondences. Experiments demonstrate that the proposed method outperforms the state-of-the-art online methods (taking image-level input) on the challenging YouTube-VIS dataset.

[166]  arXiv:2110.10601 [pdf]
Title: Color Teams for Machine Learning Development
Comments: 8 Pages, 6 Figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Machine learning and software development share processes and methodologies for reliably delivering products to customers. This work proposes a new teaming construct for forming machine learning teams that can better combat adversarial attackers. In cybersecurity, infrastructure organizations use such teams to protect their systems, with system builders and programmers also contributing to the robustness of their platforms. Color teams give the individuals on each team clear responsibility for one part of the pipeline: baseline (Yellow), attack (Red), or defense (Blue). Combining colors leads to additional knowledge shared across the team and more robust models built during development. This paper outlines the responsibilities of the new Orange, Green, and Purple teams, along with an overview of the resources these teams need to be successful.

[167]  arXiv:2110.10602 [pdf, ps, other]
Title: Transductive Robust Learning Guarantees
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of adversarially robust learning in the transductive setting. For classes $\mathcal{H}$ of bounded VC dimension, we propose a simple transductive learner that, when presented with a set of labeled training examples and a set of unlabeled test examples (both sets possibly adversarially perturbed), correctly labels the test examples with a robust error rate that is linear in the VC dimension and is adaptive to the complexity of the perturbation set. This result provides an exponential improvement in dependence on VC dimension over the best known upper bound on the robust error in the inductive setting, at the expense of competing with a more restrictive notion of optimal robust error.

[168]  arXiv:2110.10603 [pdf, other]
Title: Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHammer Patterns, and Implications
Comments: This work is to appear at the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO 2021)
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

The RowHammer vulnerability in DRAM is a critical threat to system security. To protect against RowHammer, vendors commit to security-through-obscurity: modern DRAM chips rely on undocumented, proprietary, on-die mitigations, commonly known as Target Row Refresh (TRR). At a high level, TRR detects and refreshes potential RowHammer-victim rows, but its exact implementations are not openly disclosed. Security guarantees of TRR mechanisms cannot be easily studied due to their proprietary nature.
To assess the security guarantees of recent DRAM chips, we present Uncovering TRR (U-TRR), an experimental methodology to analyze in-DRAM TRR implementations. U-TRR is based on the new observation that data retention failures in DRAM enable a side channel that leaks information on how TRR refreshes potential victim rows. U-TRR allows us to (i) understand how logical DRAM rows are laid out physically in silicon; (ii) study undocumented on-die TRR mechanisms; and (iii) combine (i) and (ii) to evaluate the RowHammer security guarantees of modern DRAM chips. We show how U-TRR allows us to craft RowHammer access patterns that successfully circumvent the TRR mechanisms employed in 45 DRAM modules of the three major DRAM vendors. We find that the DRAM modules we analyze are vulnerable to RowHammer, having bit flips in up to 99.9% of all DRAM rows.

[169]  arXiv:2110.10606 [pdf, other]
Title: Maximal Information Propagation via Lotteries
Authors: Jing Chen, Bo Li
Subjects: Computer Science and Game Theory (cs.GT)

Propagating information to more people through their friends is becoming an increasingly important technology used in domains such as blockchain, advertising, and social media. To incentivize people to broadcast the information, the designer may use a monetary rewarding scheme, which specifies who gets how much, to compensate for the propagation. Several properties are desirable for the rewarding scheme, such as budget feasibility, individual rationality, incentive compatibility, and Sybil-proofness. In this work, we design a free market with lotteries, where every participant can decide by herself how much of the reward she wants to withhold before propagating to others. We show that in the free market, the participants have a strong incentive to maximally propagate the information and all the above properties are satisfied automatically.

[170]  arXiv:2110.10611 [pdf, ps, other]
Title: Analysis of pressure-robust embedded-hybridized discontinuous Galerkin methods for the Stokes problem under minimal regularity
Subjects: Numerical Analysis (math.NA)

We present an analysis of two lowest-order hybridizable discontinuous Galerkin methods for the Stokes problem, while making only minimal regularity assumptions on the exact solution. The methods under consideration have previously been shown to produce $H(\textrm{div})$-conforming and divergence-free approximate velocities. Using these properties, we derive a priori error estimates for the velocity that are independent of the pressure. These error estimates, which assume only $H^{1+s}$-regularity of the exact velocity fields for any $s \in [0, 1]$, are optimal in a discrete energy norm. Error estimates for the velocity and pressure in the $L^2$-norm are also derived in this minimal regularity setting. Our theoretical findings are supported by numerical computations.

[171]  arXiv:2110.10614 [pdf, other]
Title: Independent Natural Policy Gradient Always Converges in Markov Potential Games
Comments: 24 pages
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)

Multi-agent reinforcement learning has been successfully applied to fully-cooperative and fully-competitive environments, but little is currently known about mixed cooperative/competitive environments. In this paper, we focus on a particular class of multi-agent mixed cooperative/competitive stochastic games called Markov Potential Games (MPGs), which include cooperative games as a special case. Recent results have shown that independent policy gradient converges in MPGs but it was not known whether Independent Natural Policy Gradient converges in MPGs as well. We prove that Independent Natural Policy Gradient always converges in the last iterate using constant learning rates. The proof deviates from the existing approaches and the main challenge lies in the fact that Markov Potential Games do not have unique optimal values (as single-agent settings exhibit) so different initializations can lead to different limit point values. We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.

[172]  arXiv:2110.10617 [pdf, other]
Title: Colosseum: Large-Scale Wireless Experimentation Through Hardware-in-the-Loop Network Emulation
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Colosseum is an open-access and publicly-available large-scale wireless testbed for experimental research via virtualized and softwarized waveforms and protocol stacks on a fully programmable, "white-box" platform. Through 256 state-of-the-art Software-defined Radios and a Massive Channel Emulator core, Colosseum can model virtually any scenario, enabling the design, development and testing of solutions at scale in a variety of deployments and channel conditions. These Colosseum radio-frequency scenarios are reproduced through high-fidelity FPGA-based emulation with finite-impulse response filters. Filters model the taps of desired wireless channels and apply them to the signals generated by the radio nodes, faithfully mimicking the conditions of real-world wireless environments. In this paper we describe the architecture of Colosseum and its experimentation and emulation capabilities. We then demonstrate the effectiveness of Colosseum for experimental research at scale through exemplary use cases including prevailing wireless technologies (e.g., cellular and Wi-Fi) in spectrum sharing and unmanned aerial vehicle scenarios. A roadmap for Colosseum future updates concludes the paper.
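The channel emulation described in this abstract applies finite-impulse-response filters whose taps model the wireless channel. As a rough illustration of that one step, the sketch below convolves a transmitted sample sequence with a set of complex taps in plain Python. The function name `apply_fir` and the two-tap channel are hypothetical and unrelated to Colosseum's actual FPGA implementation.

```python
def apply_fir(signal, taps):
    """Convolve `signal` with FIR `taps`: y[n] = sum_k taps[k] * signal[n - k]."""
    out = [0j] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += h * x
    return out


# Hypothetical two-path channel: a direct path plus a delayed, attenuated,
# phase-rotated echo one sample later.
taps = [1.0 + 0.0j, 0.0 + 0.5j]
tx = [1.0, -1.0, 1.0]
rx = apply_fir(tx, taps)
print(rx)
```

Each tap corresponds to one propagation path, so a longer tap list models a channel with more multipath delay spread.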

[173]  arXiv:2110.10623 [pdf, other]
Title: Chaos inspired Particle Swarm Optimization with Levy Flight for Genome Sequence Assembly
Comments: 18 pages, 10 figures, 3 tables
Subjects: Neural and Evolutionary Computing (cs.NE)

With the advent of Genome Sequencing, the field of Personalized Medicine has been revolutionized. From drug testing and studying diseases and mutations to clan genomics, studying the genome is required. However, genome sequence assembly is a very complex combinatorial optimization problem of computational biology. PSO is a popular meta-heuristic swarm intelligence optimization algorithm, used to solve combinatorial optimization problems. In this paper, we propose a new variant of PSO to address this permutation-optimization problem. PSO is integrated with Chaos and Levy Flight (a random walk algorithm) to effectively balance the exploration and exploitation capability of the algorithm. Empirical experiments are conducted to evaluate the performance of the proposed method in comparison to the other variants of PSO proposed in the literature. The analysis is conducted on four DNA coverage datasets. The analysis demonstrates that the proposed model attains better performance with better reliability and consistency in comparison to other competitive methods in all cases.
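A Levy flight is a random walk whose step lengths are heavy-tailed, so most moves are small but occasional long jumps help a swarm escape local optima. The sketch below draws such steps with Mantegna's algorithm, a widely used sampling recipe. Whether the paper uses this particular recipe is not stated in the abstract, and the names `mantegna_sigma`, `levy_step`, and `perturb`, as well as the `scale` and `beta` defaults, are assumptions for illustration.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step: u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def perturb(position, scale=0.01, beta=1.5, rng=random):
    """Hypothetical helper: add an independent Levy step to each coordinate."""
    return [x + scale * levy_step(beta, rng) for x in position]


rng = random.Random(0)
print(perturb([0.2, 0.5, 0.9], rng=rng))
```

For a permutation problem such as sequence assembly, continuous positions like these would still need to be mapped to an ordering, a step this sketch leaves out.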

[174]  arXiv:2110.10628 [pdf, other]
Title: PyPSA meets Africa: Developing an open source electricity network model of the African continent
Subjects: Systems and Control (eess.SY)

Electricity network modelling and grid simulations form a key enabling element for the integration of newer and cleaner technologies such as renewable energy generation and electric vehicles into the existing grid and energy system infrastructure. This paper reviews the models of the African electricity systems and highlights the gaps in the open model landscape. Using PyPSA (an open Power System Analysis package), the paper outlines the pathway to a fully open model and data to increase transparency in African electricity system planning. Optimisation and modelling can reveal viable pathways to a sustainable energy system, aiding strategic planning for upgrades and policy-making for accelerated integration of renewable energy generation and smart grid technologies such as battery storage in Africa.

[175]  arXiv:2110.10632 [pdf, other]
Title: More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Incorporating prior knowledge in reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost, by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.

[176]  arXiv:2110.10639 [pdf, other]
Title: Semi-supervised Domain Adaptation for Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning approaches for semantic segmentation rely primarily on supervised learning and require substantial effort in producing pixel-level annotations. Further, such approaches may perform poorly when applied to unseen image domains. To cope with these limitations, both unsupervised domain adaptation (UDA) with full source supervision but without target supervision and semi-supervised learning (SSL) with partial supervision have been proposed. While such methods are effective at aligning different feature distributions, there is still a need to efficiently exploit unlabeled data to address the performance gap with respect to fully-supervised methods. In this paper we address semi-supervised domain adaptation (SSDA) for semantic segmentation, where a large amount of labeled source data as well as a small amount of labeled target data are available. We propose a novel and effective two-step semi-supervised dual-domain adaptation (SSDDA) approach to address both cross- and intra-domain gaps in semantic segmentation. The proposed framework comprises two mixing modules. First, we conduct a cross-domain adaptation via an image-level mixing strategy, which learns to align the distribution shift of features between the source data and target data. Second, intra-domain adaptation is achieved using a separate student-teacher network which is built to generate category-level data augmentation by mixing unlabeled target data in a way that respects predicted object boundaries. We demonstrate that the proposed approach outperforms state-of-the-art methods on two common synthetic-to-real semantic segmentation benchmarks. An extensive ablation study is provided to further validate the effectiveness of our approach.

[177]  arXiv:2110.10641 [pdf, ps, other]
Title: Anaphora and Ellipsis in Lambek Calculus with a Relevant Modality: Syntax and Semantics
Subjects: Logic in Computer Science (cs.LO)

Lambek calculus with a relevant modality $!\mathbf{L^*}$ of arXiv:1601.06303 syntactically resolves parasitic gaps in natural language. It resembles the Lambek calculus with anaphora $\mathbf{LA}$ of (Jäger, 1998) and the Lambek calculus with controlled contraction, $\mathbf{L}_{\Diamond}$, of arXiv:1905.01647v1 which deal with anaphora and ellipsis. What all these calculi add to Lambek calculus is a copying and moving behaviour. Distributional semantics is a subfield of Natural Language Processing that uses vector space semantics for words via co-occurrence statistics in large corpora of data. Compositional vector space semantics for Lambek Calculi are obtained via the DisCoCat models arXiv:1003.4394v1. $\mathbf{LA}$ does not have a vector space semantics and the semantics of $\mathbf{L}_{\Diamond}$ is not compositional. Previously, we developed a DisCoCat semantics for $!\mathbf{L^*}$ and focused on the parasitic gap applications. In this paper, we use the vector space instance of that general semantics and show how one can also interpret anaphora, ellipsis, and for the first time derive the sloppy vs strict vector readings of ambiguous anaphora with ellipsis cases. The base of our semantics is tensor algebras and their finite dimensional variants: the Fermionic Fock spaces of Quantum Mechanics. We implement our model and experiment with the ellipsis disambiguation task of arXiv:1905.01647.

[178]  arXiv:2110.10655 [pdf, other]
Title: Adversarial Socialbot Learning via Multi-Agent Deep Hierarchical Reinforcement Learning
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Socialbots are software-driven user accounts on social platforms, acting autonomously (mimicking human behavior), with the aim of influencing the opinions of other users or spreading targeted misinformation for particular goals. As socialbots undermine the ecosystem of social platforms, they are often considered harmful. As such, there have been several computational efforts to auto-detect socialbots. However, to the best of our knowledge, the adversarial nature of these socialbots has not yet been studied. This raises the question: "can adversaries, controlling socialbots, exploit AI techniques to their advantage?" In answer to this question, we demonstrate that it is indeed possible for adversaries to exploit computational learning mechanisms such as reinforcement learning (RL) to maximize the influence of socialbots while avoiding detection. We first formulate adversarial socialbot learning as a cooperative game between two functional hierarchical RL agents. While one agent curates a sequence of activities that can avoid detection, the other agent aims to maximize network influence by selectively connecting with the right users. Our proposed policy networks train with a vast amount of synthetic graphs and generalize better than baselines on unseen real-life graphs both in terms of maximizing network influence (up to +18%) and sustainable stealthiness (up to +40% undetectability) under a strong bot detector (with 90% detection accuracy). During inference, the complexity of our approach scales linearly, independent of a network's structure and the virality of news. This makes our approach a practical adversarial attack when deployed in a real-life setting.

[179]  arXiv:2110.10659 [pdf, other]
Title: OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides an easy-to-use programming interface while allowing library developers to enhance the performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Efficient communication is key to scaling applications on parallel systems, which is typically enabled by the Message Passing Interface (MPI) standard and compliant libraries on HPC hardware. mpi4py is a Python-based communication library that provides an MPI-like interface for Python applications, allowing application developers to utilize parallel processing elements including GPUs. However, there is currently no benchmark suite to evaluate communication performance of mpi4py -- and Python MPI codes in general -- on modern HPC systems. In order to bridge this gap, we propose OMB-Py -- Python extensions to the open-source OSU Micro-Benchmark (OMB) suite -- aimed at evaluating communication performance of MPI-based parallel applications in Python. To the best of our knowledge, OMB-Py is the first communication benchmark suite for parallel Python applications. OMB-Py consists of a variety of point-to-point and collective communication benchmark tests that are implemented for a range of popular Python libraries including NumPy, CuPy, Numba, and PyCUDA. We also provide Python implementations of several distributed ML algorithms as benchmarks to understand the potential gain in performance for ML/DL workloads. Our evaluation reveals that mpi4py introduces a small overhead when compared to native MPI libraries. We also evaluate the ML/DL workloads and report up to 106x speedup on 224 CPU cores compared to sequential execution. We plan to publicly release OMB-Py to benefit the Python HPC community.

[180]  arXiv:2110.10660 [pdf, other]
Title: Event-triggered Control for Nonlinear Systems with Center Manifolds
Comments: Submitted to IEEE Transactions on Automatic Control as a Full paper (Under review). 16 Pages, 4 Figures
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In this work, we consider the problem of event-triggered implementation of control laws designed for the local stabilization of nonlinear systems with center manifolds. We propose event-triggering conditions which are derived from a local input-to-state stability characterization of such systems. The triggering conditions ensure local ultimate boundedness of the trajectories and the existence of a uniform positive lower bound for the inter-event times. The ultimate bound can be made arbitrarily small, but only at the cost of smaller inter-event times. Under certain assumptions on the controller structure, local asymptotic stability of the origin is also guaranteed. Two sets of triggering conditions are proposed, catering to the cases where the exact center manifold is computable and where only an approximation of it is computable. The closed-loop system exhibits some desirable properties when the exact knowledge of the center manifold is employed in checking the triggering conditions. Three illustrative examples that explore different scenarios are presented and the applicability of the proposed methods is demonstrated. The third example concerns the event-triggered implementation of a position stabilizing controller for the open-loop unstable Mobile Inverted Pendulum (MIP) robot.

[181]  arXiv:2110.10661 [pdf, other]
Title: SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark
Comments: NeurIPS 2021. 14 pages, 8 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Existing work in language grounding typically studies single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments, and evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LM using SILG. Our shared architecture achieves comparable performance to environment-specific architectures. Moreover, we find that many recent modelling advances do not result in significant gains on environments other than the one they were designed for. This highlights the need for a multi-environment benchmark. Finally, the best models significantly underperform humans on SILG, which suggests ample room for future work. We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.

[182]  arXiv:2110.10666 [pdf, other]
Title: Efficient Consensus-Free Weight Reassignment for Atomic Storage
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Weighted voting is a conventional approach to improving the performance of replicated systems based on commonly-used majority quorum systems in heterogeneous environments. In long-lived systems, a weight reassignment protocol is required to reassign weights over time in order to accommodate performance variations accordingly. The weight reassignment protocol should be consensus-free in asynchronous failure-prone systems because of the impossibility of solving consensus in such systems. This paper presents an efficient consensus-free weight reassignment protocol for atomic storage systems in heterogeneous, dynamic, and asynchronous message-passing systems. An experimental evaluation shows that the proposed protocol improves the performance of atomic read/write storage implemented by majority quorum systems compared with previous solutions.
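To make the weighted-voting idea concrete, the sketch below checks whether a set of replicas forms a weighted majority quorum, that is, whether their combined weight exceeds half of the total. It illustrates only the quorum rule, not the reassignment protocol proposed in the paper; the replica names, the weights, and the function name `is_quorum` are hypothetical.

```python
def is_quorum(replicas, weights):
    """True if the replicas' combined weight exceeds half of the total weight."""
    total = sum(weights.values())
    return sum(weights[r] for r in set(replicas)) > total / 2


# Hypothetical weights: faster replicas are given larger weights.
weights = {"a": 2.0, "b": 1.5, "c": 1.0, "d": 0.5}

print(is_quorum({"a", "b"}, weights))       # two fast replicas suffice
print(is_quorum({"b", "c", "d"}, weights))  # three slower replicas also suffice
print(is_quorum({"c", "d"}, weights))       # too little weight
```

Because every quorum holds more than half of the total weight, any two quorums must share at least one replica, which is the intersection property that read/write storage relies on. With equal weights the rule reduces to an ordinary majority of three out of four replicas.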

[183]  arXiv:2110.10668 [pdf, other]
Title: Evaluating the Evaluation Metrics for Style Transfer: A Case Study in Multilingual Formality Transfer
Comments: EMNLP 2021
Subjects: Computation and Language (cs.CL)

While the field of style transfer (ST) has been growing rapidly, it has been hampered by a lack of standardized practices for automatic evaluation. In this paper, we evaluate leading ST automatic metrics on the oft-researched task of formality style transfer. Unlike previous evaluations, which focus solely on English, we expand our focus to Brazilian-Portuguese, French, and Italian, making this work the first multilingual evaluation of metrics in ST. We outline best practices for automatic evaluation in (formality) style transfer and identify several models that correlate well with human judgments and are robust across languages. We hope that this work will help accelerate development in ST, where human evaluation is often challenging to collect.

[184]  arXiv:2110.10674 [pdf, other]
Title: SEA: Graph Shell Attention in Graph Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

A common issue in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes' representations of the input graph align with each other and become indiscernible. Recently, it has been shown that increasing a model's complexity by integrating an attention mechanism yields more expressive architectures. This is largely attributed to steering the nodes' representations only towards nodes that are more informative than others. Transformer models in combination with GNNs result in architectures including Graph Transformer Layers (GTL), where layers are entirely based on the attention operation. However, the calculation of a node's representation is still restricted to the computational workflow of a GNN. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes' representations are routed to dedicated experts. Each expert calculates the representations according to its respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node's representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results compared to state-of-the-art models.

[185]  arXiv:2110.10678 [pdf, other]
Title: Resilient Time-Varying Formation Tracking for Mobile Robot Networks under Deception Attacks on Positioning
Comments: 12 pages, 13 figures
Subjects: Robotics (cs.RO)

This paper investigates the resilient control, analysis, recovery, and operation of mobile robot networks in time-varying formation tracking under deception attacks on global positioning. Local and global tracking control algorithms are presented to ensure redundancy of the mobile robot network and to retain the desired functionality for better resilience. Lyapunov stability analysis is utilized to show the boundedness of the formation tracking error and the stability of the network under various attack modes. A performance index is designed to compare the efficiency of the proposed formation tracking algorithms in situations with or without positioning attacks. Subsequently, a communication-free decentralized cooperative localization approach based on extended information filters is presented for positioning estimate recovery where the identification of the positioning attacks is based on Kullback-Leibler divergence. A gain-tuning resilient operation is proposed to strategically synthesize the formation control and cooperative localization for accurate and rapid system recovery from positioning attacks. The proposed methods are tested using both numerical simulation and experimental validation with a team of quadrotors.

[186]  arXiv:2110.10679 [pdf]
Title: Using a market basket analysis in tourism studies
Comments: 36 pages, 8 figures
Journal-ref: Vavpotič D, Knavs K, Knežević Cvelbar L, Using a market basket analysis in tourism studies, Tourism Economics (First Published July 30, 2020), © 2020 (SAGE Publications)
Subjects: Social and Information Networks (cs.SI)

Understanding tourist visitation patterns is crucial for decision makers in order to create a smart tourism industry. A growing body of tourism research uses geo-location data in order to better understand tourism demand. In this paper, we present a new approach based on a market basket analysis. This approach uses geo-location data shared by tourists on tourism platforms in order to bundle the range of available tourism services and understand which experiences are consumed together. The approach was tested on the case of Vienna, Austria. Based on our analyses we argue that the proposed approach has potential for use at the destination level and provides relevant information on tourism demand patterns important for smart tourism decision-making.
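Market basket analysis scores how often items appear together using support, confidence, and lift. The sketch below computes these three measures for pairs of attractions from a list of per-tourist visit sets. It is a minimal illustration of the general technique, not the paper's pipeline; the attraction names, the `min_support` threshold, and the function name `pair_rules` are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of baskets with both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y | x)
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return rules


# Hypothetical visit data: one list of attractions per tourist.
visits = [
    ["palace", "museum", "cafe"],
    ["palace", "museum"],
    ["palace", "zoo"],
    ["museum", "cafe"],
]
for rule in pair_rules(visits):
    print(rule)
```

A lift above 1 suggests two attractions are visited together more often than their individual popularity would predict, which is the signal one would use to bundle services.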

[187]  arXiv:2110.10704 [pdf, other]
Title: A Self-Explainable Stylish Image Captioning Framework via Multi-References
Comments: arXiv admin note: substantial text overlap with arXiv:2103.11186
Subjects: Computation and Language (cs.CL)

In this paper, we propose to build a stylish image captioning model through a Multi-style Multi modality mechanism (2M). We demonstrate that with 2M, we can build an effective stylish captioner and that multi-references produced by the model can also support explaining the model through identifying erroneous input features on faulty examples. We show how this 2M mechanism can be used to build stylish captioning models and show how these models can be utilized to provide explanations of likely errors in the models.

[188]  arXiv:2110.10713 [pdf, other]
Title: PPFS: Predictive Permutation Feature Selection
Comments: 7 pages. For the implementation of this work, see https://github.com/atif-hassan/PyImpetus
Subjects: Machine Learning (cs.LG)

We propose Predictive Permutation Feature Selection (PPFS), a novel wrapper-based feature selection method based on the concept of Markov Blanket (MB). Unlike previous MB methods, PPFS is a universal feature selection technique as it can work for both classification and regression tasks on datasets containing categorical and/or continuous features. We propose Predictive Permutation Independence (PPI), a new Conditional Independence (CI) test, which enables PPFS to be categorised as a wrapper feature selection method. This is in contrast to current filter-based MB feature selection techniques that are unable to harness the advancements in supervised algorithms such as Gradient Boosting Machines (GBM). The PPI test is based on the knockoff framework and utilizes supervised algorithms to measure the association between an individual or a set of features and the target variable. We also propose a novel MB aggregation step that addresses the issue of sample inefficiency. Empirical evaluations and comparisons on a large number of datasets demonstrate that PPFS outperforms state-of-the-art Markov blanket discovery algorithms as well as well-known wrapper methods. We also provide a sketch of the proof of correctness of our method. Implementation of this work is available at https://github.com/atif-hassan/PyImpetus

[189]  arXiv:2110.10714 [pdf, other]
Title: Auction Design through Multi-Agent Learning in Peer-to-Peer Energy Trading
Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

Distributed energy resources (DERs), such as rooftop solar panels, are growing rapidly and are reshaping power systems. To promote DERs, feed-in-tariff (FIT) is usually adopted by utilities to pay DER owners certain fixed rates for supplying energy to the grid. An alternative to FIT is a market-based approach; that is, consumers and DER owners trade energy in an auction-based peer-to-peer (P2P) market, and the rates are determined based on supply and demand. However, the auction complexity and market participants' bounded rationality may invalidate many well-established theories on auction design and hinder market development. To address the challenges, we propose an automated bidding framework based on multi-agent, multi-armed bandit learning for repeated auctions, which aims to minimize each bidder's cumulative regret. Numerical results indicate convergence of such a multi-agent learning game to a steady-state. Being particularly interested in auction designs, we have applied the framework to four different implementations of repeated double-side auctions to compare their market outcomes. While it is difficult to pick a clear winner, $k$-double auction (a variant of uniform pricing auction) and McAfee auction (a variant of Vickrey double-auction) appear to perform well in general, with their respective strengths and weaknesses.

[190]  arXiv:2110.10718 [pdf, ps, other]
Title: Bootstrapping confidence in future safety based on past safe operation
Comments: 15 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

With autonomous vehicles (AVs), a major concern is the inability to give meaningful quantitative assurance of safety, to the extent required by society - e.g. that an AV must be at least as safe as a good human driver - before that AV is in extensive use. We demonstrate an approach to achieving more moderate, but useful, confidence, e.g., confidence of low enough probability of causing accidents in the early phases of operation. This formalises mathematically the common approach of operating a system on a limited basis in the hope that mishap-free operation will confirm one's confidence in its safety and allow progressively more extensive operation: a process of "bootstrapping" of confidence. Translating that intuitive approach into theorems shows: (1) that it is substantially sound in the right circumstances, and could be a good method for deciding about the early deployment phase for an AV; (2) how much confidence can be rightly derived from such a "cautious deployment" approach, so that we can avoid over-optimism; (3) under which conditions our sound formulas for future confidence are applicable; (4) thus, which analyses of the concrete situations, and/or constraints on practice, are needed in order to enjoy the advantages of provably correct confidence in adequate future safety.

[191]  arXiv:2110.10720 [pdf, ps, other]
Title: Privacy in Open Search: A Review of Challenges and Solutions
Comments: Paper accepted at OSSYM 2021 - Third International Open Search Symposium
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Privacy is of worldwide concern regarding activities and processes that include sensitive data. For this reason, many countries and territories have been recently approving regulations controlling the extent to which organizations may exploit data provided by people. Artificial intelligence areas, such as machine learning and natural language processing, have already successfully employed privacy-preserving mechanisms in order to safeguard data privacy in a vast number of applications. Information retrieval (IR) is likewise prone to privacy threats, such as attacks and unintended disclosures of documents and search history, which may cripple the security of users and be penalized by data protection laws. This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data. Our contribution is threefold: firstly, we present an overview of privacy threats to IR tasks; secondly, we discuss applicable privacy-preserving mechanisms which may be employed in solutions to restrain privacy hazards; finally, we bring insights on the tradeoffs between privacy preservation and utility performance for IR tasks.

[192]  arXiv:2110.10725 [pdf, ps, other]
Title: An Invariance Principle for the Multi-slice, with Applications
Subjects: Computational Complexity (cs.CC); Combinatorics (math.CO)

Given an alphabet size $m\in\mathbb{N}$ thought of as a constant, and $\vec{k} = (k_1,\ldots,k_m)$ whose entries sum up to $n$, the $\vec{k}$-multi-slice is the set of vectors $x\in [m]^n$ in which each symbol $i\in [m]$ appears precisely $k_i$ times. We show an invariance principle for low-degree functions over the multi-slice, to functions over the product space $([m]^n,\mu^n)$ in which $\mu(i) = k_i/n$. This answers a question raised by Filmus et al.
As applications of the invariance principle, we show:
1. An analogue of the "dictatorship test implies computational hardness" paradigm for problems with perfect completeness, for a certain class of dictatorship tests. Our computational hardness is proved assuming a recent strengthening of the Unique-Games Conjecture, called the Rich $2$-to-$1$ Games Conjecture. Using this analogue, we show that assuming the Rich $2$-to-$1$ Games Conjecture, (a) there is an $r$-ary CSP $\mathcal{P}_r$ for which it is NP-hard to distinguish satisfiable instances of the CSP and instances that are at most $\frac{2r+1}{2^r} + o(1)$ satisfiable, and (b) hardness of distinguishing $3$-colorable graphs, and graphs that do not contain an independent set of size $o(1)$.
2. A reduction of the problem of studying expectations of products of functions on the multi-slice to studying expectations of products of functions on correlated, product spaces. In particular, we are able to deduce analogues of the Gaussian bounds from \cite{MosselGaussian} for the multi-slice.
3. In a companion paper, we show further applications of our invariance principle in extremal combinatorics, and more specifically to proving removal lemmas of a wide family of hypergraphs $H$ called $\zeta$-forests, which is a natural extension of the well-studied case of matchings.
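
To make the definition of the $\vec{k}$-multi-slice concrete, the following is a small brute-force sketch that enumerates it for tiny parameters and computes the associated product measure $\mu(i) = k_i/n$. The function names `multi_slice` and `product_measure` are hypothetical and chosen only for illustration; nothing here reflects the paper's proofs or any code by its authors.

```python
from itertools import product

def multi_slice(k):
    """Enumerate the k-multi-slice by brute force.

    k = (k_1, ..., k_m); returns all x in [m]^n, with n = sum(k),
    in which symbol i appears exactly k_i times (symbols are 1..m).
    """
    m, n = len(k), sum(k)
    return [
        x
        for x in product(range(1, m + 1), repeat=n)
        if all(x.count(i + 1) == k[i] for i in range(m))
    ]

def product_measure(k):
    """Marginal distribution mu(i) = k_i / n of the matching product space."""
    n = sum(k)
    return [ki / n for ki in k]

# Tiny example: m = 2, k = (2, 1), so n = 3.
vectors = multi_slice((2, 1))
mu = product_measure((2, 1))
```

For $\vec{k} = (2, 1)$ the slice consists of the three arrangements of two 1s and one 2, which matches the multinomial coefficient $3!/(2!\,1!) = 3$.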

[193]  arXiv:2110.10729 [pdf, other]
Title: Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees
Comments: 25 pages, 7 Figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Systems and Control (eess.SY)

Requirements-driven search-based testing (also known as falsification) has proven to be a practical and effective method for discovering erroneous behaviors in Cyber-Physical Systems. Despite the constant improvements in the performance and applicability of falsification methods, they all share a common characteristic. Namely, they are best-effort methods which do not provide any guarantees on the absence of erroneous behaviors (falsifiers) when the testing budget is exhausted. The absence of finite-time guarantees is a major limitation which prevents falsification methods from being utilized in certification procedures. In this paper, we address the finite-time guarantees problem by developing a new stochastic algorithm. Our proposed algorithm not only estimates (bounds) the probability that falsifying behaviors exist, but also identifies the regions where these falsifying behaviors may occur. We demonstrate the applicability of our approach on standard benchmark functions from the optimization literature and on the F16 benchmark problem.

[194]  arXiv:2110.10734 [pdf, other]
Title: Self-Supervision and Spatial-Sequential Attention Based Loss for Multi-Person Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Bottom-up multi-person pose estimation approaches use heatmaps with auxiliary predictions to estimate joint positions and which person each joint belongs to at the same time. Recently, various combinations of auxiliary predictions and heatmaps have been proposed for higher performance; these predictions are supervised directly by the corresponding L2 loss function. However, the lack of more explicit supervision results in low feature utilization and contradictions between predictions within one model. To solve these problems, this paper proposes (i) a new loss organization method which uses self-supervised heatmaps to reduce prediction contradictions and spatial-sequential attention to enhance the networks' feature extraction; (ii) a new combination of predictions composed of heatmaps, Part Affinity Fields (PAFs) and our block-inside offsets to fix pixel-level joint positions, which further demonstrates the effectiveness of the proposed loss function. Experiments are conducted on the MS COCO keypoint dataset with OpenPose adopted as the baseline model. Our method outperforms the baseline overall. On the COCO verification dataset, the mAP of OpenPose trained with our proposals outperforms the OpenPose baseline by over 5.5%.

[195]  arXiv:2110.10735 [pdf, other]
Title: Dynamic Bottleneck for Robust Self-Supervised Exploration
Comments: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards. However, such methods are usually sensitive to environmental dynamics-irrelevant information, e.g., white-noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) for the linear case, and the visiting count for the tabular case. We evaluate the proposed method on Atari suites with dynamics-irrelevant noises. Our experiments show that exploration with DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.

[196]  arXiv:2110.10739 [pdf, other]
Title: Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models in the sense that it does not require ground-truth isolated reference sources. In this paper, we investigate using MixIT to adapt a separation model on real far-field overlapping reverberant and noisy speech data from the AMI Corpus. The models are tested on real AMI recordings containing overlapping speech, and are evaluated subjectively by human listeners. To objectively evaluate our models, we also devise a synthetic AMI test set. For human evaluations on real recordings, we also propose a modification of the standard MUSHRA protocol to handle imperfect reference signals, which we call MUSHIRA. Holding network architectures constant, we find that a fine-tuned semi-supervised model yields the largest SI-SNR improvement, PESQ scores, and human listening ratings across synthetic and real datasets, outperforming unadapted generalist models trained on orders of magnitude more data. Our results show that unsupervised learning through MixIT enables model adaptation on real-world unlabeled spontaneous speech recordings.

[197]  arXiv:2110.10741 [pdf, other]
Title: Class Incremental Online Streaming Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

A wide variety of methods have been developed to enable lifelong learning in conventional deep neural networks. However, to succeed, these methods require a 'batch' of samples to be available and visited multiple times during training. While this works well in a static setting, these methods continue to suffer in a more realistic situation where data arrives in an online streaming manner. We empirically demonstrate that the performance of current approaches degrades if the input is obtained as a stream of data with the following restrictions: (i) each instance comes one at a time and can be seen only once, and (ii) the input data violates the i.i.d. assumption, i.e., there can be a class-based correlation. We propose a novel approach (CIOSL) for class-incremental learning in an online streaming setting to address these challenges. The proposed approach leverages implicit and explicit dual weight regularization and experience replay. The implicit regularization is leveraged via knowledge distillation, while the explicit regularization incorporates a novel approach for parameter regularization by learning the joint distribution of the buffer replay and the current sample. Also, we propose an efficient online memory replay and replacement buffer strategy that significantly boosts the model's performance. Extensive experiments and ablation on challenging datasets show the efficacy of the proposed method.

[198]  arXiv:2110.10746 [pdf, other]
Title: Better than Average: Paired Evaluation of NLP Systems
Comments: Published in ACL 2021 (long paper)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances. In this work, we question the use of averages for aggregating evaluation scores into a final number used to decide which system is best, since the average, as well as alternatives such as the median, ignores the pairing arising from the fact that systems are evaluated on the same test instances. We illustrate the importance of taking the instance-level pairing of evaluation scores into account and demonstrate, both theoretically and empirically, the advantages of aggregation methods based on pairwise comparisons, such as the Bradley-Terry (BT) model, a mechanism based on the estimated probability that a given system scores better than another on the test set. By re-evaluating 296 real NLP evaluation setups across four tasks and 18 evaluation metrics, we show that the choice of aggregation mechanism matters and yields different conclusions as to which systems are state of the art in about 30% of the setups. To facilitate the adoption of pairwise evaluation, we release a practical tool for performing the full analysis of evaluation scores with the mean, median, BT, and two variants of BT (Elo and TrueSkill), alongside functionality for appropriate statistical testing.
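
As a rough illustration of why instance-level pairing can change the ranking, the sketch below fits a Bradley-Terry model to pairwise win counts derived from per-instance scores and compares the result with the plain mean. This is not the tool released by the authors: the function names, the half-win treatment of ties, the fixed iteration count, and the toy scores are all assumptions made for this example. The fit uses the standard minorization-maximization update for Bradley-Terry strengths.

```python
def pairwise_wins(scores):
    """Count, for each ordered pair (a, b), on how many instances a beats b.

    scores: dict mapping system name -> list of per-instance scores,
    all lists aligned on the same test instances. Ties count as half
    a win for each side (an assumption of this sketch).
    """
    names = list(scores)
    n = len(next(iter(scores.values())))
    wins = {a: {b: 0.0 for b in names if b != a} for a in names}
    for t in range(n):
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if scores[a][t] > scores[b][t]:
                    wins[a][b] += 1.0
                elif scores[a][t] < scores[b][t]:
                    wins[b][a] += 1.0
                else:
                    wins[a][b] += 0.5
                    wins[b][a] += 0.5
    return wins

def bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry strengths with the classical MM update.

    Returns a dict of strengths normalized to sum to 1; under the model,
    P(a beats b) = p[a] / (p[a] + p[b]).
    """
    names = list(wins)
    p = {a: 1.0 / len(names) for a in names}
    for _ in range(iters):
        new = {}
        for a in names:
            total = sum(wins[a].values())
            denom = sum(
                (wins[a][b] + wins[b][a]) / (p[a] + p[b]) for b in wins[a]
            )
            new[a] = total / denom if denom > 0 else p[a]
        s = sum(new.values())
        p = {a: v / s for a, v in new.items()}
    return p

# Toy data: A is slightly better on four instances, B wins one by a lot.
scores = {
    "A": [0.51, 0.51, 0.51, 0.51, 0.10],
    "B": [0.50, 0.50, 0.50, 0.50, 0.90],
}
means = {name: sum(v) / len(v) for name, v in scores.items()}
strengths = bradley_terry(pairwise_wins(scores))
```

On this toy data the mean favours B (one large win dominates), while the Bradley-Terry strengths favour A, which wins on four of the five paired instances.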

[199]  arXiv:2110.10749 [pdf, other]
Title: A domain decomposition solution of the Stokes-Darcy system in 3D based on boundary integrals
Authors: Svetlana Tlupova
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)

A framework is developed for a robust and highly accurate numerical solution of the coupled Stokes-Darcy system in three dimensions. The domain decomposition method is based on a Dirichlet-Neumann type splitting of the interface conditions and solving separate Stokes and Darcy problems iteratively. Second kind boundary integral equations are formulated for each problem. The integral equations use a smoothing of the kernels that achieves high accuracy on the boundary, and a straightforward quadrature to discretize the integrals. Numerical results demonstrate the convergence, accuracy, and dependence on parameter values of the iterative solution for a problem of viscous flow around a porous sphere with a known analytical solution, as well as more general surfaces.

[200]  arXiv:2110.10757 [pdf, other]
Title: TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced along the spatial dimension for spatial context aggregation. TPARN is designed as a multiple-input and multiple-output architecture to enhance all input channels simultaneously. Experimental results demonstrate the superiority of TPARN over existing state-of-the-art approaches.

[201]  arXiv:2110.10759 [pdf, other]
Title: Balanced Allocations: Caching and Packing, Twinning and Thinning
Comments: 76 pages, 7 figures, 2 tables
Journal-ref: SODA 2022
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Probability (math.PR)

We consider the sequential allocation of $m$ balls (jobs) into $n$ bins (servers) by allowing each ball to choose from some bins sampled uniformly at random. The goal is to maintain a small gap between the maximum load and the average load. In this paper, we present a general framework that allows us to analyze various allocation processes that slightly prefer allocating into underloaded, as opposed to overloaded bins. Our analysis covers several natural instances of processes, including:
The Caching process (a.k.a. memory protocol) as studied by Mitzenmacher, Prabhakar and Shah (2002): At each round we only take one bin sample, but we also have access to a cache in which the most recently used bin is stored. We place the ball into the least loaded of the two.
The Packing process: At each round we only take one bin sample. If the load is below some threshold (e.g., the average load), then we place as many balls as needed to reach the threshold; otherwise, we place only one ball.
The Twinning process: At each round, we only take one bin sample. If the load is below some threshold, then we place two balls; otherwise, we place only one ball.
The Thinning process as recently studied by Feldheim and Gurel-Gurevich (2021): At each round, we first take one bin sample. If its load is below some threshold, we place one ball; otherwise, we place one ball into a second bin sample.
As we demonstrate, our general framework implies for all these processes a gap of $\mathcal{O}(\log n)$ between the maximum load and average load, even when an arbitrary number of balls $m \geq n$ are allocated (heavily loaded case). Our analysis is inspired by a previous work of Peres, Talwar and Wieder (2010) for the $(1+\beta)$-process, however here we rely on the interplay between different potential functions to prove stabilization.
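
The Twinning and Thinning processes described above are simple enough to simulate directly. The sketch below is only an illustration under stated assumptions: the threshold is taken to be the current average load (the abstract says only "some threshold"), and the function names and parameters are hypothetical. It does not reproduce the paper's potential-function analysis, it just lets one observe the gap empirically.

```python
import random

def gap(loads):
    """Gap between the maximum load and the average load."""
    return max(loads) - sum(loads) / len(loads)

def twinning(n, rounds, rng):
    """Twinning: one bin sample per round; place two balls if the sampled
    bin is below the threshold (assumed here: current average load),
    otherwise place one ball."""
    loads = [0] * n
    balls = 0
    for _ in range(rounds):
        i = rng.randrange(n)
        k = 2 if loads[i] < balls / n else 1
        loads[i] += k
        balls += k
    return loads

def thinning(n, m, rng):
    """Thinning: sample one bin; if its load is below the threshold
    (assumed here: current average load) place the ball there,
    otherwise place it into a second, fresh bin sample."""
    loads = [0] * n
    for t in range(m):
        i = rng.randrange(n)
        if loads[i] >= t / n:
            i = rng.randrange(n)
        loads[i] += 1
    return loads

def one_choice(n, m, rng):
    """Baseline for comparison: each ball goes to a uniformly random bin."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads

rng = random.Random(0)
n, m = 100, 10_000  # heavily loaded case: m much larger than n
gaps = {
    "one_choice": gap(one_choice(n, m, rng)),
    "twinning": gap(twinning(n, m, rng)),
    "thinning": gap(thinning(n, m, rng)),
}
```

Printing `gaps` for increasing `m` gives a quick empirical feel for how the gap of each process behaves in the heavily loaded case; the numbers depend on the seed and on the assumed threshold.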

[202]  arXiv:2110.10762 [pdf, other]
Title: Asynchronous parareal time discretization for partial differential equations
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)

Asynchronous iterations are increasingly investigated for both scaling and fault-resilience purposes on high performance computing platforms. While they have so far been applied exclusively within space domain decomposition frameworks, this paper advocates a novel application direction targeting time-decomposed time-parallel approaches. Specifically, an asynchronous iterative model is derived from the Parareal scheme, for which convergence and speedup analyses are then conducted. It turns out that Parareal and async-Parareal feature very close, asymptotically equivalent convergence conditions, including the finite-time termination property. Based on a computational cost model aware of unsteady communication delays, our speedup analysis shows the potential performance gain from asynchronous iterations, which is confirmed by an experimental case of heat evolution on a homogeneous supercomputer. This preliminary work clearly suggests possible further benefits from asynchronous iterations.
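
For readers unfamiliar with the underlying scheme, here is a minimal sketch of the classical (synchronous) Parareal iteration, $U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)$, on the scalar test equation $y' = \lambda y$. It does not implement the asynchronous variant studied in the paper. The choice of explicit Euler for both propagators, the number of fine substeps, and all function names are assumptions made for illustration, and the fine solves are run sequentially here although they are the part that would run in parallel.

```python
def euler(y, t0, t1, lam, steps):
    """Explicit Euler for y' = lam * y from t0 to t1 with `steps` substeps."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        y = y + h * lam * y
    return y

def coarse(y, t0, t1, lam):
    """Cheap coarse propagator G: a single Euler step per time slice."""
    return euler(y, t0, t1, lam, 1)

def fine(y, t0, t1, lam):
    """Expensive fine propagator F: 100 Euler substeps per time slice."""
    return euler(y, t0, t1, lam, 100)

def serial_fine(y0, T, N, lam):
    """Reference: apply F slice after slice, with no parallelism."""
    ts = [T * n / N for n in range(N + 1)]
    U = [y0]
    for n in range(N):
        U.append(fine(U[n], ts[n], ts[n + 1], lam))
    return U

def parareal(y0, T, N, lam, iters):
    """Classical Parareal over N time slices on [0, T]."""
    ts = [T * n / N for n in range(N + 1)]
    # Initial guess from a sequential coarse sweep.
    U = [y0]
    for n in range(N):
        U.append(coarse(U[n], ts[n], ts[n + 1], lam))
    for _ in range(iters):
        # These per-slice solves are independent of each other.
        F_old = [fine(U[n], ts[n], ts[n + 1], lam) for n in range(N)]
        G_old = [coarse(U[n], ts[n], ts[n + 1], lam) for n in range(N)]
        # Sequential coarse correction sweep.
        new = [y0]
        for n in range(N):
            g_new = coarse(new[n], ts[n], ts[n + 1], lam)
            new.append(F_old[n] + (g_new - G_old[n]))
        U = new
    return U

reference = serial_fine(1.0, 1.0, 5, -1.0)
approx = parareal(1.0, 1.0, 5, -1.0, iters=5)
```

With as many iterations as time slices, the Parareal result coincides with the serial fine solution, which is the finite-time termination property mentioned in the abstract; the practical interest lies in converging in far fewer iterations.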

[203]  arXiv:2110.10765 [pdf, other]
Title: Accelerating quantum many-body configuration interaction with directives
Comments: 22 pages, 7 figures, 11 code listings, WACCPD@SC21
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Engineering, Finance, and Science (cs.CE); Mathematical Software (cs.MS); Performance (cs.PF); Nuclear Theory (nucl-th)

Many-Fermion Dynamics-nuclear, or MFDn, is a configuration interaction (CI) code for nuclear structure calculations. It is a platform-independent Fortran 90 code using a hybrid MPI+X programming model. For CPU platforms the application has a robust and optimized OpenMP implementation for shared memory parallelism. As part of the NESAP application readiness program for NERSC's latest Perlmutter system, MFDn has been updated to take advantage of accelerators. The current mainline GPU port is based on OpenACC. In this work we describe some of the key challenges of creating an efficient GPU implementation. Additionally, we compare the support of OpenMP and OpenACC on AMD and NVIDIA GPUs.

[204]  arXiv:2110.10769 [pdf, other]
Title: RegGuard: Leveraging CPU Registers for Mitigation of Control- and Data-Oriented Attacks
Comments: 15 pages with 8 figures
Subjects: Cryptography and Security (cs.CR)

CPU registers are small discrete storage units, used to hold temporary data and instructions within the CPU. Registers are not addressable in the same way memory is, which makes them immune to memory attacks and manipulation by other means. In this paper, we take advantage of this to provide a protection mechanism for critical program data: both active local variables and control objects on the stack. This protection effectively eliminates the threat of control- and data-oriented attacks, even by adversaries with full knowledge of the active stack. Our solution, RegGuard, is a compiler register allocation strategy that utilises the available CPU registers to hold critical variables during execution. Unlike conventional allocation schemes, RegGuard prioritises the security significance of a program variable over its expected performance gain. Our scheme can deal effectively with registers saved to the stack, i.e., when the compiler needs to free up registers to make room for the variables of a new function call. With RegGuard, critical data objects anywhere on the entire stack are effectively protected from corruption, even by adversaries with arbitrary read and write access. While our primary design focus is on security, performance is very important for a scheme to be adopted in practice. RegGuard still benefits from the performance gain normally associated with register allocation, and the overhead is within a few percent of other unsecured register allocation schemes in most cases. We present detailed experiments that showcase the performance of RegGuard using different benchmark programs and the C library on the ARM64 platform.

[205]  arXiv:2110.10772 [pdf]
Title: Closed-loop Feedback Registration for Consecutive Images of Moving Flexible Targets
Authors: Rui Ma, Xian Du
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Advances in imaging techniques enable consecutive image sequences to be acquired for quality monitoring of manufacturing production lines. Registration of these image sequences is essential for in-line pattern inspection and metrology, e.g., in the printing process of flexible electronics. However, conventional image registration algorithms cannot produce accurate results when the images contain many similar and deformable patterns in the manufacturing process. Such a failure originates from the fact that the conventional algorithms only use the spatial and pixel intensity information for registration. Considering the temporal continuity and consecutive nature of the product images, in this paper, we propose a closed-loop feedback registration algorithm for matching and stitching the deformable printed patterns on a moving flexible substrate. The algorithm leverages the temporal and spatial relationships of the consecutive images and the continuity of the image sequence for fast, accurate, and robust point matching. Our experimental results show that our algorithm can find more matching point pairs with a lower root mean squared error (RMSE) compared to other state-of-the-art algorithms while offering significant improvements to running time.

[206]  arXiv:2110.10774 [pdf, other]
Title: SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation
Comments: this paper was accepted by EMNLP2021-findings
Subjects: Computation and Language (cs.CL)

Generating texts in scientific papers requires not only capturing the content contained within the given input but also frequently acquiring external information called context. We push forward scientific text generation by proposing a new task, namely context-aware text generation in the scientific domain, aiming at exploiting the contributions of context in generated texts. To this end, we present a novel, challenging, large-scale Scientific Paper Dataset for ConteXt-Aware Text Generation (SciXGen), consisting of 205,304 well-annotated papers with full references to widely-used objects (e.g., tables, figures, algorithms) in a paper. We comprehensively benchmark, using state-of-the-art methods, the efficacy of our newly constructed SciXGen dataset in generating descriptions and paragraphs. Our dataset and benchmarks will be made publicly available to facilitate scientific text generation research.

[207]  arXiv:2110.10775 [pdf, other]
Title: Reduced Basis Approximations of Parameterized Dynamical Partial Differential Equations via Neural Networks
Comments: 21 pages, 10 figures
Subjects: Numerical Analysis (math.NA)

Projection-based reduced order models are effective at approximating parameter-dependent differential equations that are parametrically separable. When parametric separability is not satisfied, which occurs in both linear and nonlinear problems, projection-based methods fail to adequately reduce the computational complexity. Devising alternative reduced order models is crucial for obtaining efficient and accurate approximations to expensive high-fidelity models. In this work, we develop a time-stepping procedure for dynamical parameter-dependent problems, in which a neural-network is trained to propagate the coefficients of a reduced basis expansion. This results in an online stage with a computational cost independent of the size of the underlying problem. We demonstrate our method on several parabolic partial differential equations, including a problem that is not parametrically separable.

[208]  arXiv:2110.10777 [pdf, other]
Title: Learning controllers for performance through LMI regions
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)

In an open-loop experiment, an input sequence is applied to an unknown linear time-invariant system (in continuous or discrete time) affected also by an unknown-but-bounded disturbance sequence (with an energy or instantaneous bound); the corresponding state sequence is measured. The goal is to design directly from the input and state sequences a controller that enforces a certain performance specification on the transient behaviour of the unknown system. The performance specification is expressed through a subset of the complex plane where closed-loop eigenvalues need to belong, a so called LMI region. For this control design problem, we provide here convex programs to enforce the performance specification from data in the form of linear matrix inequalities (LMI). For generic LMI regions, these are sufficient conditions to assign the eigenvalues within the LMI region for all possible dynamics consistent with data, and become necessary and sufficient conditions for special LMI regions. In this way, we extend classical model-based conditions from a seminal work in the literature to the setting of data-driven control from noisy data. Through two numerical examples, we investigate how these data-based conditions compare with each other.

[209]  arXiv:2110.10778 [pdf, other]
Title: Contrastive Document Representation Learning with Graph Attention Networks
Comments: Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL)

Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representations of text. However, due to the quadratic self-attention complexity, most pretrained Transformer models can only handle relatively short text. Modeling very long documents remains a challenge. In this work, we propose to use a graph attention network on top of an available pretrained Transformer model to learn document embeddings. This graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large unlabeled corpus. Empirically, we demonstrate the effectiveness of our approaches in document classification and document retrieval tasks.

[210]  arXiv:2110.10780 [pdf]
Title: An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

While we pay attention to the latest advances in clinical natural language processing (NLP), we can notice some resistance in the clinical and translational research community to adopting NLP models due to limited transparency, interpretability and usability. Building upon our previous work, in this study we propose an open natural language processing development framework and evaluate it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the interest in information extraction from COVID-19 related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects. The generated corpora, derived from the texts of multiple institutions and gold standard annotation, were tested on a single institution's rule set, yielding F1 scores of 0.876, 0.706 and 0.694, respectively. The study, as a consortium effort of the N3C NLP subgroup, demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP studies.

[211]  arXiv:2110.10784 [pdf, other]
Title: Style Agnostic 3D Reconstruction via Adversarial Style Transfer
Comments: To be published at WACV 2022, Code @ this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Reconstructing the 3D geometry of an object from an image is a major challenge in computer vision. Recently introduced differentiable renderers can be leveraged to learn the 3D geometry of objects from 2D images, but those approaches require additional supervision to enable the renderer to produce an output that can be compared to the input image. This can be scene information or constraints such as object silhouettes, uniform backgrounds, material, texture, and lighting. In this paper, we propose an approach that enables differentiable rendering-based learning of 3D objects from images with backgrounds without the need for silhouette supervision. Instead of trying to render an image close to the input, we propose an adversarial style-transfer and domain adaptation pipeline that translates the input image domain to the rendered image domain. This allows us to directly compare a translated image with the differentiable rendering of a 3D object reconstruction in order to train the 3D object reconstruction network. We show that the approach learns 3D geometry from images with backgrounds and provides better performance than constrained methods for single-view 3D object reconstruction on this task.

[212]  arXiv:2110.10790 [pdf, other]
Title: Human-Centered Explainable AI (XAI): From Algorithms to User Experiences
Comments: draft for a book chapter
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

As a technical sub-field of artificial intelligence (AI), explainable AI (XAI) has produced a vast collection of algorithms in recent years. However, explainability is an inherently human-centric property and the field is starting to embrace inter-disciplinary perspectives and human-centered approaches. As researchers and practitioners begin to leverage XAI algorithms to build XAI applications, explainability has moved beyond a demand by data scientists or researchers to comprehend the models they are developing, to become an essential requirement for people to trust and adopt AI deployed in numerous domains. Human-computer interaction (HCI) research and user experience (UX) design in this area are therefore increasingly important. In this chapter, we begin with a high-level overview of the technical landscape of XAI algorithms, then selectively survey recent HCI work that takes human-centered approaches to design, evaluate, and provide conceptual and methodological tools for XAI. We ask the question "what are human-centered approaches doing for XAI" and highlight three roles that they should play in shaping XAI technologies: to drive technical choices by understanding users' explainability needs, to uncover pitfalls of existing XAI methods through empirical studies and inform new methods, and to provide conceptual frameworks for human-compatible XAI.

[213]  arXiv:2110.10791 [pdf, other]
Title: Modeling Human-Human Collaboration: A Connection Between Inter-Personal Motor Synergy and Consensus Algorithms
Subjects: Systems and Control (eess.SY)

Many day-to-day activities involve people working collaboratively toward reaching a desired outcome. Previous research in motor control and neuroscience has proposed inter-personal motor synergy (IPMS) as a mechanism of collaboration between people, referring to the idea of how two or more people may work together "as if they were one" to coordinate their motion. In the motor control literature, the uncontrolled manifold (UCM) is used for quantifying IPMS. According to this approach, coordinated motion is achieved through stabilization of a performance variable (e.g., an output in a collaborative output tracking task). We show that the UCM approach is closely related to the well-studied consensus approach in multi-agent systems that concerns processes by which a set of interacting agents agree on a shared objective. To explore the connection between these two approaches, in this paper, we provide a control-theoretic model that represents the systems-level behaviors in a collaborative task. In particular, we utilize the consensus protocol and show how the model can be systematically tuned to reproduce the behavior exhibited by human-human collaboration experiments. We discuss the association between the proposed control law and the UCM approach and validate our model using experimental results previously collected from an inter-personal finger force production task.
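
As a rough illustration of the consensus idea mentioned above, the following sketch runs the textbook discrete-time average-consensus update, in which each agent moves toward its neighbors' states. It is not the control law proposed in the paper, which the abstract does not spell out; the two-agent graph, the step size `eps`, and the initial values are hypothetical.

```python
def consensus_step(x, neighbors, eps):
    """One synchronous update: x_i <- x_i + eps * sum_j (x_j - x_i)."""
    return [
        xi + eps * sum(x[j] - xi for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


def run_consensus(x0, neighbors, eps=0.25, steps=200):
    """Iterate the consensus update for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(x, neighbors, eps)
    return x


# Hypothetical example: two agents that each observe the other.
neighbors = {0: [1], 1: [0]}
final = run_consensus([2.0, 6.0], neighbors)
```

On an undirected graph this update preserves the sum of the states, so both agents settle on the average of their initial values; `eps` has to stay small relative to the number of neighbors for the iteration to remain stable.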

[214]  arXiv:2110.10797 [pdf, other]
Title: Scheduling of Graph Queries: Controlling Intra- and Inter-query Parallelism for a High System Throughput
Authors: Matthias Hauck (1 and 2), Ismail Oukid (3), Holger Fröning (1) ((1) Heidelberg University, (2) SAP SE, (3) Snowflake Inc)
Subjects: Databases (cs.DB)

The vast amounts of data used in social, business or traffic networks, biology and other natural sciences are often managed in graph-based data sets, consisting of a few thousand up to billions and trillions of vertices and edges, respectively. Typical applications utilizing such data either execute one or a few complex queries or many small queries at the same time interactively or as batch jobs. Furthermore, graph processing is inherently complex, as data sets can substantially differ (scale free vs. constant degree), and algorithms exhibit diverse behavior (computational intensity, local or global, push- or pull-based).
This work is concerned with multi-query execution by automatically controlling the degree of parallelization, with overall objectives including high system utilization, low synchronization cost, and highly efficient concurrent execution. The underlying concept is three-fold: (1) sampling is used to determine graph statistics, (2) parallelization constraints are derived from algorithm and system properties, and (3) suitable work packages are generated based on the previous two aspects. We evaluate the proposed concept using different algorithms on synthetic and real world data sets, with up to 16 concurrent sessions (queries). The results demonstrate a robust performance in spite of these various configurations, and in particular that the performance is always close to or even slightly ahead of the performance of manually optimized implementations. Furthermore, the similar performance to manually optimized implementations under extreme configurations, which require either a full parallelization (few large queries) or complete sequential execution (many small queries), shows that the proposed concept exhibits a particularly low overhead.

[215]  arXiv:2110.10802 [pdf, other]
Title: A Data-Centric Optimization Framework for Machine Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute. However, as frameworks specialize optimization to patterns in popular networks, they implicitly constrain novel and diverse models that drive progress in research. We empower deep learning researchers by defining a flexible and user-customizable pipeline for optimizing training of arbitrary deep neural networks, based on data movement minimization. The pipeline begins with standard networks in PyTorch or ONNX and transforms computation through progressive lowering. We define four levels of general-purpose transformations, from local intra-operator optimizations to global data movement reduction. These operate on a data-centric graph intermediate representation that expresses computation and data movement at all levels of abstraction, including expanding basic operators such as convolutions to their underlying computations. Central to the design is the interactive and introspectable nature of the pipeline. Every part is extensible through a Python API, and can be tuned interactively using a GUI. We demonstrate competitive performance or speedups on ten different networks, with interactive optimizations discovering new opportunities in EfficientNet.

[216]  arXiv:2110.10803 [pdf, other]
Title: Propensity-scored Probabilistic Label Trees
Comments: The extended version of SIGIR '21 Short Research Paper
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)

Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels. Recently, XMLC has been widely applied to diverse web applications such as automatic content labeling, online advertising, or recommendation systems. In such environments, label distribution is often highly imbalanced, consisting mostly of very rare tail labels, and relevant labels can be missing. As a remedy to these problems, the propensity model has been introduced and applied within several XMLC algorithms. In this work, we focus on the problem of optimal predictions under this model for probabilistic label trees, a popular approach for XMLC problems. We introduce an inference procedure, based on the $A^*$-search algorithm, that efficiently finds the optimal solution, assuming that all probabilities and propensities are known. We demonstrate the attractiveness of this approach in a wide empirical study on popular XMLC benchmark datasets.

[217]  arXiv:2110.10805 [pdf]
Title: DVIO: Depth aided visual inertial odometry for RGBD sensors
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

In the past few years we have observed an increase in the usage of RGBD sensors in mobile devices. These sensors provide a good estimate of the depth map for the camera frame, which can be used in numerous augmented reality applications. This paper presents a new visual inertial odometry (VIO) system, which uses measurements from an RGBD sensor and an inertial measurement unit (IMU) sensor for estimating the motion state of the mobile device. The resulting system is called the depth-aided VIO (DVIO) system. In this system we add the depth measurement as part of the nonlinear optimization process. Specifically, we propose methods to use the depth measurement using one-dimensional (1D) feature parameterization as well as three-dimensional (3D) feature parameterization. In addition, we propose to utilize the depth measurement for estimating the time offset between the unsynchronized IMU and the RGBD sensors. Last but not least, we propose a novel block-based marginalization approach to speed up the marginalization processes and maintain the real-time performance of the overall system. Experimental results validate that the proposed DVIO system outperforms the other state-of-the-art VIO systems in terms of trajectory accuracy as well as processing time.

[218]  arXiv:2110.10807 [pdf, other]
Title: Text-Based Person Search with Limited Data
Comments: 20 pages, 7 figures, 6 tables, to appear in BMVC2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query. Solving such a fine-grained cross-modal retrieval task is challenging, which is further hampered by the lack of large-scale datasets. In this paper, we present a framework with two novel components to handle the problems brought by limited data. Firstly, to fully utilize the existing small-scale benchmarking datasets for more discriminative feature learning, we introduce a cross-modal momentum contrastive learning framework to enrich the training data for a given mini-batch. Secondly, we propose to transfer knowledge learned from existing coarse-grained large-scale datasets containing image-text pairs from drastically different problem domains to compensate for the lack of TBPS training data. A transfer learning method is designed so that useful information can be transferred despite the large domain gap. Armed with these components, our method achieves new state of the art on the CUHK-PEDES dataset with significant improvements over the prior art in terms of Rank-1 and mAP. Our code is available at https://github.com/BrandonHanx/TextReID.

[219]  arXiv:2110.10809 [pdf, other]
Title: Hierarchical Skills for Efficient Exploration
Comments: To appear in 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design. In previous work on continuous control, the sensitivity of methods to this trade-off has not been addressed explicitly, as locomotion provides a suitable prior for navigation tasks, which have been of foremost interest. In this work, we analyze this trade-off for low-level policy pre-training with a new benchmark suite of diverse, sparse-reward tasks for bipedal robots. We alleviate the need for prior knowledge by proposing a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner. For utilization on downstream tasks, we present a three-layered hierarchical learning algorithm to automatically trade off between general and specific skills as required by the respective task. In our experiments, we show that our approach performs this trade-off effectively and achieves better results than current state-of-the-art methods for end-to-end hierarchical reinforcement learning and unsupervised skill discovery. Code and videos are available at https://facebookresearch.github.io/hsd3.

[220]  arXiv:2110.10811 [pdf, ps, other]
Title: HALP: Hardware-Aware Latency Pruning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget. For filter importance ranking, HALP leverages a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop. Both metrics can be evaluated very efficiently during pruning, allowing us to reformulate global structural pruning as a reward maximization problem given a target constraint. This makes the problem solvable via our augmented knapsack solver, enabling HALP to surpass prior work in pruning efficacy and accuracy-efficiency trade-off. We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets. In particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network throughput by $1.60\times$/$1.90\times$ with $+0.3\%$/$-0.2\%$ top-1 accuracy changes, respectively. For SSD pruning on VOC, HALP improves throughput by $1.94\times$ with only a $0.56$ mAP drop. HALP consistently outperforms prior art, sometimes by large margins.
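
To make the resource-allocation framing above concrete, here is a minimal sketch of a plain 0/1 knapsack solved by dynamic programming: keep the items with the highest total importance whose latency cost fits a budget. This is the standard knapsack problem only, not HALP's augmented solver, whose details the abstract does not give; the importance scores, integer latency costs, and budget are hypothetical.

```python
def knapsack(importance, latency, budget):
    """0/1 knapsack by dynamic programming over integer latency costs.

    Returns (best total importance, indices of kept items in increasing order).
    """
    # best[b] = (value, kept indices) achievable with at most b latency units
    best = [(0.0, ())] * (budget + 1)
    for i in range(len(importance)):
        new = list(best)  # read from the previous row so each item is used once
        for b in range(latency[i], budget + 1):
            cand = best[b - latency[i]][0] + importance[i]
            if cand > new[b][0]:
                new[b] = (cand, best[b - latency[i]][1] + (i,))
        best = new
    value, kept = best[budget]
    return value, list(kept)


# Hypothetical filter groups: importance scores and latency costs (arbitrary units).
importance = [0.9, 0.6, 0.5, 0.3]
latency = [4, 2, 2, 1]
value, kept = knapsack(importance, latency, budget=5)
```

In this toy instance the single most important group is dropped because three cheaper groups together carry more importance within the same latency budget.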

[221]  arXiv:2110.10815 [pdf, other]
Title: Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks
Comments: 10 pages (Main) + 19 pages (Appendix), 6 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks. We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics. Additionally, we study incremental learning phenomena for shallow linear networks. Interestingly, certain specific initializations imply that negligible components are learned before the principal ones, thus potentially negatively affecting the effectiveness of such a learning algorithm; a phenomenon we classify as implicit anti-regularization. We also provide initialization schemes where the components of the problem are approximately learned by decreasing order of importance, thus providing a form of implicit regularization.

[222]  arXiv:2110.10819 [pdf, other]
Title: Shaking the foundations: delusions in sequence models for interaction and control
Comments: DeepMind Tech Report, 16 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive, however, is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of their actions", leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively.

[223]  arXiv:2110.10824 [pdf, other]
Title: Dynamic Bipartite Matching Market with Arrivals and Departures
Comments: An extended abstract is to appear in ACM WINE 2021
Subjects: Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)

In this paper, we study a matching market model on a bipartite network where agents on each side arrive and depart stochastically by a Poisson process. For such a dynamic model, we design a mechanism that decides not only which agents to match, but also when to match them, to minimize the expected number of unmatched agents. The main contribution of this paper is to achieve theoretical bounds on the performance of local mechanisms with different timing properties. We show that an algorithm that waits to thicken the market, called the $\textit{Patient}$ algorithm, is exponentially better than the $\textit{Greedy}$ algorithm, i.e., an algorithm that matches agents greedily. This means that waiting has substantial benefits on maximizing a matching over a bipartite network. We remark that the Patient algorithm requires the planner to identify agents who are about to leave the market, and, under the requirement, the Patient algorithm is shown to be an optimal algorithm. We also show that, without the requirement, the Greedy algorithm is almost optimal. In addition, we consider the $\textit{1-sided algorithms}$ where only an agent on one side can attempt to match. This models a practical matching market such as a freight exchange market and a labor market where only agents on one side can make a decision. For this setting, we prove that the Greedy and Patient algorithms admit the same performance, that is, waiting to thicken the market is not valuable. This conclusion is in contrast to the case where agents on both sides can make a decision and the non-bipartite case by [Akbarpour et al.,$~\textit{Journal of Political Economy}$, 2020].

[224]  arXiv:2110.10827 [pdf, ps, other]
Title: How to pose material design problems for flow through porous media applications?: Sensitivity of dissipation rate to medium's permeability holds the key
Subjects: Numerical Analysis (math.NA)

Recent studies have advocated using the total dissipation rate under topology optimization to realize material designs involving the flow of fluids through porous media. However, these studies decided how to pose the design problem, such as maximizing the total dissipation rate for some situations while minimizing for others, by solving one-dimensional problems and justifying their choices using numerical experiments. The rigor is lacking -- a bottleneck for further scientific advancements to computational material design. This paper provides the missing theoretical justification. We identify four classes of boundary value problems using the adjoint state method and analytically calculate the sensitivity of the total dissipation rate to the permeability field. For two of those classes in which the flow of fluids is pressure-driven, the sensitivity is positive -- the total dissipation rate increases if the medium's permeability increases. While for the other two classes, in which the flow is velocity-driven, the trend is the opposite. These sensitivities provide rigorous answers to the central question: how to pose a material design problem for flow through porous media applications. The impact of our work is multi-fold. First, this study further elevates the role of the dissipation rate in posing well-posed material design problems using topology optimization. Second, besides the theoretical significance, the results benefit computational scientists and practitioners to realize optimal designs. Third, given their simplicity yet far-reaching impact, both the approach and results possess immense pedagogical value.

[225]  arXiv:2110.10828 [pdf, other]
Title: AdamD: Improved bias-correction in Adam
Authors: John St John
Comments: 6 pages, 1 figure
Subjects: Machine Learning (cs.LG)

Here I present a small update to the bias correction term in the Adam optimizer that has the advantage of behaving well in the first several steps. The default implementation of Adam may be as sensitive as it is to hyperparameters partially due to the originally proposed bias correction procedure, and its behavior in early steps of training.
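
For context, the sketch below shows the bias correction of the original Adam optimizer for a single scalar parameter. It is the standard Adam update, not the AdamD modification, which the abstract does not describe; the function name `adam_step` and the example values are illustrative.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment running average
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment running average
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero-initialized moments, with a hypothetical gradient of 0.5.
param, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
```

With zero-initialized moments, the corrected estimates at the first step reduce to the gradient and its square, so the first update has magnitude close to `lr` regardless of the gradient's scale.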

[226]  arXiv:2110.10829 [pdf, other]
Title: ReachBot: A Small Robot for Large Mobile Manipulation Tasks
Comments: 12 pages, 13 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Robots are widely deployed in space environments because of their versatility and robustness. However, adverse gravity conditions and challenging terrain geometry expose the limitations of traditional robot designs, which are often forced to sacrifice one of mobility or manipulation capabilities to attain the other. Prospective climbing operations in these environments reveal a need for small, compact robots capable of versatile mobility and manipulation. We propose a novel robotic concept called ReachBot that fills this need by combining two existing technologies: extendable booms and mobile manipulation. ReachBot leverages the reach and tensile strength of extendable booms to achieve an outsized reachable workspace and wrench capability. Through their lightweight, compactable structure, these booms also reduce mass and complexity compared to traditional rigid-link articulated-arm designs. Using these advantages, ReachBot excels in mobile manipulation missions in low gravity or that require climbing, particularly when anchor points are sparse. After introducing the ReachBot concept, we discuss modeling approaches and strategies for increasing stability and robustness. We then develop a 2D analytical model for ReachBot's dynamics inspired by grasp models for dexterous manipulators. Next, we introduce a waypoint-tracking controller for a planar ReachBot in microgravity. Our simulation results demonstrate the controller's robustness to disturbances and modeling error. Finally, we briefly discuss next steps that build on these initially promising results to realize the full potential of ReachBot.

[227]  arXiv:2110.10832 [pdf, other]
Title: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In Domain Generalization (DG) settings, models trained on a given set of training domains have notoriously chaotic performance on distribution shifted test domains, and stochasticity in optimization (e.g. seed) plays a big role. This makes deep learning models unreliable in real world settings. We first show that a simple protocol for averaging model parameters along the optimization path, starting early during training, both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between the in-domain validation accuracy and out-domain test accuracy, which is crucial for reliable model selection. Next, we show that an ensemble of independently trained models also has a chaotic behavior in the DG setting. Taking advantage of our observation, we show that instead of ensembling unaveraged models, ensembling moving average models (EoA) from different runs does increase stability and further boosts performance. On the DomainBed benchmark, when using a ResNet-50 pre-trained on ImageNet, this ensemble of averages achieves $88.6\%$ on PACS, $79.1\%$ on VLCS, $72.5\%$ on OfficeHome, $52.3\%$ on TerraIncognita, and $47.4\%$ on DomainNet, an average of $68.0\%$, beating ERM (w/o model averaging) by $\sim 4\%$. We also evaluate a model that is pre-trained on a larger dataset, where we show EoA achieves an average accuracy of $72.7\%$, beating its corresponding ERM baseline by $5\%$.
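
The two ingredients described above, averaging parameters along one training run and then ensembling the averaged models from several runs, can be sketched as follows. This is a toy illustration under stated assumptions: a uniform running mean over saved checkpoints, a hypothetical linear model, and output averaging for the ensemble; the paper's exact averaging protocol and architecture are not reproduced here.

```python
def running_average(checkpoints):
    """Uniform running mean of parameter vectors saved along one training run."""
    avg = [0.0] * len(checkpoints[0])
    for k, params in enumerate(checkpoints, start=1):
        avg = [a + (p - a) / k for a, p in zip(avg, params)]
    return avg


def predict(params, x):
    """Hypothetical linear model: score = w . x."""
    return sum(w * xi for w, xi in zip(params, x))


def ensemble_predict(models, x):
    """Average the predictions of several (already averaged) models."""
    return sum(predict(p, x) for p in models) / len(models)


# Hypothetical checkpoints from two independent runs.
run_a = [[1.0, 0.0], [3.0, 2.0]]
run_b = [[0.0, 1.0], [2.0, 3.0]]
averaged = [running_average(run_a), running_average(run_b)]
score = ensemble_predict(averaged, [1.0, 1.0])
```

Averaging happens in parameter space within a run and in prediction space across runs, which keeps the two steps independent of each other.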

[228]  arXiv:2110.10833 [pdf]
Title: High-resolution rainfall-runoff modeling using graph neural network
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Time-series modeling has shown great promise in recent studies using the latest deep learning algorithms such as LSTM (Long Short-Term Memory). These studies primarily focused on watershed-scale rainfall-runoff modeling or streamflow forecasting, but the majority of them only considered a single watershed as a unit. Although this simplification is very effective, it does not take into account spatial information, which could result in significant errors in large watersheds. Several studies investigated the use of GNN (Graph Neural Networks) for data integration by decomposing a large watershed into multiple sub-watersheds, but each sub-watershed is still treated as a whole, and the geoinformation contained within the watershed is not fully utilized. In this paper, we propose the GNRRM (Graph Neural Rainfall-Runoff Model), a novel deep learning model that makes full use of spatial information from high-resolution precipitation data, including flow direction and geographic information. When compared to baseline models, GNRRM has less over-fitting and significantly improves model performance. Our findings support the importance of hydrological data in deep learning-based rainfall-runoff modeling, and we encourage researchers to include more domain knowledge in their models.

[229]  arXiv:2110.10834 [pdf, other]
Title: Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization
Comments: EMNLP 2021 (16 pages)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

While much research has been done in text-to-image synthesis, little work has been done to explore the usage of linguistic structure of the input text. Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story). Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance. In this paper, we first explore the use of constituency parse trees using a Transformer-based recurrent architecture for encoding structured input. Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual story. Third, we also incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images within a dual learning setup. We show that off-the-shelf dense-captioning models trained on Visual Genome can improve the spatial structure of images from a different target domain without needing fine-tuning. We train the model end-to-end using intra-story contrastive loss (between words and image sub-regions) and show significant improvements in several metrics (and human evaluation) for multiple datasets. Finally, we provide an analysis of the linguistic and visuo-spatial information. Code and data: https://github.com/adymaharana/VLCStoryGan.

[230]  arXiv:2110.10836 [pdf]
Title: Application of E-Commerce Technologies in accelerating the Success of SME Operation
Comments: Submitted to ICICT Conf 2022
Subjects: Computers and Society (cs.CY)

The use of electronic commerce (e-Commerce) technologies has increased over the past two decades in different business sectors. In particular, the technologies of B2C operations have significantly improved the productivity of online small businesses such as SMEs. Systematic literature reviews in this domain have categorized different benefits, but only a limited number of studies examine SME success from an information systems (IS) research perspective, a gap that needs further attention. Informed by a comprehensive analysis, this study introduces a conceptual framework for the application of e-Commerce technologies in accelerating SME operation. A content analysis methodology was adopted for generating the outcome associated with the success of the technologies in SMEs.

[231]  arXiv:2110.10837 [pdf, other]
Title: A Domain Gap Aware Generative Adversarial Network for Multi-domain Image Translation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning the inverse mapping introduces extra trainable parameters, and the inverse mapping cannot be learned for some domains. As a result, these approaches are ineffective in scenarios where (i) multiple visual image domains are involved; (ii) both structure and texture transformations are required; and (iii) semantic consistency must be preserved. To solve these challenges, the paper proposes a unified model to translate images across multiple domains with significant domain gaps. Unlike previous models that constrain the generators with the ubiquitous cycle-consistency constraint to achieve content similarity, the proposed model employs a perceptual self-regularization constraint. With a single unified generator, the model can maintain consistency over the global shapes as well as the local texture information across multiple domains. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and superior performance over state-of-the-art models. It is more effective in representing shape deformation in challenging mappings with significant dataset variation across multiple domains.

[232]  arXiv:2110.10842 [pdf, other]
Title: SMOF: Squeezing More Out of Filters Yields Hardware-Friendly CNN Pruning
Comments: 11 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

For many years, the family of convolutional neural networks (CNNs) has been a workhorse in deep learning. Recently, many novel CNN structures have been designed to address increasingly challenging tasks. To make them work efficiently on edge devices, researchers have proposed various structured network pruning strategies to reduce their memory and computational cost. However, most of them only focus on reducing the number of filter channels per layer without considering the redundancy within individual filter channels. In this work, we explore pruning from another dimension, the kernel size. We develop a CNN pruning framework called SMOF, which Squeezes More Out of Filters by reducing both kernel size and the number of filter channels. Notably, SMOF is friendly to standard hardware devices without any customized low-level implementations, and the pruning effort by kernel size reduction does not suffer from the fixed-size width constraint in SIMD units of general-purpose processors. The pruned networks can be deployed effortlessly with significant running time reduction. We also support these claims via extensive experiments on various CNN structures and general-purpose processors for mobile devices.

[233]  arXiv:2110.10849 [pdf, other]
Title: Using NASA Satellite Data Sources and Geometric Deep Learning to Uncover Hidden Patterns in COVID-19 Clinical Severity
Comments: Main Paper and Appendix
Subjects: Machine Learning (cs.LG); Applications (stat.AP)

As multiple adverse events in 2021 illustrated, virtually all aspects of our societal functioning -- from water and food security to energy supply to healthcare -- more than ever depend on the dynamics of environmental factors. Nevertheless, the social dimensions of weather and climate are noticeably less explored by the machine learning community, largely due to the lack of reliable and easily accessible data. Here we present a unique, not yet broadly available NASA satellite dataset on aerosol optical depth (AOD), temperature and relative humidity and discuss the utility of these new data for COVID-19 biosurveillance. In particular, using geometric deep learning models for semi-supervised classification on a county-level basis over the contiguous United States, we investigate the pressing societal question of whether atmospheric variables have considerable impact on COVID-19 clinical severity.

[234]  arXiv:2110.10850 [pdf, other]
Title: Locality-Sensitive Experience Replay for Online Recommendation
Subjects: Information Retrieval (cs.IR)

Online recommendation requires handling rapidly changing user preferences. Deep reinforcement learning (DRL) is gaining interest as an effective means of capturing users' dynamic interest during interactions with recommender systems. However, it is challenging to train a DRL agent, due to large state space (e.g., user-item rating matrix and user profiles), action space (e.g., candidate items), and sparse rewards. Existing studies encourage the agent to learn from past experience via experience replay (ER). They adapt poorly to the complex environment of online recommender systems and are inefficient in determining an optimal strategy from past experience. To address these issues, we design a novel state-aware experience replay model, which uses locality-sensitive hashing to map high-dimensional data into low-dimensional representations and a prioritized reward-driven strategy to replay more valuable experience with higher probability. Our model can selectively pick the most relevant and salient experiences and recommend the agent with the optimal policy. Experiments on three online simulation platforms demonstrate our model's feasibility and superiority to several existing experience replay methods.
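The two ingredients named in this abstract, locality-sensitive hashing of states and reward-driven prioritized replay, can be illustrated with a minimal sketch. It assumes random-hyperplane (sign) hashing and sampling proportional to reward; the abstract does not say which hash family or priority rule the authors use, and every function name below is hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vector, hyperplanes):
    """Map a high-dimensional state to a short bit signature.

    Each bit records on which side of one hyperplane the vector lies, so
    vectors separated by a small angle tend to share most bits.
    """
    bits = []
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, plane))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return sum(x != y for x, y in zip(sig_a, sig_b))

def sample_by_reward(experiences, k, rng, eps=1e-3):
    """Sample k experiences with probability proportional to reward + eps.

    Assumes non-negative rewards; each experience is a (signature, reward)
    pair. This is only one possible reward-driven priority rule.
    """
    weights = [reward + eps for _, reward in experiences]
    return rng.choices(experiences, weights=weights, k=k)
```

A replay buffer built on this sketch would key stored transitions by signature, so that experiences from states similar to the current one can be looked up by exact match or small Hamming distance before reward-weighted sampling.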

[235]  arXiv:2110.10854 [pdf, ps, other]
Title: Performance Analysis for Covert Communications Under Faster-than-Nyquist Signaling
Comments: 5 pages, 4 figures. The paper has been submitted to IEEE Communications Letters on 20-Oct-2021
Subjects: Information Theory (cs.IT)

In this letter, we analyze the performance of covert communications under faster-than-Nyquist (FTN) signaling in an additive white Gaussian noise channel. Both Neyman-Pearson criterion- and Kullback-Leibler (KL) divergence-based covertness constraints are considered. In particular, for the KL divergence-based constraint, we prove that both the maximum transmit power and the covert rate under FTN signaling are higher than those under Nyquist signaling. Numerical results agree with our analysis and validate the advantages of FTN signaling for covert data transmission.

[236]  arXiv:2110.10857 [pdf, other]
Title: Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics Research
Journal-ref: MICRO'21: 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '21), October 18--22, 2021, Virtual Event, Greece
Subjects: Hardware Architecture (cs.AR)

The importance of open-source hardware and software has been increasing. However, despite GPUs being one of the more popular accelerators across various applications, there is very little open-source GPU infrastructure in the public domain. We argue that one of the reasons for the lack of open-source infrastructure for GPUs is rooted in the complexity of their ISA and software stacks. In this work, we first propose an ISA extension to RISC-V that supports GPGPUs and graphics. The main goal of the ISA extension proposal is to minimize the ISA changes so that the corresponding changes to the open-source ecosystem are also minimal, which makes for a sustainable development ecosystem. To demonstrate the feasibility of the minimally extended RISC-V ISA, we implemented the complete software and hardware stacks of Vortex on FPGA. Vortex is a PCIe-based soft GPU that supports OpenCL and OpenGL. Vortex can be used in a variety of applications, including machine learning, graph analytics, and graphics rendering. Vortex can scale up to 32 cores on an Altera Stratix 10 FPGA, delivering a peak performance of 25.6 GFlops at 200 MHz.
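As a quick sanity check on the closing figures, the quoted peak can be converted into per-core throughput. The sketch below assumes the 25.6 GFlops peak refers to the 32-core configuration running at 200 MHz, which the abstract implies but does not state outright; the function name is illustrative.

```python
# Figures quoted in the abstract.
PEAK_FLOPS = 25.6e9   # 25.6 GFlops
CORES = 32            # assumed: the peak is for the 32-core configuration
CLOCK_HZ = 200e6      # 200 MHz

def flops_per_core_per_cycle(peak_flops, cores, clock_hz):
    """Floating-point operations each core must complete per clock cycle."""
    return peak_flops / (cores * clock_hz)

print(flops_per_core_per_cycle(PEAK_FLOPS, CORES, CLOCK_HZ))
```

Under that assumption the figures work out to 4 floating-point operations per core per cycle.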

[237]  arXiv:2110.10858 [pdf, other]
Title: Utilizing Redundancy in Cost Functions for Resilience in Distributed Optimization and Learning
Comments: 66 pages, 1 figure, and 1 table. Supersedes our previous report arXiv:2106.03998 on asynchronous distributed optimization and contains most of its results
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

This paper considers the problem of resilient distributed optimization and stochastic machine learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has a local cost function. The agents collaborate with the server to find a minimum of their aggregate cost functions. We consider the case when some of the agents may be asynchronous and/or Byzantine faulty. In this case, the classical algorithm of distributed gradient descent (DGD) is rendered ineffective. Our goal is to design techniques improving the efficacy of DGD with asynchrony and Byzantine failures. To do so, we start by proposing a way to model the agents' cost functions by the generic notion of $(f, \,r; \epsilon)$-redundancy where $f$ and $r$ are the parameters of Byzantine failures and asynchrony, respectively, and $\epsilon$ characterizes the closeness between agents' cost functions. This allows us to quantify the level of redundancy present amongst the agents' cost functions, for any given distributed optimization problem. We demonstrate, both theoretically and empirically, the merits of our proposed redundancy model in improving the robustness of DGD against asynchronous and Byzantine agents, and their extensions to distributed stochastic gradient descent (D-SGD) for robust distributed machine learning with asynchronous and Byzantine agents.

[238]  arXiv:2110.10863 [pdf, other]
Title: Deep Generative Models in Engineering Design: A Review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Automated design synthesis has the potential to revolutionize the modern human design process and improve access to highly optimized and customized products across countless industries. Successfully adapting generative Machine Learning to design engineering may be the key to such automated design synthesis and is a research subject of great importance. We present a review and analysis of Deep Generative Learning models in engineering design. Deep Generative Models (DGMs) typically leverage deep networks to learn from an input dataset and learn to synthesize new designs. Recently, DGMs such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), feedforward Neural Networks (NNs) and certain Deep Reinforcement Learning (DRL) frameworks have shown promising results in design applications like structural optimization, materials design, and shape synthesis. The prevalence of DGMs in Engineering Design has skyrocketed since 2016. Anticipating continued growth, we conduct a review of recent advances with the hope of benefitting researchers interested in DGMs for design. We structure our review as an exposition of the algorithms, datasets, representation methods, and applications commonly used in the current literature. In particular, we discuss key works that have introduced new techniques and methods in DGMs, successfully applied DGMs to a design-related domain, or directly supported development of DGMs through datasets or auxiliary methods. We further identify key challenges and limitations currently seen in DGMs across design fields, such as design creativity, handling complex constraints and objectives, and modeling both form and functional performance simultaneously. In our discussion we identify possible solution pathways as key areas on which to target future work.

[239]  arXiv:2110.10864 [pdf, other]
Title: Class-Discriminative CNN Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Compressing convolutional neural networks (CNNs) by pruning and distillation has received ever-increasing focus in the community. In particular, a class-discrimination based approach is desirable as it fits seamlessly with the CNN training objective. In this paper, we propose class-discriminative compression (CDC), which injects class discrimination in both pruning and distillation to facilitate the CNN training goal. We first study the effectiveness of a group of discriminant functions for channel pruning, where we include well-known single-variate binary-class statistics like Student's T-Test in our study via an intuitive generalization. We then propose a novel layer-adaptive hierarchical pruning approach, where we use a coarse class discrimination scheme for early layers and a fine one for later layers. This method naturally accords with the fact that CNNs process coarse semantics in the early layers and extract fine concepts in the later ones. Moreover, we leverage discriminant component analysis (DCA) to distill knowledge of intermediate representations in a subspace with rich discriminative information, which enhances hidden layers' linear separability and classification accuracy of the student. Combining pruning and distillation, CDC is evaluated on CIFAR and ILSVRC 2012, where we consistently outperform the state-of-the-art results.

[240]  arXiv:2110.10869 [pdf, other]
Title: LC3Net: Ladder context correlation complementary network for salient object detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing salient object detection methods based on convolutional neural networks commonly resort to constructing discriminative networks to aggregate high-level and low-level features. However, contextual information is often not fully and reasonably utilized, which usually causes either the absence of useful features or contamination by redundant features. To address these issues, we propose a novel ladder context correlation complementary network (LC3Net) in this paper, which is equipped with three crucial components. First, we propose a simple yet practical filterable convolution block (FCB) to assist the automatic collection of information on the diversity of initial features. Second, we propose a dense cross module (DCM) to facilitate the intimate aggregation of different levels of features by effectively integrating semantic information and detailed information of both adjacent and non-adjacent layers. Third, we propose a bidirectional compression decoder (BCD) to help the progressive shrinkage of multi-scale features from coarse to fine by leveraging multiple pairs of alternating top-down and bottom-up feature interaction flows. Extensive experiments demonstrate the superiority of our method against 16 state-of-the-art methods.

[241]  arXiv:2110.10871 [pdf, other]
Title: Principled Representation Learning for Entity Alignment
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Embedding-based entity alignment (EEA) has recently received great attention. Despite significant performance improvement, little effort has been made to facilitate understanding of EEA methods. Most existing studies rest on the assumption that a small number of pre-aligned entities can serve as anchors connecting the embedding spaces of two KGs. Nevertheless, no one has investigated the rationality of such an assumption. To fill this research gap, we define a typical paradigm abstracted from existing EEA methods and analyze how the embedding discrepancy between two potentially aligned entities is implicitly bounded by a predefined margin in the scoring function. Further, we find that such a bound is not guaranteed to be tight enough for alignment learning. We mitigate this problem by proposing a new approach, named NeoEA, to explicitly learn KG-invariant and principled entity embeddings. In this sense, an EEA model not only pursues the closeness of aligned entities based on geometric distance, but also aligns the neural ontologies of two KGs by eliminating the discrepancy in embedding distribution and underlying ontology knowledge. Our experiments demonstrate consistent and significant improvement in performance against the best-performing EEA methods.

[242]  arXiv:2110.10872 [pdf, other]
Title: HENet: Forcing a Network to Think More for Font Recognition
Comments: 8 pages, 2021 3rd International Conference on Advanced Information Science and System (AISS 2021)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although much progress has been made in text recognition/OCR in recent years, font recognition remains challenging. The main challenge lies in the subtle differences between similar fonts, which are hard to distinguish. This paper proposes a novel font recognizer with a pluggable module for the font recognition task. The pluggable module, called HE Block, hides the most discriminative accessible features and forces the network to consider other complicated features to solve the hard examples of similar fonts. Compared with the available public font recognition systems, our proposed method does not require any interactions at the inference stage. Extensive experiments demonstrate that HENet achieves encouraging performance, including on the character-level dataset Explor_all and the word-level dataset AdobeVFR.

[243]  arXiv:2110.10873 [pdf, other]
Title: Controllable and Compositional Generation with Latent-Space Energy-Based Models
Comments: 32 pages, NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it remains a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes together, and we show how sampling from it is formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Also, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.

[244]  arXiv:2110.10874 [pdf, ps, other]
Title: CNewSum: A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level
Subjects: Computation and Language (cs.CL)

Automatic text summarization aims to produce a brief but crucial summary for the input documents. Both extractive and abstractive methods have witnessed great success in English datasets in recent years. However, there has been minimal exploration of text summarization in Chinese, limited by the lack of large-scale datasets. In this paper, we present a large-scale Chinese news summarization dataset, CNewSum, which consists of 304,307 documents and human-written summaries for the news feed. It has long documents with highly abstractive summaries, which can encourage document-level understanding and generation for current summarization models. An additional distinguishing feature of CNewSum is that its test set contains adequacy and deducibility annotations for the summaries. The adequacy level measures the degree of summary information covered by the document, and the deducibility indicates the reasoning ability the model needs to generate the summary. These annotations can help researchers analyze and target their model performance bottleneck. We examine recent methods on CNewSum and release our dataset to provide a solid testbed for automatic Chinese summarization research.

[245]  arXiv:2110.10876 [pdf, other]
Title: Evolving Transferable Pruning Functions
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Channel pruning has made major headway in the design of efficient deep learning models. Conventional approaches adopt human-made pruning functions to score channels' importance for channel pruning, which requires domain knowledge and could be sub-optimal. In this work, we propose an end-to-end framework to automatically discover strong pruning metrics. Specifically, we craft a novel design space for expressing pruning functions and leverage an evolution strategy, genetic programming, to evolve high-quality and transferable pruning functions. Unlike prior methods, our approach can not only provide compact pruned networks for efficient inference, but also novel closed-form pruning metrics that are mathematically explainable and thus generalizable to different pruning tasks. The evolution is conducted on small datasets while the learned functions are transferable to larger datasets without any manual modification. Compared to direct evolution on a large dataset, our strategy shows better cost-effectiveness. When applied to more challenging datasets, different from those used in the evolution process, e.g., ILSVRC-2012, an evolved function achieves state-of-the-art pruning results.

[246]  arXiv:2110.10877 [pdf, other]
Title: Decision Theoretic Cutoff and ROC Analysis for Bayesian Optimal Group Testing
Comments: 17 pages, 8 figures
Subjects: Information Theory (cs.IT); Disordered Systems and Neural Networks (cond-mat.dis-nn)

We study the inference problem in group testing, which aims to identify defective items, from the perspective of decision theory. We introduce Bayesian inference and consider the Bayesian optimal setting in which the true generative process of the test results is known. We demonstrate the adequacy of the posterior marginal probability in the Bayesian optimal setting as a diagnostic variable based on the area under the curve (AUC). Using the posterior marginal probability, we derive the general expression of the optimal cutoff value that yields the minimum expected risk function. Furthermore, we evaluate the performance of the Bayesian group testing without knowing the true states of the items: defective or non-defective. By introducing an analytical method from statistical physics, we derive the receiver operating characteristic curve, and quantify the corresponding AUC under the Bayesian optimal setting. The obtained analytical results precisely describe the actual performance of the belief propagation algorithm defined for single samples when the number of items is sufficiently large.
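The roles of the cutoff and the AUC can be made concrete with a small textbook sketch. It assumes a two-cost risk (one cost per false positive, one per false negative), which need not match the general expression derived in the paper, and it computes the AUC empirically from labeled scores rather than analytically. The function names are illustrative.

```python
def bayes_cutoff(cost_fp, cost_fn):
    """Posterior-probability cutoff that minimizes expected risk.

    Declaring an item defective costs (1 - p) * cost_fp in expectation,
    declaring it non-defective costs p * cost_fn, where p is the posterior
    marginal probability of being defective. The first option is cheaper
    exactly when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def auc(scores_defective, scores_clean):
    """Empirical AUC: chance that a random defective item receives a higher
    score than a random non-defective one, counting ties as one half."""
    wins = 0.0
    for p in scores_defective:
        for n in scores_clean:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_defective) * len(scores_clean))
```

With equal costs the cutoff is 0.5; making a false negative three times as costly as a false positive lowers it to 0.25, so more items are flagged as defective.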

[247]  arXiv:2110.10881 [pdf, other]
Title: Threshold Tests as Quality Signals: Optimal Strategies, Equilibria, and Price of Anarchy
Comments: 43 pages, 3 figures, to appear at WINE 2021
Subjects: Computer Science and Game Theory (cs.GT)

We study a signaling game between two firms competing to have their product chosen by a principal. The products have qualities drawn i.i.d. from a common prior. The principal aims to choose the better product, but the quality of a product can only be estimated via a coarse-grained threshold test: for chosen $\theta$, the principal learns whether a product's quality exceeds $\theta$ or not.
We study this problem under two types of interactions. In the first, the principal does the testing herself, and can choose tests from a class of allowable tests. We show that the optimum strategy for the principal is to administer different tests to the two products: one which is passed with probability $\frac{1}{3}$ and the other with probability $\frac{2}{3}$. If, however, the principal is required to choose the tests in a symmetric manner (i.e., via an i.i.d. distribution), then the optimal strategy is to choose tests whose probability of passing is drawn uniformly from $[\frac{1}{4}, \frac{3}{4}]$.
In our second model, test difficulties are selected endogenously by the firms. This corresponds to a setting in which the firms must commit to their testing procedures before knowing the quality of their products. This interaction naturally gives rise to a signaling game; we characterize the unique Bayes-Nash Equilibrium of this game, which happens to be symmetric. We then calculate its Price of Anarchy in terms of the principal's probability of choosing the worse product. Finally, we show that by restricting both firms' set of available thresholds to choose from, the principal can lower the Price of Anarchy of the resulting equilibrium; however, there is a limit, in that for every (common) restricted set of tests, the equilibrium failure probability is strictly larger than under the optimal i.i.d. distribution.
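The optimal asymmetric tests in the first model can be checked with a short exact calculation. The sketch below makes two assumptions beyond the abstract: qualities are rescaled to be uniform on [0, 1] (possible for a continuous prior), and the principal picks the product with the easier test only when it passes while the other product fails its harder test, choosing the harder-tested product otherwise. Under that rule the failure probability is ((a - b)^2 + (1 - a)^2 + b^2) / 2 for thresholds a >= b. Function names are illustrative.

```python
from fractions import Fraction

def failure_probability(p_hard, p_easy):
    """Probability of choosing the worse product for uniform qualities.

    p_hard and p_easy are the pass probabilities of the two tests, with
    p_hard <= p_easy. The corresponding thresholds are a = 1 - p_hard and
    b = 1 - p_easy, so a >= b.
    """
    a = 1 - p_hard
    b = 1 - p_easy
    return ((a - b) ** 2 + (1 - a) ** 2 + b ** 2) / 2

def best_pass_probabilities(grid=12):
    """Exhaustive search over pass probabilities k/grid.

    Returns (failure probability, p_hard, p_easy) for the best pair.
    """
    candidates = [Fraction(k, grid) for k in range(grid + 1)]
    return min(
        (failure_probability(ph, pe), ph, pe)
        for ph in candidates
        for pe in candidates
        if ph <= pe
    )
```

On a grid of twelfths the search returns pass probabilities 1/3 and 2/3 with failure probability 1/6, in line with the abstract, while giving both products the same test with pass probability 1/2 yields 1/4.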

[248]  arXiv:2110.10887 [pdf, other]
Title: A Real-Time Energy and Cost Efficient Vehicle Route Assignment Neural Recommender System
Comments: 14 pages, 11 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)

This paper presents a neural network recommender system algorithm for assigning vehicles to routes based on energy and cost criteria. In this work, we applied this new approach to efficiently identify the most cost-effective medium and heavy duty truck (MDHDT) powertrain technology, from a total cost of ownership (TCO) perspective, for given trips. We employ a machine learning based approach to efficiently estimate the energy consumption of various candidate vehicles over given routes, defined as sequences of links (road segments), with little information known about internal dynamics, i.e., using high-level macroscopic route information. A complete recommendation logic is then developed to allow for real-time optimum assignment for each route, subject to the operational constraints of the fleet. We show how this framework can be used to (1) efficiently provide a single trip recommendation with a top-$k$ vehicles star ranking system, and (2) engage in more general assignment problems where $n$ vehicles need to be deployed over $m \leq n$ trips. This new assignment system has been deployed and integrated into the POLARIS Transportation System Simulation Tool for use in research conducted by the Department of Energy's Systems and Modeling for Accelerated Research in Transportation (SMART) Mobility Consortium.
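The general assignment problem mentioned in (2), deploying $n$ vehicles over $m \leq n$ trips, can be illustrated with a brute-force sketch. It assumes a precomputed cost table (for example, estimated TCO per vehicle and trip) and simply enumerates every way of giving each trip a distinct vehicle; this is not the recommendation logic of the paper, only scales to small fleets, and uses hypothetical names throughout.

```python
from itertools import permutations

def assign_vehicles(cost):
    """Find the cheapest assignment of distinct vehicles to trips.

    cost[v][t] is the assumed cost of serving trip t with vehicle v.
    Returns (total_cost, assignment), where assignment[t] is the index of
    the vehicle chosen for trip t. Requires at least as many vehicles as
    trips.
    """
    n_vehicles = len(cost)
    n_trips = len(cost[0])
    best_total, best_assignment = None, None
    # Each permutation picks an ordered set of n_trips distinct vehicles.
    for vehicles in permutations(range(n_vehicles), n_trips):
        total = sum(cost[v][t] for t, v in enumerate(vehicles))
        if best_total is None or total < best_total:
            best_total, best_assignment = total, vehicles
    return best_total, best_assignment

# Three vehicles, two trips: rows are vehicles, columns are trips.
example_cost = [
    [4, 2],
    [1, 3],
    [5, 5],
]
print(assign_vehicles(example_cost))
```

For larger fleets a polynomial-time method such as the Hungarian algorithm would replace the enumeration, since the number of candidate assignments grows factorially.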

[249]  arXiv:2110.10895 [pdf, other]
Title: Finite Volume Least-Squares Neural Network (FV-LSNN) Method for Scalar Nonlinear Hyperbolic Conservation Laws
Comments: arXiv admin note: text overlap with arXiv:2105.11627
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

In [4], we introduced the least-squares ReLU neural network (LSNN) method for solving the linear advection-reaction problem with discontinuous solution and showed that the number of degrees of freedom for the LSNN method is significantly less than that of traditional mesh-based methods. The LSNN method is a discretization of an equivalent least-squares (LS) formulation in the class of neural network functions with the ReLU activation function; and evaluation of the LS functional is done by using numerical integration and proper numerical differentiation.
By developing a novel finite volume approximation (FVA) to the divergence operator, this paper studies the LSNN method for scalar nonlinear hyperbolic conservation laws. The FVA introduced in this paper is tailored to the LSNN method and is more accurate than traditional, well-studied FV schemes used in mesh-based numerical methods. Numerical results of some benchmark test problems with both convex and non-convex fluxes show that the finite volume LSNN (FV-LSNN) method is capable of computing the physical solution for problems with rarefaction waves and capturing the shock of the underlying problem automatically through the free hyper-planes of the ReLU neural network. Moreover, the method does not exhibit the common Gibbs phenomena along the discontinuous interface.

[250]  arXiv:2110.10897 [pdf, other]
Title: Privacy-Aware Identity Cloning Detection based on Deep Forest
Comments: The 19th International Conference on Service Oriented Computing (ICSOC 2021). arXiv admin note: text overlap with arXiv:2109.15179
Subjects: Social and Information Networks (cs.SI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

We propose a novel method to detect identity cloning of social-sensor cloud service providers to prevent the detrimental outcomes caused by identity deception. This approach leverages non-privacy-sensitive user profile data gathered from social networks and a powerful deep learning model to perform cloned identity detection. We evaluated the proposed method against the state-of-the-art identity cloning detection techniques and the other popular identity deception detection models atop a real-world dataset. The results show that our method significantly outperforms these techniques/models in terms of Precision and F1-score.

[251]  arXiv:2110.10898 [pdf, other]
Title: Deep Image Matting with Flexible Guidance Input
Comments: Accepted to BMVC2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image matting is an important computer vision problem. Many existing matting methods require a hand-made trimap to provide auxiliary information, which is very expensive and limits real-world usage. Recently, some trimap-free methods have been proposed, which dispense with user input entirely. However, their performance lags far behind trimap-based methods due to the lack of guidance information. In this paper, we propose a matting method that uses Flexible Guidance Input as a user hint, which means our method can use a trimap, scribblemap or clickmap as guidance information or even work without any guidance input. To achieve this, we propose a Progressive Trimap Deformation (PTD) scheme that gradually shrinks the foreground and background areas of the trimap as training proceeds, until it finally becomes a scribblemap. To make our network robust to any user scribble and click, we randomly sample points on the foreground and background and perform curve fitting. Moreover, we propose a Semantic Fusion Module (SFM) which utilizes the Feature Pyramid Enhancement Module (FPEM) and Joint Pyramid Upsampling (JPU) in the matting task for the first time. The experiments show that our method achieves state-of-the-art results compared with existing trimap-based and trimap-free methods.

[252]  arXiv:2110.10899 [pdf, other]
Title: LARNet: Latent Action Representation for Human Action Synthesis
Comments: British Machine Vision Conference (BMVC) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present LARNet, a novel end-to-end approach for generating human action videos. Joint generative modeling of appearance and dynamics to synthesize a video is very challenging, and therefore recent works in video synthesis have proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we propose a generative approach instead, which explicitly learns action dynamics in latent space, avoiding the need for a driving video during inference. The generated action dynamics are integrated with the appearance using a recurrent hierarchical structure which induces motion at different scales to focus on both coarse and fine action details. In addition, we propose a novel mix-adversarial loss function which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four real-world human action datasets, demonstrating its effectiveness in generating human actions. The code and models will be made publicly available.

[253]  arXiv:2110.10901 [pdf, ps, other]
Title: A Fast Location Algorithm for Very Sparse Point Clouds Based on Object Detection
Authors: Shiyu Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Limited by device performance, it is arduous to recognize a target object and locate it in Augmented Reality (AR) scenes on low-end mobile devices, especially those using monocular cameras. In this paper, we propose an algorithm which can quickly locate the target object through image object detection when only very sparse feature points are available. We introduce YOLOv3-Tiny to our algorithm as the object detection module to filter the possible points and use Principal Component Analysis (PCA) to determine the location. We conduct the experiment in a manually designed scene by holding a smartphone, and the results show the high positioning speed and accuracy of our method.

[254]  arXiv:2110.10905 [pdf, other]
Title: Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

End-to-end learning of robotic manipulation with high data efficiency is one of the key challenges in robotics. The latest methods that utilize human demonstration data and unsupervised representation learning have proven to be a promising direction to improve RL learning efficiency. The use of demonstration data also allows "warming-up" the RL policies using offline data with imitation learning or the recently emerged offline reinforcement learning algorithms. However, existing works often treat offline policy learning and online exploration as two separate processes, which is often accompanied by a severe performance drop during the offline-to-online transition. Furthermore, many robotic manipulation tasks involve complex sub-task structures, which are very challenging to solve in RL with sparse reward. In this work, we propose a unified offline-to-online RL framework that resolves the transition performance drop issue. Additionally, we introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning. Combined with an advanced unsupervised representation learning module, our framework achieves great training efficiency and performance compared with the state-of-the-art methods in multiple robotic manipulation tasks.

[255]  arXiv:2110.10906 [pdf, other]
Title: Single-Modal Entropy based Active Learning for Visual Question Answering
Comments: Accepted to BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Constructing a large-scale labeled dataset in the real world, especially for high-level tasks (e.g., Visual Question Answering), can be expensive and time-consuming. In addition, with the ever-growing amounts of data and architecture complexity, Active Learning has become an important aspect of computer vision research. In this work, we address Active Learning in the multi-modal setting of Visual Question Answering (VQA). In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition through the use of ad hoc single-modal branches for each input to leverage its information. Our mutual information based sample acquisition strategy, Single-Modal Entropic Measure (SMEM), in addition to our self-distillation technique, enables the sample acquisitor to exploit all present modalities and find the most informative samples. Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks. We confirm our findings on various VQA datasets through state-of-the-art performance by comparing to existing Active Learning baselines.

[256]  arXiv:2110.10909 [pdf, ps, other]
Title: On the benefits of being constrained when receiving signals
Subjects: Computer Science and Game Theory (cs.GT)

We study a Bayesian persuasion setting in which the receiver is trying to match the (binary) state of the world. The sender's utility is partially aligned with the receiver's, in that conditioned on the receiver's action, the sender derives higher utility when the state of the world matches the action.
Our focus is on whether, in such a setting, being constrained helps a receiver. Intuitively, if the receiver can only take the sender's preferred action with a smaller probability, the sender might have to reveal more information, so that the receiver can take the action more specifically when the sender prefers it. We show that with a binary state of the world, this intuition indeed carries through: under very mild non-degeneracy conditions, a more constrained receiver will always obtain (weakly) higher utility than a less constrained one. Unfortunately, without additional assumptions, the result does not hold when there are more than two states in the world, which we show with an explicit example.

[257]  arXiv:2110.10913 [pdf, other]
Title: WENO interpolations and reconstructions using data bounded polynomial approximation
Comments: 21 pages, 7 figures, 2 Tables
Subjects: Numerical Analysis (math.NA)

This work characterizes the structure of third and fourth order WENO weights by deducing data-bounded conditions on third order polynomial approximations. Using these conditions, non-linear weights are defined for third and fourth order data-bounded weighted essentially non-oscillatory (WENO) approximations. Computational results show that data-bounded WENO approximations for smooth functions achieve the required accuracy and do not exhibit overshoot or undershoot for functions with discontinuities and extrema. Further, with suitable weights, high order data-bounded WENO approximations are proposed for WENO schemes.

[258]  arXiv:2110.10914 [pdf, other]
Title: An Empirical Evaluation of Time-Series Feature Sets
Comments: Submitted to and accepted for publication in SFE-TSDM Workshop at 21st IEEE International Conference on Data Mining (IEEE ICDM 2021)
Subjects: Machine Learning (cs.LG)

Solving time-series problems with features has been rising in popularity due to the availability of software for feature extraction. Feature-based time-series analysis can now be performed using many different feature sets, including hctsa (7730 features: Matlab), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (up to 1558 features: Python), TSFEL (390 features: Python), and the C-coded catch22 (22 features: Matlab, R, Python, and Julia). There is substantial overlap in the types of methods included in these sets (e.g., properties of the autocorrelation function and Fourier power spectrum), but they are yet to be systematically compared. Here we compare these seven sets on computational speed, assess the redundancy of features contained in each, and evaluate the overlap and redundancy between them. We take an empirical approach to feature similarity based on outputs across a diverse set of real-world and simulated time series. We find that feature sets vary across three orders of magnitude in their computation time per feature on a laptop for a 1000-sample series, from the fastest sets catch22 and TSFEL (~0.1ms per feature) to tsfeatures (~3s per feature). Using PCA to evaluate feature redundancy within each set, we find the highest within-set redundancy for TSFEL and tsfresh. For example, in TSFEL, 90% of the variance across 390 features can be captured with just four PCs. Finally, we introduce a metric for quantifying overlap between pairs of feature sets, which indicates substantial overlap. We found that the largest feature set, hctsa, is the most comprehensive, and that tsfresh is the most distinctive, due to its incorporation of many low-level Fourier coefficients. Our results provide empirical understanding of the differences between existing feature sets, information that can be used to better tailor feature sets to their applications.
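
To make the PCA-based redundancy measure more concrete, the following sketch estimates the fraction of total variance captured by the first principal component of a small feature matrix, using power iteration on the covariance matrix. It is a minimal standard-library illustration under stated assumptions, not the authors' pipeline: the function names and toy data are hypothetical, features are not normalized here (in practice they usually would be), and the fixed start vector could in principle be orthogonal to the leading eigenvector for contrived inputs.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of `rows` (one row per time series, one column per feature)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive semi-definite matrix via power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        eig = norm  # for a unit vector v, |Mv| converges to the top eigenvalue
    return eig


def variance_explained_by_first_pc(rows):
    """Fraction of total variance captured by the first principal component."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return top_eigenvalue(cov) / total


# Three features that are all linear functions of one quantity: fully redundant.
redundant = [(t, 2.0 * t, -t) for t in (1.0, 2.0, 3.0, 4.0, 5.0)]
# Two uncorrelated features with equal variance: no redundancy.
independent = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

print(variance_explained_by_first_pc(redundant))
print(variance_explained_by_first_pc(independent))
```

A value near 1 indicates that one component explains almost everything (high redundancy), while a value near 1/d for d features indicates little redundancy. Extending this to several components, as in the "four PCs" figure quoted above, would require a full eigendecomposition.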

[259]  arXiv:2110.10915 [pdf, ps, other]
Title: On some theoretical limitations of Generative Adversarial Networks
Comments: 7 pages
Subjects: Machine Learning (cs.LG)

Generative Adversarial Networks have become a core technique in Machine Learning to generate unknown distributions from data samples. They have been used in a wide range of contexts without paying much attention to the possible theoretical limitations of those models. Indeed, because of the universal approximation properties of Neural Networks, it is a general assumption that GANs can generate any probability distribution. Recently, people began to question this assumption and this article is in line with this thinking. We provide a new result based on Extreme Value Theory showing that GANs cannot generate heavy-tailed distributions. The full proof of this result is given.

[260]  arXiv:2110.10916 [pdf, other]
Title: Exploiting Inter-pixel Correlations in Unsupervised Domain Adaptation for Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

"Self-training" has become a dominant method for semantic segmentation via unsupervised domain adaptation (UDA). It creates a set of pseudo labels for the target domain to give explicit supervision. However, the pseudo labels are noisy, sparse and do not provide any information about inter-pixel correlations. We regard inter-pixel correlation quite important because semantic segmentation is a task of predicting highly structured pixel-level outputs. Therefore, in this paper, we propose a method of transferring the inter-pixel correlations from the source domain to the target domain via a self-attention module. The module takes the prediction of the segmentation network as an input and creates a self-attended prediction that correlates similar pixels. The module is trained only on the source domain to learn the domain-invariant inter-pixel correlations, then later, it is used to train the segmentation network on the target domain. The network learns not only from the pseudo labels but also by following the output of the self-attention module which provides additional knowledge about the inter-pixel correlations. Through extensive experiments, we show that our method significantly improves the performance on two standard UDA benchmarks and also can be combined with recent state-of-the-art method to achieve better performance.

[261]  arXiv:2110.10919 [pdf, other]
Title: FlexTOE: Flexible TCP Offload with Fine-Grained Parallelism
Comments: 17 pages, 16 figures
Subjects: Networking and Internet Architecture (cs.NI); Operating Systems (cs.OS)

FlexTOE is a flexible, yet high-performance TCP offload engine (TOE) to SmartNICs. FlexTOE eliminates almost all host data-path TCP processing and is fully customizable. FlexTOE interoperates well with other TCP stacks, is robust under adverse network conditions, and supports POSIX sockets.
FlexTOE focuses on data-path offload of established connections, avoiding complex control logic and packet buffering in the NIC. FlexTOE leverages fine-grained parallelization of the TCP data-path and segment reordering for high performance on wimpy SmartNIC architecture, while remaining flexible via a modular design. We compare FlexTOE to Linux, the TAS software TCP accelerator, and the Chelsio Terminator TOE. We find that Memcached scales up to 38% better on FlexTOE versus TAS, while saving up to 81% host CPU cycles versus Chelsio. FlexTOE provides competitive performance for RPCs, even with wimpy SmartNICs. FlexTOE cuts 99.99th-percentile RPC RTT by 3.2$\times$ and 50% versus Chelsio and TAS, respectively. FlexTOE's API supports Micro-C and XDP programs written in eBPF. It allows us to implement popular data center transport features, such as TCP tracing, packet filtering and capture, VLAN stripping, flow classification, firewalling, and connection splicing.

[262]  arXiv:2110.10921 [pdf, other]
Title: CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Deep convolutional neural networks are shown to be overkill with high parametric and computational redundancy in many application scenarios, and an increasing number of works have explored model pruning to obtain lightweight and efficient networks. However, most existing pruning approaches are driven by empirical heuristics and rarely consider the joint impact of channels, leading to unguaranteed and suboptimal performance. In this paper, we propose a novel channel pruning method via class-aware trace ratio optimization (CATRO) to reduce the computational burden and accelerate the model inference. Utilizing class information from a few samples, CATRO measures the joint impact of multiple channels by feature space discriminations and consolidates the layer-wise impact of preserved channels. By formulating channel pruning as a submodular set function maximization problem, CATRO solves it efficiently via a two-stage greedy iterative optimization procedure. More importantly, we present theoretical justifications on convergence and performance of CATRO. Experimental results demonstrate that CATRO achieves higher accuracy with similar computation cost or lower computation cost with similar accuracy than other state-of-the-art channel pruning algorithms. In addition, because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.

[263]  arXiv:2110.10924 [pdf]
Title: Fuzzy-Depth Objects Grasping Based on FSG Algorithm and a Soft Robotic Hand
Comments: accepted by IROS 2021
Subjects: Robotics (cs.RO)

Autonomous grasping is an important factor for robots physically interacting with the environment and executing versatile tasks. However, a universally applicable, cost-effective, and rapidly deployable autonomous grasping approach is still limited by target objects with fuzzy-depth information. Examples are transparent, specular, flat, and small objects whose depth is difficult to sense accurately. In this work, we present a solution for those fuzzy-depth objects. The framework of our approach includes two major components: one is a soft robotic hand and the other is a Fuzzy-depth Soft Grasping (FSG) algorithm. The soft hand is replaceable with most existing soft hands/grippers with body compliance. The FSG algorithm exploits both RGB and depth images to predict grasps while not trying to reconstruct the whole scene. Two grasping primitives are designed to further increase robustness. The proposed method outperforms reference baselines in grasping experiments on unseen fuzzy-depth objects (84% success rate).

[264]  arXiv:2110.10926 [pdf, other]
Title: PipAttack: Poisoning Federated Recommender Systems for Manipulating Item Promotion
Comments: Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM '22)
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Due to growing privacy concerns, decentralization is emerging rapidly in personalized services, especially recommendation. Also, recent studies have shown that centralized models are vulnerable to poisoning attacks, compromising their integrity. In the context of recommender systems, a typical goal of such poisoning attacks is to promote the adversary's target items by interfering with the training dataset and/or process. Hence, a common practice is to subsume recommender systems under the decentralized federated learning paradigm, which enables all user devices to collaboratively learn a global recommender while retaining all the sensitive data locally. Without exposing the full knowledge of the recommender and entire dataset to end-users, such federated recommendation is widely regarded as 'safe' against poisoning attacks. In this paper, we present a systematic approach to backdooring federated recommender systems for targeted item promotion. The core tactic is to take advantage of the inherent popularity bias that commonly exists in data-driven recommenders. As popular items are more likely to appear in the recommendation list, our innovatively designed attack model enables the target item to have the characteristics of popular items in the embedding space. Then, by uploading carefully crafted gradients via a small number of malicious users during the model update, we can effectively increase the exposure rate of a target (unpopular) item in the resulting federated recommender. Evaluations on two real-world datasets show that 1) our attack model significantly boosts the exposure rate of the target item in a stealthy way, without harming the accuracy of the poisoned recommender; and 2) existing defenses are not effective enough, highlighting the need for new defenses against our local model poisoning attacks on federated recommender systems.

[265]  arXiv:2110.10927 [pdf, other]
Title: SecureBoost+ : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Gradient boosting decision tree (GBDT) is a widely used ensemble algorithm in the industry. Its vertical federated learning version, SecureBoost, is one of the most popular algorithms used in cross-silo privacy-preserving modeling. As the area of privacy computation has thrived in recent years, demands for large-scale and high-performance federated learning have grown dramatically in real-world applications. In this paper, to fulfill these requirements, we propose SecureBoost+, which is both novel and improved from the prior work SecureBoost. SecureBoost+ integrates several ciphertext calculation optimizations and engineering optimizations. The experimental results demonstrate that SecureBoost+ has significant performance improvements on large and high-dimensional data sets compared to SecureBoost. It makes effective and efficient large-scale vertical federated learning possible.

[266]  arXiv:2110.10928 [pdf, other]
Title: Quantum field theories, Markov random fields and machine learning
Comments: Contribution submitted to the CCP2021: XXXII IUPAP Conference on Computational Physics, Coventry University, United Kingdom. arXiv admin note: substantial text overlap with arXiv:2109.07730
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); High Energy Physics - Lattice (hep-lat)

The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices open up the opportunity to investigate probabilistic machine learning from the perspective of quantum field theory. Here, we will discuss how discretized Euclidean field theories can be recast within the mathematical framework of Markov random fields, which is a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. Specifically, we will demonstrate that the $\phi^{4}$ scalar field theory on a square lattice satisfies the Hammersley-Clifford theorem, therefore recasting it as a Markov random field from which neural networks are additionally derived. We will then discuss applications pertinent to the minimization of an asymmetric distance between the probability distribution of the $\phi^{4}$ machine learning algorithms and that of target probability distributions.

[267]  arXiv:2110.10932 [pdf, other]
Title: Subspace Detours Meet Gromov-Wasserstein
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

In the context of optimal transport methods, the subspace detour approach was recently presented by Muzellec and Cuturi (2019). It consists in building a nearly optimal transport plan in the measures space from an optimal transport plan in a wisely chosen subspace, onto which the original measures are projected. The contribution of this paper is to extend this category of methods to the Gromov-Wasserstein problem, which is a particular type of transport distance involving the inner geometry of the compared distributions. After deriving the associated formalism and properties, we also discuss a specific cost for which we can show connections with the Knothe-Rosenblatt rearrangement. We finally give an experimental illustration on a shape matching problem.

[268]  arXiv:2110.10934 [pdf]
Title: Can Q-learning solve Multi Armed Bandits?
Authors: Refael Vivanti
Comments: arXiv admin note: text overlap with arXiv:1905.10144
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

When a reinforcement learning (RL) method has to decide between several optional policies by solely looking at the received reward, it has to implicitly optimize a Multi-Armed-Bandit (MAB) problem. This raises the question: are current RL algorithms capable of solving MAB problems? We claim that the surprising answer is no. In our experiments we show that in some situations they fail to solve a basic MAB problem, and in many common situations they have a hard time: they suffer from regression in results during training, sensitivity to initialization, and high sample complexity. We claim that this stems from variance differences between policies, which cause two problems. The first problem is the "Boring Policy Trap", where each policy has a different implicit exploration that depends on its reward variance, and leaving a boring, or low-variance, policy is less likely due to its low implicit exploration. The second problem is the "Manipulative Consultant" problem, where value-estimation functions used in deep RL algorithms such as DQN or deep Actor Critic methods maximize estimation precision rather than mean rewards, and have a better loss in low-variance policies, which causes the network to converge to a sub-optimal policy. Cognitive experiments on humans showed that noised reward signals may paradoxically improve performance. We explain this using the aforementioned problems, claiming that both humans and algorithms may share similar challenges in decision making.
Inspired by this result, we propose the Adaptive Symmetric Reward Noising (ASRN) method, by which we mean equalizing the rewards variance across different policies, thus avoiding the two problems without affecting the environment's mean rewards behavior. We demonstrate that the ASRN scheme can dramatically improve the results.
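
The core idea stated above, equalizing reward variance across policies without changing mean rewards, can be sketched for a toy two-armed bandit as follows. This is only an illustration of the stated principle, not the ASRN algorithm from the paper: the function names, the choice of Gaussian noise, the use of the highest observed variance as the target, and the batch (rather than adaptive, online) variance estimates are all assumptions.

```python
import math
import random
import statistics


def noise_std_to_match(current_var, target_var):
    """Std of zero-mean noise that raises `current_var` to `target_var`."""
    return math.sqrt(max(0.0, target_var - current_var))


def equalize_reward_variance(rewards_by_arm, rng):
    """Add symmetric zero-mean noise so every arm matches the highest observed variance."""
    variances = {arm: statistics.pvariance(r) for arm, r in rewards_by_arm.items()}
    target = max(variances.values())
    noised = {}
    for arm, rewards in rewards_by_arm.items():
        sigma = noise_std_to_match(variances[arm], target)
        # Zero-mean noise changes the variance of the rewards but not their expectation.
        noised[arm] = [r + rng.gauss(0.0, sigma) for r in rewards]
    return noised


rng = random.Random(0)
rewards = {
    "boring": [1.0] * 20000,                               # zero-variance arm
    "noisy": [rng.gauss(1.2, 2.0) for _ in range(20000)],  # better mean, high variance
}
noised = equalize_reward_variance(rewards, rng)

for arm in rewards:
    print(arm, statistics.fmean(noised[arm]), statistics.pvariance(noised[arm]))
```

After noising, both arms have roughly the same variance while their means are essentially unchanged, so a learner comparing them is no longer biased toward the low-variance arm by variance alone.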

[269]  arXiv:2110.10938 [pdf]
Title: Autonomous Dimension Reduction by Flattening Deformation of Data Manifold under an Intrinsic Deforming Field
Authors: Xiaodong Zhuang
Comments: 18 pages, 23 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

A new dimension reduction (DR) method for data sets is proposed, based on autonomous deformation of data manifolds. The deformation is guided by the proposed deforming vector field, which is defined by two kinds of virtual interactions between data points. The flattening of the data manifold is achieved as an emergent behavior under the elastic and repelling interactions between data points, while the topological structure of the manifold is preserved. To overcome the uneven sampling (or "short-cut edge") problem, the soft neighborhood is proposed, in which the neighbor degree is defined and adaptive interactions between neighbor points are implemented. The proposed method provides a novel geometric viewpoint on dimension reduction. Experimental results demonstrate the effectiveness of the proposed method in dimension reduction, and implicit features of data sets may also be revealed.

[270]  arXiv:2110.10939 [pdf]
Title: A channel attention based MLP-Mixer network for motor imagery decoding with EEG
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Convolutional neural networks (CNNs) and their variants have been successfully applied to the electroencephalogram (EEG) based motor imagery (MI) decoding task. However, these CNN-based algorithms generally have limitations in perceiving global temporal dependencies of EEG signals. Besides, they also ignore the diverse contributions of different EEG channels to the classification task. To address such issues, a novel channel attention based MLP-Mixer network (CAMLP-Net) is proposed for EEG-based MI decoding. Specifically, the MLP-based architecture is applied in this network to capture the temporal and spatial information. The attention mechanism is further embedded into MLP-Mixer to adaptively exploit the importance of different EEG channels. Therefore, the proposed CAMLP-Net can effectively learn more global temporal and spatial information. The experimental results on the newly built MI-2 dataset indicate that our proposed CAMLP-Net achieves superior classification performance over all the compared algorithms.

[271]  arXiv:2110.10942 [pdf, other]
Title: Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness
Subjects: Machine Learning (cs.LG)

End-to-end (geometric) deep learning has seen first successes in approximating the solution of combinatorial optimization problems. However, generating data in the realm of NP-hard/-complete tasks brings practical and theoretical challenges, resulting in evaluation protocols that are too optimistic. Specifically, most datasets only capture a simpler subproblem and likely suffer from spurious features. We investigate these effects by studying adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features. For this purpose, we derive perturbation models for SAT and TSP. Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound, allowing us to determine the true label of perturbed samples without a solver. Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning. Although such robust solvers exist, we show empirically that the assessed neural solvers do not generalize well w.r.t. small perturbations of the problem instance.

[272]  arXiv:2110.10949 [pdf, other]
Title: Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection
Comments: Accepted to WACV 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multimodal learning is an emerging yet challenging research area. In this paper, we deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs. Being a fleeting action, which is reflected across the modalities, sarcasm detection is challenging since large datasets are not available for this task in the literature. Therefore, we primarily focus on resource-constrained training, where the number of training samples is limited. To this end, we propose a novel multimodal learning system, MuLOT (Multimodal Learning using Optimal Transport), which utilizes self-attention to exploit intra-modal correspondence and optimal transport for cross-modal correspondence. Finally, the modalities are combined with multimodal attention fusion to capture the inter-dependencies across modalities. We test our approach for multimodal sarcasm and humor detection on three benchmark datasets - MUStARD (video, audio, text), UR-FUNNY (video, audio, text), MST (image, text) and obtain 2.1%, 1.54%, and 2.34% accuracy improvements over state-of-the-art.

[273]  arXiv:2110.10952 [pdf, ps, other]
Title: Estimation of Covariance Matrix of Interference for Secure Spatial Modulation against a Malicious Full-duplex Attacker
Comments: 5 pages, 4 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In secure spatial modulation with a malicious full-duplex attacker, obtaining the interference space or channel state information (CSI) is very important for Bob to cancel or reduce the interference from Mallory. In this paper, different from existing work that assumes perfect CSI, the covariance matrix of malicious interference (CMMI) from Mallory is estimated and used to construct the null-space of interference (NSI). Finally, the receive beamformer at Bob is designed to remove the malicious interference using the NSI. To improve the estimation accuracy, a rank detector relying on the Akaike information criterion (AIC) is derived. To achieve a high-precision CMMI estimation, two methods are proposed: principal component analysis-eigenvalue decomposition (PCA-EVD) and joint diagonalization (JD). The proposed PCA-EVD is a rank deduction method, whereas the JD method is a joint optimization method with improved performance in the low signal to interference plus noise ratio (SINR) region at the expense of increased complexity. Simulation results show that the proposed PCA-EVD performs much better than existing methods such as sample estimated covariance matrix (SCM) and EVD in terms of normalized mean square error (NMSE) and secrecy rate (SR). Additionally, the proposed JD method has better NMSE performance than PCA-EVD in the low SINR region (SINR < 0dB), while in the high SINR region PCA-EVD performs better than JD.

[274]  arXiv:2110.10953 [pdf, other]
Title: MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With the emergence of service robots and surveillance cameras, dynamic face recognition (DFR) in the wild has received much attention in recent years. Face detection and head pose estimation are two important steps for DFR. Very often, the pose is estimated after the face detection. However, such sequential computations lead to higher latency. In this paper, we propose a low latency and lightweight network for simultaneous face detection, landmark localization and head pose estimation. Inspired by the observation that it is more challenging to locate the facial landmarks for faces with large angles, a pose loss is proposed to constrain the learning. Moreover, we also propose an uncertainty multi-task loss to learn the weights of individual tasks automatically. Another challenge is that robots often use low-power computational units such as ARM-based computing cores, so lightweight networks must often be used instead of heavy ones, which leads to a performance drop, especially for small and hard faces. In this paper, we propose online feedback sampling to augment the training samples across different scales, which increases the diversity of training data automatically. Through validation on the commonly used WIDER FACE, AFLW and AFLW2000 datasets, the results show that the proposed method achieves state-of-the-art performance with low computational resources.

[275]  arXiv:2110.10955 [pdf, ps, other]
Title: Multi-label Classification with Partial Annotations using Class-aware Selective Loss
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un-annotated labels should be treated selectively according to two probability quantities: the class distribution in the overall dataset and the specific label likelihood for a given data sample. We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset's partial annotations. Second, during the training of the target model, we emphasize the contribution of annotated labels over originally un-annotated labels by using a dedicated asymmetric loss. With our novel approach, we achieve state-of-the-art results on OpenImages dataset (e.g. reaching 87.3 mAP on V6). In addition, experiments conducted on LVIS and simulated-COCO demonstrate the effectiveness of our approach. Code is available at https://github.com/Alibaba-MIIL/PartialLabelingCSL.

[276]  arXiv:2110.10957 [pdf, other]
Title: Vis-TOP: Visual Transformer Overlay Processor
Comments: 13 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR)

In recent years, the Transformer has achieved good results in Natural Language Processing (NLP) and has also started to expand into Computer Vision (CV). Excellent models such as the Vision Transformer and Swin Transformer have emerged. At the same time, the platform for Transformer models was extended to embedded devices to meet some resource-sensitive application scenarios. However, due to the large number of parameters, the complex computational flow and the many different structural variants of Transformer models, there are a number of issues that need to be addressed in their hardware design. This is both an opportunity and a challenge. We propose Vis-TOP (Visual Transformer Overlay Processor), an overlay processor for various visual Transformer models. It differs from coarse-grained overlay processors such as CPU, GPU, NPE, and from fine-grained customized designs for a specific model. Vis-TOP summarizes the characteristics of all visual Transformer models and implements a three-layer and two-level transformation structure that allows the model to be switched or changed freely without changing the hardware architecture. At the same time, the corresponding instruction bundle and hardware architecture are designed for the three-layer and two-level transformation structure. After quantization of the Swin Transformer tiny model using 8-bit fixed points (fix_8), we implemented an overlay processor on the ZCU102. Compared to GPU, the TOP throughput is 1.5x higher. Compared to the existing Transformer accelerators, our throughput per DSP is between 2.2x and 11.7x higher than others. In short, the approach in this paper meets the requirements of real-time AI in terms of both resource consumption and inference speed. Vis-TOP provides a cost-effective and power-effective solution based on reconfigurable devices for computer vision at the edge.

[277]  arXiv:2110.10963 [pdf, other]
Title: Neuro-Symbolic Reinforcement Learning with First-Order Logic
Comments: EMNLP 2021 (main conference)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)

Deep reinforcement learning (RL) methods often require many trials before convergence, and provide no direct interpretability of trained policies. In order to achieve fast convergence and interpretability for the policy in RL, we propose a novel RL method for text-based games with a recent neuro-symbolic framework called Logical Neural Network, which can learn symbolic and interpretable rules in its differentiable network. The method first extracts first-order logical facts from the text observation and an external word meaning network (ConceptNet), then trains a policy in the network with directly interpretable logical operators. Our experimental results show that RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods in a TextWorld benchmark.

[278]  arXiv:2110.10966 [pdf, other]
Title: Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data
Comments: Paper accepted at The 32nd British Machine Vision Conference, BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate 7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users. In principle, this could be achieved by a single camera system that is capable of detecting the pose of each vehicle but this would require a large, accurately labelled dataset from which to train the detector. Although large vehicle pose datasets exist (ostensibly developed for autonomous vehicles), we find training on these datasets inadequate. These datasets contain images from a ground level viewpoint, whereas an ideal view for intersection observation would be elevated higher above the road surface. We develop an alternative approach using a weakly supervised method of fine tuning 3D object detectors for traffic observation cameras; showing in the process that large existing autonomous vehicle datasets can be leveraged for pre-training. To fine-tune the monocular 3D object detector, our method utilises multiple 2D detections from overlapping, wide-baseline views and a loss that encodes the subjacent geometric consistency. Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets. We present our training methodology, multi-view reprojection loss, and dataset.

[279]  arXiv:2110.10969 [pdf, other]
Title: Memory Efficient Adaptive Attention For Multiple Domain Learning
Comments: 13 pages, 3 figures, 4 graphs, 3 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Training CNNs from scratch on new domains typically demands large numbers of labeled images and computations, which is not suitable for low-power hardware. One way to reduce these requirements is to modularize the CNN architecture and freeze the weights of the heavier modules, that is, the lower layers after pre-training. Recent studies have proposed alternative modular architectures and schemes that lead to a reduction in the number of trainable parameters needed to match the accuracy of fully fine-tuned CNNs on new domains. Our work suggests that a further reduction in the number of trainable parameters by an order of magnitude is possible. Furthermore, we propose that new modularization techniques for multiple domain learning should also be compared on other realistic metrics, such as the number of interconnections needed between the fixed and trainable modules, the number of training samples needed, the order of computations required and the robustness to partial mislabeling of the training data. On all of these criteria, the proposed architecture demonstrates advantages over or matches the current state-of-the-art.

[280]  arXiv:2110.10970 [pdf, other]
Title: Fuzzy Algebraic Theories
Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT); Logic (math.LO)

In this work we propose a formal system for fuzzy algebraic reasoning. The sequent calculus we define is based on two kinds of propositions, capturing equality and existence of terms as members of a fuzzy set. We provide a sound semantics for this calculus and show that there is a notion of free model for any theory in this system, allowing us (with some restrictions) to recover models as Eilenberg-Moore algebras for some monad. We will also prove a completeness result: a formula is derivable from a given theory if and only if it is satisfied by all models of the theory. Finally, leveraging results by Milius and Urbat, we give HSP-like characterizations of subcategories of algebras which are categories of models of particular kinds of theories.

[281]  arXiv:2110.10972 [pdf, other]
Title: Sliced-Wasserstein Gradient Flows
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows. To solve them numerically, a possible approach is to rely on the Jordan-Kinderlehrer-Otto (JKO) scheme, which is analogous to the proximal scheme in Euclidean spaces. However, this bilevel optimization problem is known for its computational challenges, especially in high dimensions. To alleviate it, very recent works propose to approximate the JKO scheme leveraging Brenier's theorem, and using gradients of Input Convex Neural Networks to parameterize the density (JKO-ICNN). However, this method comes with a high computational cost and stability issues. Instead, this work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein (SW) distance. We argue that this method is more flexible than JKO-ICNN, since SW enjoys a closed-form differentiable approximation. Thus, the density at each step can be parameterized by any generative model, which alleviates the computational burden and makes it tractable in higher dimensions. Interestingly, we also show empirically that these gradient flows are strongly related to the usual Wasserstein gradient flows, and that they can be used to efficiently minimize diverse machine learning functionals.
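
As an illustration of the sliced-Wasserstein distance named in the abstract above, the following is a minimal sketch of the standard Monte Carlo estimator: project both samples onto random directions, sort, and average the squared differences. It is not the paper's gradient-flow method. The function name `sliced_wasserstein_sq`, the parameters `n_projections` and `seed`, and the equal-sample-size restriction are assumptions made for this example.

```python
import math
import random


def sliced_wasserstein_sq(xs, ys, n_projections=200, seed=0):
    """Monte Carlo estimate of the squared sliced-Wasserstein-2 distance
    between two equal-size point clouds (lists of equal-length tuples)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a direction uniformly on the unit sphere (normalized Gaussian).
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both clouds onto the direction and sort the projections.
        px = sorted(sum(t * c for t, c in zip(theta, p)) for p in xs)
        py = sorted(sum(t * c for t, c in zip(theta, p)) for p in ys)
        # In one dimension, optimal transport pairs the sorted samples.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_projections
```

For a cloud and a copy translated by a vector v, every one-dimensional term equals the squared projection of v, so the estimate lies between 0 and the squared norm of v.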

[282]  arXiv:2110.10973 [pdf, other]
Title: LOA: Logical Optimal Actions for Text-based Interaction Games
Comments: ACL-IJCNLP 2021 (demo paper)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)

We present Logical Optimal Actions (LOA), an action decision architecture for reinforcement learning applications built on a neuro-symbolic framework that combines neural networks with a symbolic knowledge acquisition approach for natural language interaction games. The LOA demonstration consists of a web-based interactive platform for text-based games and a visualization of acquired knowledge that improves the interpretability of trained rules. The demonstration also provides a module for comparison with other neuro-symbolic approaches as well as non-symbolic state-of-the-art agent models on the same text-based games. LOA also provides an open-source Python implementation of the reinforcement learning environment to facilitate experiments with neuro-symbolic agents. Code: https://github.com/ibm/loa

[283]  arXiv:2110.10974 [pdf, other]
Title: A Decentralized Framework for Serverless Edge Computing in the Internet of Things
Journal-ref: IEEE Transactions on Network and Service Management, Volume: 18, Issue: 2, June 2021
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

Serverless computing is becoming widely adopted among cloud providers, making the Function-as-a-Service (FaaS) programming model, in which developers realize services by packaging sequences of stateless function calls, increasingly popular.
The current technologies are very well suited to data centers, but cannot provide equally good performance in decentralized environments, such as edge computing systems, which are expected to be typical for Internet of Things (IoT) applications.
In this paper, we fill this gap by proposing a framework for efficient dispatching of stateless tasks to in-network executors so as to minimize the response times while exhibiting short- and long-term fairness, also leveraging information from a virtualized network infrastructure when available.
Our solution is shown to be simple enough to be installed on devices with limited computational capabilities, such as IoT gateways, especially when using a hierarchical forwarding extension.
We evaluate the proposed platform by means of extensive emulation experiments with a prototype implementation in realistic conditions.
The results show that it is able to smoothly adapt to the mobility of clients and to the variations of their service request patterns, while coping promptly with network congestion.

[284]  arXiv:2110.10978 [pdf, other]
Title: Multiobjective Dijkstra A*
Comments: 14 pages, 12 figures, 4 tables
Subjects: Discrete Mathematics (cs.DM)

We introduce the Multiobjective Dijkstra A* (MDA*) algorithm, a label setting algorithm for the One-to-One Multiobjective Shortest Path Problem that guides the search toward the target node. For the design of our new algorithm, we combine the recently published Biobjective and Multiobjective Dijkstra algorithms (B/MDA) with the ideas used to design the classical A* algorithm for Shortest Path problems. Thus, our algorithm requires a monotone node heuristic as part of its input. For any node, the heuristic underestimates the costs of a path from this node to the target node of the search. Paths in the priority queue are then sorted according to the sum of their costs and the value of the heuristic at their final node. The direct implication is that paths that are closer to the target are processed earlier. Together with some pruning techniques, the number of iterations needed to solve the given One-to-One MOSP instances can be drastically reduced. In our computational experiments, we use different types of three-dimensional instances to show that the MDA* algorithm clearly outperforms the MDA.
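
The queue ordering described in the abstract above (path costs plus the heuristic at the final node) can be sketched with a generic heuristic-guided label-setting search. This is a simplified illustration, not the authors' MDA* and without its specific queue management. The names `dominates` and `multiobjective_astar`, the adjacency-list graph format, and the lexicographic tie-breaking are assumptions made for this example. The sketch also assumes non-negative arc costs and a monotone heuristic.

```python
import heapq


def dominates(a, b):
    """True if cost vector a is at least as good as b in every objective."""
    return all(x <= y for x, y in zip(a, b))


def multiobjective_astar(graph, heuristic, source, target):
    """Return the nondominated cost vectors of source-target paths.

    graph: dict node -> list of (successor, cost_tuple)
    heuristic: dict node -> tuple that underestimates the remaining costs
    """
    zero = tuple(0 for _ in heuristic[source])
    # Queue entries are (f, g, node) with f = g + h(node);
    # Python compares tuples lexicographically.
    queue = [(tuple(heuristic[source]), zero, source)]
    permanent = {}  # node -> list of accepted (nondominated) cost vectors
    while queue:
        f, g, node = heapq.heappop(queue)
        # Discard labels dominated by an accepted label at the same node.
        if any(dominates(p, g) for p in permanent.get(node, [])):
            continue
        # Discard labels whose optimistic estimate is dominated at the target.
        if any(dominates(t, f) for t in permanent.get(target, [])):
            continue
        permanent.setdefault(node, []).append(g)
        if node == target:
            continue
        for succ, cost in graph.get(node, []):
            g2 = tuple(a + b for a, b in zip(g, cost))
            f2 = tuple(a + b for a, b in zip(g2, heuristic[succ]))
            heapq.heappush(queue, (f2, g2, succ))
    return sorted(permanent.get(target, []))
```

With an all-zero heuristic the ordering reduces to a lexicographic Dijkstra-style search. A tighter heuristic moves labels near the target forward in the queue and lets the target-based check discard more labels.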

[285]  arXiv:2110.10980 [pdf]
Title: Ethics-Based Auditing of Automated Decision-Making Systems: Nature, Scope, and Limitations
Comments: Artificial Intelligence, Auditing, Automated Decision-Making, Ethics, Governance
Journal-ref: Mokander, J., Morley, J., Taddeo, M. et al. Ethics-Based Auditing of Automated Decision-Making Systems: Nature, Scope, and Limitations. Sci Eng Ethics 27, 44 (2021)
Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)

Important decisions that impact human lives, livelihoods, and the natural environment are increasingly being automated. Delegating tasks to so-called automated decision-making systems (ADMS) can improve efficiency and enable new solutions. However, these benefits are coupled with ethical challenges. For example, ADMS may produce discriminatory outcomes, violate individual privacy, and undermine human self-determination. New governance mechanisms are thus needed that help organisations design and deploy ADMS in ways that are ethical, while enabling society to reap the full economic and social benefits of automation. In this article, we consider the feasibility and efficacy of ethics-based auditing (EBA) as a governance mechanism that allows organisations to validate claims made about their ADMS. Building on previous work, we define EBA as a structured process whereby an entity's present or past behaviour is assessed for consistency with relevant principles or norms. We then offer three contributions to the existing literature. First, we provide a theoretical explanation of how EBA can contribute to good governance by promoting procedural regularity and transparency. Second, we propose seven criteria for how to design and implement EBA procedures successfully. Third, we identify and discuss the conceptual, technical, social, economic, organisational, and institutional constraints associated with EBA. We conclude that EBA should be considered an integral component of multifaceted approaches to managing the ethical risks posed by ADMS.

[286]  arXiv:2110.10983 [pdf, other]
Title: Optimizing Multi-Taper Features for Deep Speaker Verification
Comments: To appear in IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Although past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. With a maximum improvement on the SITW corpus of 25.8% in terms of equal error rate over the static-taper design, our method helps preserve a balanced level of leakage and variance, providing more robustness.

[287]  arXiv:2110.10984 [pdf, other]
Title: The popular assignment problem: when cardinality is more important than popularity
Comments: 28 pages, 7 figures
Subjects: Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT)

We consider a matching problem in a bipartite graph $G=(A\cup B,E)$ where each node in $A$ is an agent having preferences in partial order over her neighbors, while nodes in $B$ are objects with no preferences. The size of our matching is more important than node preferences; thus, we are interested in maximum matchings only. Any pair of maximum matchings in $G$ (equivalently, perfect matchings or assignments) can be compared by holding a head-to-head election between them where agents are voters. The goal is to compute an assignment $M$ such that there is no better or "more popular" assignment. This is the popular assignment problem and it generalizes the well-studied popular matching problem.
Popular assignments need not always exist. We show a polynomial-time algorithm that decides if the given instance admits one or not, and computes one, if so. In instances with no popular assignment, we consider the problem of finding an almost popular assignment, i.e., an assignment with minimum unpopularity margin. We show an $O^*(|E|^k)$ time algorithm for deciding if there exists an assignment with unpopularity margin at most $k$. We show that this algorithm is essentially optimal by proving that the problem is $\mathsf{W}_l[1]$-hard with parameter $k$.
We also consider the minimum-cost popular assignment problem when there are edge costs, and show its $\mathsf{NP}$-hardness even when all edge costs are in $\{0,1\}$ and agents have strict preferences. By contrast, we propose a polynomial-time algorithm for the problem of deciding if there exists a popular assignment with a given set of forced/forbidden edges (this tractability holds even for partially ordered preferences). Our algorithms are combinatorial and based on LP duality. They search for an appropriate witness or dual certificate, and when a certificate cannot be found, we prove that the desired assignment does not exist in $G$.
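
The head-to-head election between two assignments described in the abstract above can be made concrete with a small brute-force sketch. It only illustrates the definition and is unrelated to the paper's polynomial-time algorithm. The names `vote_margin` and `popular_assignments` are assumptions made for this example. Preferences are given as numeric ranks (lower is better, ties allowed), which simplifies the partial orders of the paper, and every agent is assumed to rank every object, with equally many agents and objects.

```python
from itertools import permutations


def vote_margin(m1, m2, rank):
    """Votes for assignment m1 minus votes for assignment m2.

    m1, m2: dict agent -> object
    rank: dict agent -> dict object -> rank (lower is better)
    """
    margin = 0
    for agent in rank:
        r1 = rank[agent][m1[agent]]
        r2 = rank[agent][m2[agent]]
        if r1 < r2:
            margin += 1   # agent votes for m1
        elif r2 < r1:
            margin -= 1   # agent votes for m2
    return margin


def popular_assignments(rank):
    """Enumerate all assignments and keep those that lose no election."""
    agents = sorted(rank)
    objects = sorted(next(iter(rank.values())))
    candidates = [dict(zip(agents, perm)) for perm in permutations(objects)]
    return [m for m in candidates
            if all(vote_margin(n, m, rank) <= 0 for n in candidates)]
```

When three agents share the same strict ranking of three objects, every assignment loses to its rotation by one vote, so the list is empty, matching the remark that popular assignments need not exist.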

[288]  arXiv:2110.10986 [pdf, other]
Title: Multi-stable design of triangulated origami structures on cones of revolution
Authors: Georg Nawratil
Comments: 31 pages, 16 figures
Subjects: Computational Geometry (cs.CG)

It is well known that the Kresling pattern of congruent triangles can be arranged either circularly on a cylinder of revolution or in a helical way. In both cases the resulting cylindrical structures are multi-stable. We generalize these arrangements with respect to cones of revolution, where our approach allows us to construct structures that snap between conical realizations whose apex angles serve as design parameters. In this context we also determine shaky realizations, intervals for self-intersection free realizations and an interesting property related to the cross sectional area. Finally, we analyze these origami structures with respect to their capability to snap by means of the so-called snappability index.

[289]  arXiv:2110.10987 [pdf, other]
Title: Learning OFDM Waveforms with PAPR and ACLR Constraints
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

An attractive research direction for future communication systems is the design of new waveforms that can both support high throughputs and present advantageous signal characteristics. Although most modern systems use orthogonal frequency-division multiplexing (OFDM) for its efficient equalization, this waveform suffers from multiple limitations such as a high adjacent channel leakage ratio (ACLR) and high peak-to-average power ratio (PAPR). In this paper, we propose a learning-based method to design OFDM-based waveforms that satisfy selected constraints while maximizing an achievable information rate. To that aim, we model the transmitter and the receiver as convolutional neural networks (CNNs) that respectively implement a high-dimensional modulation scheme and perform the detection of the transmitted bits. This leads to an optimization problem that is solved using the augmented Lagrangian method. Evaluation results show that the end-to-end system is able to satisfy target PAPR and ACLR constraints and allows significant throughput gains compared to a tone reservation (TR) baseline. An additional advantage is that no dedicated pilots are needed.

[290]  arXiv:2110.10992 [pdf, other]
Title: Scheduling Algorithms for Age of Information Differentiation with Random Arrivals
Comments: 14 pages, 7 figures, 2 tables
Subjects: Information Theory (cs.IT); Performance (cs.PF); Probability (math.PR)

We study age-agnostic scheduling in a non-preemptive status update system with two sources sending time-stamped information packets at random instants to a common monitor through a single server. The server is equipped with a waiting room holding the freshest packet from each source, a scheme called "single-buffer per-source queueing". The server is assumed to be work-conserving and when the waiting room has two waiting packets (one from each source), a probabilistic scheduling policy is applied so as to provide Age of Information (AoI) differentiation for the two sources of interest. Assuming Poisson packet arrivals and exponentially distributed service times, the exact distributions of AoI and also Peak AoI (PAoI) for each source are first obtained. Subsequently, this analytical tool is used to numerically obtain the optimum probabilistic scheduling policy so as to minimize the weighted average AoI/PAoI by means of which differentiation can be achieved between the two sources. In addition, a pair of heuristic age-agnostic schedulers are proposed on the basis of heavy-traffic analysis and comparatively evaluated in a wide variety of scenarios, and guidelines are provided for scheduling and AoI differentiation in status update systems with two sources.

[291]  arXiv:2110.10994 [pdf, ps, other]
Title: Interpretable Machine Learning for Resource Allocation with Application to Ventilator Triage
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

Rationing of healthcare resources is a challenging decision that policy makers and providers may be forced to make during a pandemic, natural disaster, or mass casualty event. Well-defined guidelines to triage scarce life-saving resources must be designed to promote transparency, trust, and consistency. To facilitate buy-in and use during high-stress situations, these guidelines need to be interpretable and operational. We propose a novel data-driven model to compute interpretable triage guidelines based on policies for Markov decision processes that can be represented as simple sequences of decision trees ("tree policies"). In particular, we characterize the properties of optimal tree policies and present an algorithm based on dynamic programming recursions to compute good tree policies. We utilize this methodology to obtain simple, novel triage guidelines for ventilator allocations for COVID-19 patients, based on real patient data from Montefiore hospitals. We also compare the performance of our guidelines to the official New York State guidelines that were developed in 2015 (well before the COVID-19 pandemic). Our empirical study shows that the number of excess deaths associated with ventilator shortages could be reduced significantly using our policy. Our work highlights the limitations of the existing official triage guidelines, which need to be adapted specifically to COVID-19 before being successfully deployed.

[292]  arXiv:2110.11001 [pdf, other]
Title: Pixel-Level Face Image Quality Assessment for Explainable Face Recognition
Comments: Submitted to CVPR 2022, Code will be made publicly-available in November 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

An essential factor in achieving high performance in face recognition systems is the quality of their samples. Since these systems are involved in many aspects of daily life, there is a strong need to make face recognition processes understandable for humans. In this work, we introduce the concept of pixel-level face image quality that determines the utility of pixels in a face image for recognition. Given an arbitrary face recognition network, we propose a training-free approach to assess the pixel-level qualities of a face image. To achieve this, a model-specific quality value of the input image is estimated and used to build a sample-specific quality regression model. Based on this model, quality-based gradients are back-propagated and converted into pixel-level quality estimates. In the experiments, we qualitatively and quantitatively investigated the meaningfulness of the pixel-level qualities based on real and artificial disturbances and by comparing the explanation maps on ICAO-incompliant faces. In all scenarios, the results demonstrate that the proposed solution produces meaningful pixel-level qualities. The code is publicly available.

[293]  arXiv:2110.11006 [pdf, other]
Title: Bristle: Decentralized Federated Learning in Byzantine, Non-i.i.d. Environments
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Federated learning (FL) is a privacy-friendly type of machine learning where devices locally train a model on their private data and typically communicate model updates with a server. In decentralized FL (DFL), peers communicate model updates with each other instead. However, DFL is challenging since (1) the training data possessed by different peers is often non-i.i.d. (i.e., distributed differently between the peers) and (2) malicious, or Byzantine, attackers can share arbitrary model updates with other peers to subvert the training process.
We address these two challenges and present Bristle, middleware between the learning application and the decentralized network layer. Bristle leverages transfer learning to predetermine and freeze the non-output layers of a neural network, significantly speeding up model training and lowering communication costs. To securely update the output layer with model updates from other peers, we design a fast distance-based prioritizer and a novel performance-based integrator. Their combined effect results in high resilience to Byzantine attackers and the ability to handle non-i.i.d. classes.
We empirically show that Bristle converges to a consistent 95% accuracy in Byzantine environments, outperforming all evaluated baselines. In non-Byzantine environments, Bristle requires 83% fewer iterations to achieve 90% accuracy compared to state-of-the-art methods. We show that when the training classes are non-i.i.d., Bristle significantly outperforms the accuracy of the most Byzantine-resilient baselines by 2.3x while reducing communication costs by 90%.

[294]  arXiv:2110.11007 [pdf, other]
Title: Attack Detection and Localization in Smart Grid with Image-based Deep Learning
Subjects: Cryptography and Security (cs.CR)

The smart grid's objective is to enable electricity and information to flow two-way while providing effective, robust, computerized, and decentralized energy delivery. This necessitates the use of state estimation-based techniques and real-time analysis to ensure that effective controls are deployed properly. However, the reliance on communication technologies makes such systems susceptible to sophisticated data integrity attacks that pose serious threats to the overall reliability of the smart grid. To detect such attacks, advanced and efficient anomaly detection solutions are needed. In this paper, a two-stage deep learning-based framework is carefully designed by embedding the power system's characteristics, enabling precise attack detection and localization. First, we encode temporal correlations of the multivariate power system time-series measurements as 2D images using image-based representation approaches such as Gramian Angular Field (GAF) and Recurrence Plot (RP) to obtain the latent data characteristics. These images are then utilized to build a highly reliable and resilient deep Convolutional Neural Network (CNN)-based multi-label classifier capable of learning both low and high level characteristics in the images to detect and discover the exact attack locations without leveraging any prior statistical assumptions. The proposed method is evaluated on the IEEE 57-bus system using real-world load data. Also, a comparative study is carried out. Numerical results indicate that the proposed multi-class cyber-intrusion detection framework outperforms the current conventional and deep learning-based attack detection methods.
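
To show what encoding a time series as a Gramian Angular Field image involves, here is a minimal sketch of the summation variant for a single univariate series. The abstract does not say which GAF variant or preprocessing the paper uses, so the min-max rescaling to [-1, 1] and the function name `gramian_angular_summation_field` are assumptions made for this example.

```python
import math


def gramian_angular_summation_field(series):
    """Encode a univariate series as a square image G[i][j] = cos(phi_i + phi_j),
    where phi_i = arccos of the series rescaled to [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Min-max rescaling to [-1, 1] so that arccos is defined.
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]
    # Guard against tiny floating-point overshoot outside [-1, 1].
    scaled = [max(-1.0, min(1.0, v)) for v in scaled]
    # Polar encoding: each value becomes an angle.
    phi = [math.acos(v) for v in scaled]
    # Pairwise angular sums give the image fed to a CNN.
    return [[math.cos(a + b) for b in phi] for a in phi]
```

A series of length n yields an n-by-n symmetric matrix. For multivariate measurements, one such image per channel could be stacked, though that arrangement is again an assumption rather than something stated in the abstract.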

[295]  arXiv:2110.11010 [pdf, other]
Title: Algorithmic Amplification of Politics on Twitter
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Content on Twitter's home timeline is selected and ordered by personalization algorithms. By consistently ranking certain content higher, these algorithms may amplify some messages while reducing the visibility of others. There's been intense public and scholarly debate about the possibility that some political groups benefit more from algorithmic amplification than others. We provide quantitative evidence from a long-running, massive-scale randomized experiment on the Twitter platform that committed a randomized control group including nearly 2M daily active accounts to a reverse-chronological content feed free of algorithmic personalization. We present two sets of findings. First, we studied Tweets by elected legislators from major political parties in 7 countries. Our results reveal a remarkably consistent trend: In 6 out of 7 countries studied, the mainstream political right enjoys higher algorithmic amplification than the mainstream political left. Consistent with this overall trend, our second set of findings studying the U.S. media landscape revealed that algorithmic amplification favours right-leaning news sources. We further looked at whether algorithms amplify far-left and far-right political groups more than moderate ones: contrary to prevailing public belief, we did not find evidence to support this hypothesis. We hope our findings will contribute to an evidence-based debate on the role personalization algorithms play in shaping political content consumption.

[296]  arXiv:2110.11013 [pdf, other]
Title: Spatial Location Constraint Prototype Loss for Open Set Recognition
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One of the challenges in pattern recognition is open set recognition. Compared with closed set recognition, open set recognition needs to reduce not only the empirical risk, but also the open space risk, and the reduction of these two risks corresponds to classifying the known classes and identifying the unknown classes, respectively. How to reduce the open space risk is the key to open set recognition. This paper explores the origin of the open space risk by analyzing the distribution of known- and unknown-class features. On this basis, the spatial location constraint prototype loss function is proposed to reduce the two risks simultaneously. Extensive experiments on multiple benchmark datasets and many visualization results indicate that our method is significantly superior to other existing approaches.

[297]  arXiv:2110.11015 [pdf]
Title: A Utility Maximization Model of Pedestrian and Driver Interactions
Comments: 10 pages, 7 figures
Subjects: Machine Learning (cs.LG)

Many models account for the traffic flow of road users but few take the details of local interactions into consideration and how they could deteriorate into safety-critical situations. Building on the concept of sensorimotor control, we develop a modeling framework applying the principles of utility maximization, motor primitives, and intermittent action decisions to account for the details of interactive behaviors among road users. The framework connects these principles to the decision theory and is applied to determine whether such an approach can reproduce the following phenomena: When two pedestrians travel on crossing paths, (a) their interaction is sensitive to initial asymmetries, and (b) based on which, they rapidly resolve collision conflict by adapting their behaviors. When a pedestrian crosses the road while facing an approaching car, (c) either road user yields to the other to resolve their conflict, akin to the pedestrian interaction, and (d) the outcome reveals a specific situational kinematics, associated with the nature of vehicle acceleration. We show that these phenomena emerge naturally from our modeling framework when the model can evolve its parameters as a consequence of the situations. We believe that the modeling framework and phenomenon-centered analysis offer promising tools to understand road user interactions. We conclude with a discussion on how the model can be instrumental in studying the safety-critical situations when including other variables in road-user interactions.

[298]  arXiv:2110.11017 [pdf, other]
Title: Learning Time-Varying Graphs from Online Data
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

This work proposes an algorithmic framework to learn time-varying graphs from online data. The generality offered by the framework renders it model-independent, i.e., it can be theoretically analyzed in its abstract formulation and then instantiated under a variety of model-dependent graph learning problems. This is possible by phrasing (time-varying) graph learning as a composite optimization problem, where different functions regulate different desiderata, e.g., data fidelity, sparsity or smoothness. Instrumental for the findings is recognizing that the dependence of the majority (if not all) of data-driven graph learning algorithms on the data is exerted through the empirical covariance matrix, representing a sufficient statistic for the estimation problem. Its user-defined recursive update enables the framework to work in non-stationary environments, while iterative algorithms building on novel time-varying optimization tools explicitly take into account the temporal dynamics, speeding up convergence and implicitly including a temporal regularization of the solution. We specialize the framework to three well-known graph learning models, namely, the Gaussian graphical model (GGM), the structural equation model (SEM), and the smoothness-based model (SBM), where we also introduce ad-hoc vectorization schemes for structured matrices (symmetric, hollow, etc.), which are crucial to perform correct gradient computations, besides enabling work in low-dimensional vector spaces and hence easing storage requirements. After discussing the theoretical guarantees of the proposed framework, we corroborate it with extensive numerical tests on synthetic and real data.
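
The abstract above leaves the recursive update of the empirical covariance matrix user-defined. One common choice, assumed here purely for illustration, is an exponentially weighted update with a forgetting factor. The function name `update_covariance`, the parameter `forgetting`, and the zero-mean assumption on the samples are not taken from the paper.

```python
def update_covariance(cov, sample, forgetting=0.9):
    """One recursive step S_t = forgetting * S_{t-1} + (1 - forgetting) * x x^T.

    cov: current estimate as a list of rows (n x n)
    sample: new zero-mean observation x as a list of length n
    """
    n = len(sample)
    return [
        [forgetting * cov[i][j] + (1.0 - forgetting) * sample[i] * sample[j]
         for j in range(n)]
        for i in range(n)
    ]
```

A forgetting factor close to 1 averages over a long history, while smaller values track non-stationary data more quickly at the cost of a noisier estimate.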

[299]  arXiv:2110.11018 [pdf, other]
Title: Newtonian Mechanics Based Transient Stability PART II: Individual Machine
Comments: This paper contains 9 pages and 23 figures
Subjects: Systems and Control (eess.SY)

The paper analyzes the mechanisms of the individual-machine method and its advantages in TSA. Based on the critical-machine monitoring of the original system trajectory, it is clarified that the individual-machine method strictly follows the machine paradigms. This strict adherence to the paradigms brings two advantages of the individual-machine method in TSA: (i) the individual-machine trajectory stability is characterized precisely, and (ii) the individual-machine trajectory variance is depicted clearly at IMPP. The two advantages are fully reflected in the precise definitions of individual-machine based transient stability concepts. In particular, the critical machine swing is clearly depicted through the IDSP or IDLP of the machine, the critical stability of the system is strictly defined as the critical stability of the most-severely disturbed machine, and the individual-machine potential energy surface is also precisely modeled through the IMPE of the machine. Simulation results show the effectiveness of the individual-machine method in TSA.

[300]  arXiv:2110.11023 [pdf, other]
Title: Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge, even in the absence of a powerful but static teacher network. Motivated by these findings, we propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance. Furthermore, an online distillation strategy is utilized to train the teacher and students simultaneously. To evaluate the performance of the proposed approach, extensive experiments were conducted using three different versions of teacher-student networks on benchmark biomedical classification (MSI vs. MSS) and object detection (Polyp Detection) tasks. An ensemble of student networks trained in the proposed manner achieved better results than an ensemble of students trained using KD or ML individually, establishing the benefit of augmenting knowledge transfer from teacher to students with peer-to-peer learning between students.

[301]  arXiv:2110.11024 [pdf, other]
Title: Watermarking Graph Neural Networks based on Backdoor Attacks
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. Building a powerful GNN model is not a trivial task, as it requires a large amount of training data, powerful computing resources, and human expertise on fine-tuning the model. What is more, with the development of adversarial attacks, e.g., model stealing attacks, GNNs raise challenges to model authentication. To avoid copyright infringement on GNNs, it is necessary to verify the ownership of the GNN models.
In this paper, we present a watermarking framework for GNNs for both graph and node classification tasks. We 1) design two strategies to generate watermarked data for the graph classification and one for the node classification task, 2) embed the watermark into the host model through training to obtain the watermarked GNN model, and 3) verify the ownership of the suspicious model in a black-box setting. The experiments show that our framework can verify the ownership of GNN models with a very high probability (around $100\%$) for both tasks. In addition, we experimentally show that our watermarking approach is still effective even when considering suspicious models obtained from different architectures than the owner's.

[302]  arXiv:2110.11027 [pdf, other]
Title: FedGEMS: Federated Learning of Larger Server Models via Selective Knowledge Fusion
Comments: Under review as a conference paper at ICLR 2022
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Today data is often scattered among billions of resource-constrained edge devices with security and privacy constraints. Federated Learning (FL) has emerged as a viable solution to learn a global model while keeping data private, but the model complexity of FL is impeded by the computation resources of edge nodes. In this work, we investigate a novel paradigm to take advantage of a powerful server model to break through model capacity in FL. By selectively learning from multiple teacher clients and itself, a server model develops in-depth knowledge and transfers its knowledge back to clients in return to boost their respective performance. Our proposed framework achieves superior performance on both server and client models and provides several advantages in a unified framework, including flexibility for heterogeneous client architectures, robustness to poisoning attacks, and communication efficiency between clients and server. By bridging FL effectively with larger server model training, our proposed paradigm paves ways for robust and continual knowledge accumulation from distributed and private data.

[303]  arXiv:2110.11033 [pdf, ps, other]
Title: Fundamental Wireless Performance of a Building
Comments: Accepted at IEEE Wireless Communications
Subjects: Networking and Internet Architecture (cs.NI)

Over 80% of wireless traffic already takes place in buildings. Like water, gas, and electricity, wireless communication is becoming one of the most fundamental utilities of a building. It is well known that building structures have a significant impact on in-building wireless networks. If we seek to achieve the optimal network performance indoors, the buildings should be designed with the objective of maximizing wireless performance. So far, wireless performance has not yet been considered when designing a building. In this paper, we introduce a novel and interdisciplinary concept of building wireless performance (BWP) to a wide audience in both wireless communications and building design, emphasizing its broad impacts on wireless network development and deployment, and on building layout/material design. We first give an overview of the BWP evaluation framework proposed in our state-of-the-art works and explain their interconnections. Then, we outline the potential research directions in this exciting research area to encourage further interdisciplinary research.

[304]  arXiv:2110.11034 [pdf, ps, other]
Title: Certifying C program correctness with respect to CompCert with VeriFast
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

VeriFast is a powerful tool for verification of various correctness properties of C programs using symbolic execution. However, VeriFast itself has not been verified. We present a proof-of-concept extension which generates a correctness certificate for each successful verification run individually. This certificate takes the form of a Coq script containing two proofs which, when successfully checked by Coq, together remove the need for trusting in the correctness of VeriFast itself.
The first proves a lemma expressing the correctness of the program with respect to a big step operational semantics developed by ourselves, intended to reflect VeriFast's interpretation of C. We have formalized this semantics in Coq as cbsem. This lemma is proven by symbolic execution in Coq, which in turn is implemented by transforming the exported AST of the program into a Coq proposition representing the symbolic execution performed by VeriFast itself.
The second proves the correctness of the same C program with respect to CompCert's Clight big step semantics. This proof simply applies our proof of the soundness of cbsem with respect to CompCert Clight to the first proof.

[305]  arXiv:2110.11036 [pdf, other]
Title: RefRec: Pseudo-labels Refinement via Shape Reconstruction for Unsupervised 3D Domain Adaptation
Comments: 3DV 2021 (Oral) Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unsupervised Domain Adaptation (UDA) for point cloud classification is an emerging research problem with relevant practical motivations. Reliance on multi-task learning to align features across domains has been the standard way to tackle it. In this paper, we take a different path and propose RefRec, the first approach to investigate pseudo-labels and self-training in UDA for point clouds. We present two main innovations to make self-training effective on 3D data: i) refinement of noisy pseudo-labels by matching shape descriptors that are learned by the unsupervised task of shape reconstruction on both domains; ii) a novel self-training protocol that learns domain-specific decision boundaries and reduces the negative impact of mislabelled target samples and in-domain intra-class variability. RefRec sets the new state of the art in both standard benchmarks used to test UDA for point cloud classification, showcasing the effectiveness of self-training for this important problem.

[306]  arXiv:2110.11037 [pdf]
Title: "Computer Says No": Algorithmic Decision Support and Organisational Responsibility
Comments: 24 pages, 2 figures
Subjects: Computers and Society (cs.CY)

Algorithmic decision support is increasingly used in a whole array of different contexts and structures in various areas of society, influencing many people's lives. Its use raises questions, among others, about accountability, transparency and responsibility. While there is substantial research on the issue of algorithmic systems and responsibility in general, there is little to no prior research on organisational responsibility and its attribution. Our article aims to fill that gap; we give a brief overview of the central issues connected to ADS, responsibility and decision-making in organisational contexts and identify open questions and research gaps. Furthermore, we describe a set of guidelines and a complementary digital tool to assist practitioners in mapping responsibility when introducing ADS within their organisational context.

[307]  arXiv:2110.11039 [pdf, other]
Title: Automated Climate Analyses Using Knowledge Graph
Comments: Accepted in Proc. IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2021
Subjects: Databases (cs.DB)

The FAIR (Findable, Accessible, Interoperable, Reusable) data principles are fundamental for climate researchers and all stakeholders in the current digital ecosystem. In this paper, we demonstrate how relational climate data can be "FAIR" and modeled using RDF, in line with Semantic Web technologies and our Climate Analysis ontology. Thus, heterogeneous climate data can be stored in graph databases and offered as Linked Data on the Web. As a result, climate researchers will be able to use the standard SPARQL query language to query these sources directly on the Web. In this paper, we demonstrate the usefulness of our SPARQL endpoint for automated climate analytics. We illustrate two sample use cases that establish the advantage of representing climate data as knowledge graphs.

[308]  arXiv:2110.11040 [pdf, other]
Title: InterpolationSLAM: A Novel Robust Visual SLAM System in Rotational Motion
Comments: arXiv admin note: substantial text overlap with arXiv:2110.02593
Subjects: Robotics (cs.RO)

In recent years, visual SLAM has made great progress in different scenes; however, many problems remain to be solved. A SLAM system is restricted not only by the external scene but also by its movement mode, such as movement speed and rotational motion. As representatives of the best-performing networks for frame interpolation, Sepconv-slomo and EDSC can predict a high-quality intermediate frame between the previous frame and the current frame. Intuitively, frame interpolation can enrich the information in image sequences, the number of which is limited by the camera's frame rate, and thus decrease the probability that the SLAM system fails. In this article, we propose the InterpolationSLAM framework. InterpolationSLAM is robust in rotational movement for monocular and RGB-D configurations. By detecting rotation and performing interpolation at the rotated position, the pose of the system can be estimated more accurately, thereby improving the accuracy and robustness of the SLAM system in rotational movement.

[309]  arXiv:2110.11043 [pdf, other]
Title: Improving the Deployment of Recycling Classification through Efficient Hyper-Parameter Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The paradigm of automated waste classification has recently seen a shift in the domain of interest from conventional image processing techniques to powerful computer vision algorithms known as convolutional neural networks (CNN). Historically, CNNs have demonstrated a strong dependency on powerful hardware for real-time classification, yet the need for deployment on weaker embedded devices is greater than ever. The work in this paper proposes a methodology for reconstructing and tuning conventional image classification models, using EfficientNets, to decrease their parameterisation with no trade-off in model accuracy and develops a pipeline through TensorRT for accelerating such models to run in real time on an NVIDIA Jetson Nano embedded device. The train-deployment discrepancy, relating how poor data augmentation leads to a discrepancy in model accuracy between training and deployment, is often neglected in many papers, and thus the work is extended by analysing and evaluating the impact real-world perturbations had on model accuracy once deployed. The scope of the work concerns developing a more efficient variant of WasteNet, a collaborative recycling classification model. The newly developed model scores a test-set accuracy of 95.8% with a real-world accuracy of 95%, a 14% increase over the original. Our acceleration pipeline boosted model throughput by 750% to 24 inferences per second on the Jetson Nano, and real-time latency of the system was verified through servomotor latency analysis.

[310]  arXiv:2110.11044 [pdf, other]
Title: Bayesian Meta-Learning Through Variational Gaussian Processes
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent advances in the field of meta-learning have tackled domains consisting of large numbers of small ("few-shot") supervised learning tasks. Meta-learning algorithms must be able to rapidly adapt to any individual few-shot task, fitting to a small support set within a task and using it to predict the labels of the task's query set. This problem setting can be extended to the Bayesian context, wherein rather than predicting a single label for each query data point, a model predicts a distribution of labels capturing its uncertainty. Successful methods in this domain include Bayesian ensembling of MAML-based models, Bayesian neural networks, and Gaussian processes with learned deep kernel and mean functions. While Gaussian processes have a robust Bayesian interpretation in the meta-learning context, they do not naturally model non-Gaussian predictive posteriors for expressing uncertainty. In this paper, we design a theoretically principled method, VMGP, extending Gaussian-process-based meta-learning to allow for high-quality, arbitrary non-Gaussian uncertainty predictions. On benchmark environments with complex non-smooth or discontinuous structure, we find our VMGP method performs significantly better than existing Bayesian meta-learning baselines.

[311]  arXiv:2110.11048 [pdf, other]
Title: Mixer-based lidar lane detection network and dataset for urban roads
Comments: 15 pages, 12 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Accurate lane detection under various road conditions is a critical function for autonomous driving. Generally, when detected lane lines from a front camera image are projected into a birds-eye view (BEV) for motion planning, the resulting lane lines are often distorted. In addition, convolutional neural network (CNN)-based feature extractors often lose resolution when increasing the receptive field to detect global features such as lane lines. However, a Lidar point cloud has little image distortion in the BEV projection. Since lane lines are thin and stretch over the entire BEV image while occupying only a small portion of it, lane lines should be detected as a global feature with high resolution. In this paper, we propose the Lane Mixer Network (LMN), which extracts local features from a Lidar point cloud, recognizes global features, and detects lane lines using a BEV encoder, a Mixer-based global feature extractor, and a detection head, respectively. In addition, we provide a world-first large urban lane dataset for Lidar, K-Lane, which has a maximum of 6 lanes under various urban road conditions. We demonstrate that the proposed LMN achieves state-of-the-art performance, an F1 score of 91.67%, on K-Lane. The K-Lane dataset, LMN training code, pre-trained models, and the complete dataset development platform are available on GitHub.

[312]  arXiv:2110.11052 [pdf, other]
Title: WareVR: Virtual Reality Interface for Supervision of Autonomous Robotic System Aimed at Warehouse Stocktaking
Comments: Accepted to 2021 IEEE International Conference on Systems, Man, and Cybernetics, 7 pages, 8 figures
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

WareVR is a novel human-robot interface based on a virtual reality (VR) application to interact with a heterogeneous robotic system for automated inventory management. We have created an interface to supervise an autonomous robot remotely from a secluded workstation in a warehouse. This could be beneficial during the current COVID-19 pandemic, since stocktaking is a necessary and regular process in warehouses that involves a group of people. The proposed interface allows regular warehouse workers without experience in robotics to control the heterogeneous robotic system consisting of an unmanned ground vehicle (UGV) and an unmanned aerial vehicle (UAV). WareVR provides visualization of the robotic system in a digital twin of the warehouse, which is accompanied by a real-time video stream from the real environment through an on-board UAV camera. Using the WareVR interface, the operator can conduct different levels of stocktaking, monitor the inventory process remotely, and teleoperate the drone for a more detailed inspection. In addition, the developed interface includes remote control of the UAV for intuitive and straightforward human interaction with the autonomous robot for stocktaking. The effectiveness of the VR-based interface was evaluated through a user study in a "visual inspection" scenario.

[313]  arXiv:2110.11053 [pdf, ps, other]
Title: $Q$-tensor gradient flow with quasi-entropy and discretizations preserving physical constraints
Authors: Yanli Wang, Jie Xu
Comments: 25 pages, 7 figures
Subjects: Numerical Analysis (math.NA)

We propose and analyze numerical schemes for the gradient flow of $Q$-tensor with the quasi-entropy. The quasi-entropy is a strictly convex, rotationally invariant elementary function, giving a singular potential constraining the eigenvalues of $Q$ within the physical range $(-1/3,2/3)$. Compared with the potential derived from the Bingham distribution, the quasi-entropy has the same asymptotic behavior and underlying physics. Meanwhile, it is very easy to evaluate because of its simple expression. For the elastic energy, we include all the rotationally invariant terms. The numerical schemes for the gradient flow are built on the nice properties of the quasi-entropy. The first-order time discretization is uniquely solvable, keeping the physical constraints and energy dissipation, which are all independent of the time step. The second-order time discretization keeps the first two properties unconditionally, and the third with an $O(1)$ restriction on the time step. These results also hold when we further incorporate a second-order discretization in space. Error estimates are also established for time discretization and full discretization. Numerical examples about defect patterns are presented to validate the theoretical results.

[314]  arXiv:2110.11054 [pdf, other]
Title: A Geometric Approach for Computing the Kernel of a Polyhedron
Subjects: Computational Geometry (cs.CG); Graphics (cs.GR)

We present a geometric algorithm to compute the geometric kernel of a generic polyhedron. The geometric kernel (or simply kernel) is defined as the set of points from which the whole polyhedron is visible. Whilst the computation of the kernel for a polygon has already been largely addressed in the literature, less has been done for polyhedra. Currently, the principal implementation of the kernel estimation is based on the solution of a linear programming problem. We compare against it on several examples, showing that our method is more efficient in analysing the elements of a generic tessellation. Details on the technical implementation and discussions on pros and cons of the method are also provided.
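
The definition of the kernel can be illustrated with a simple membership test: a point sees the whole shape exactly when it lies on the inner side of the supporting plane of every face. The sketch below checks that condition directly. It is neither the geometric algorithm of the paper nor the linear programming formulation it compares against; the face representation (a point on the face plus an outward normal) and the L-shaped 2D example are assumptions chosen for brevity, and the same function works unchanged for 3D faces.

```python
def dot(u, v):
    """Dot product of two equal-length coordinate tuples."""
    return sum(a * b for a, b in zip(u, v))

def in_kernel(point, faces, eps=1e-12):
    """Return True if `point` lies in the inner half-space of every face.

    `faces` is a list of (anchor, outward_normal) pairs, where `anchor` is
    any point on the face's supporting line/plane (assumed representation).
    """
    for anchor, outward_normal in faces:
        offset = [p - a for p, a in zip(point, anchor)]
        if dot(offset, outward_normal) > eps:
            return False  # the point is on the outer side of this face
    return True

# Hypothetical example: an L-shaped polygon with vertices
# (0,0), (2,0), (2,1), (1,1), (1,2), (0,2), listed counter-clockwise.
L_SHAPE_EDGES = [
    ((0, 0), (0, -1)),   # bottom edge, y = 0
    ((2, 0), (1, 0)),    # right edge, x = 2
    ((2, 1), (0, 1)),    # notch edge, y = 1
    ((1, 1), (1, 0)),    # notch edge, x = 1
    ((1, 2), (0, 1)),    # top edge, y = 2
    ((0, 2), (-1, 0)),   # left edge, x = 0
]

print(in_kernel((0.5, 0.5), L_SHAPE_EDGES))  # inside the kernel
print(in_kernel((1.5, 0.5), L_SHAPE_EDGES))  # inside the shape, outside the kernel
```

For this L shape the kernel is the unit square at the inner corner: the two notch edges cut away both arms, so a point in either arm belongs to the polygon but not to its kernel.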

[315]  arXiv:2110.11062 [pdf, other]
Title: Transfer beyond the Field of View: Dense Panoramic Semantic Segmentation via Unsupervised Domain Adaptation
Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (IEEE T-ITS). Dataset and code will be made publicly available at this https URL arXiv admin note: substantial text overlap with arXiv:2108.06383
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)

Autonomous vehicles clearly benefit from the expanded Field of View (FoV) of 360-degree sensors, but modern semantic segmentation approaches rely heavily on annotated training data which is rarely available for panoramic images. We look at this problem from the perspective of domain adaptation and bring panoramic semantic segmentation to a setting, where labelled training data originates from a different distribution of conventional pinhole camera images. To achieve this, we formalize the task of unsupervised domain adaptation for panoramic semantic segmentation and collect DensePASS - a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study the Pinhole-to-Panoramic domain shift and accompanied with pinhole camera training examples obtained from Cityscapes. DensePASS covers both, labelled- and unlabelled 360-degree images, with the labelled data comprising 19 classes which explicitly fit the categories available in the source (i.e. pinhole) domain. Since data-driven models are especially susceptible to changes in data distribution, we introduce P2PDA - a generic framework for Pinhole-to-Panoramic semantic segmentation which addresses the challenge of domain divergence with different variants of attention-augmented domain adaptation modules, enabling the transfer in output-, feature-, and feature confidence spaces. P2PDA intertwines uncertainty-aware adaptation using confidence values regulated on-the-fly through attention heads with discrepant predictions. Our framework facilitates context exchange when learning domain correspondences and dramatically improves the adaptation performance of accuracy- and efficiency-focused models. Comprehensive experiments verify that our framework clearly surpasses unsupervised domain adaptation- and specialized panoramic segmentation approaches.

[316]  arXiv:2110.11064 [pdf, other]
Title: Robust Edge-Direct Visual Odometry based on CNN edge detection and Shi-Tomasi corner optimization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

In this paper, we propose a robust edge-direct visual odometry (VO) based on CNN edge detection and Shi-Tomasi corner optimization. Four layers of pyramids were extracted from the image in the proposed method to reduce the motion error between frames. This solution used CNN edge detection and Shi-Tomasi corner optimization to extract information from the image. Then, the pose estimation is performed using the Levenberg-Marquardt (LM) algorithm and updating the keyframes. Our method was compared with the dense direct method, the improved direct method of Canny edge detection, and ORB-SLAM2 system on the RGB-D TUM benchmark. The experimental results indicate that our method achieves better robustness and accuracy.

[317]  arXiv:2110.11069 [pdf, ps, other]
Title: Pacta sunt servanda: legal contracts in Stipula
Subjects: Programming Languages (cs.PL)

There is a growing interest in running legal contracts on digital systems; at the same time, it is important to understand to what extent software contracts may capture legal content. We then undertake a foundational study of legal contracts and we distill four main features: agreement, permissions, violations and obligations. We therefore design Stipula, a domain specific language that assists lawyers in programming legal contracts through specific patterns. The language is based on a small set of abstractions that correspond to common patterns in legal contracts, and that are amenable to being executed either on centralized or on distributed systems. Stipula comes with a formal semantics and an observational equivalence, which provide a clear account of the contracts' behaviour. The expressive power of the language is illustrated by a set of examples that correspond to template contracts that are often used in practice.

[318]  arXiv:2110.11070 [pdf]
Title: A Nested Weighted Tchebycheff Multi-Objective Bayesian Optimization Approach for Flexibility of Unknown Utopia Estimation in Expensive Black-box Design Problems
Comments: 35 pages, 8 figures in main text and 2 figures in supplementary
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a nested weighted Tchebycheff Multi-objective Bayesian optimization framework where we build a regression model selection procedure from an ensemble of models, towards better estimation of the uncertain parameters of the weighted-Tchebycheff expensive black-box multi-objective function. In existing work, a weighted Tchebycheff MOBO approach has been demonstrated which attempts to estimate the unknown utopia in formulating the acquisition function, through calibration using an a priori selected regression model. However, the existing MOBO model lacks flexibility in selecting the appropriate regression models given the guided sampled data and therefore can under-fit or over-fit as the iterations of the MOBO progress, reducing the overall MOBO performance. As it is too complex to guarantee a best model a priori in general, this motivates us to consider a portfolio of different families of predictive models fitted with current training data, guided by the WTB MOBO; the best model is selected following a user-defined prediction root mean-square-error-based approach. The proposed approach is implemented in optimizing a multi-modal benchmark problem and a thin tube design under constant loading of temperature-pressure, with the objectives of minimizing the risk of creep-fatigue failure and the design cost. Finally, the nested weighted Tchebycheff MOBO model performance is compared with different MOBO frameworks with respect to accuracy in parameter estimation, Pareto-optimal solutions and function evaluation cost. This method is general enough to consider different families of predictive models in the portfolio for best model selection, where the overall design architecture allows for solving any high-dimensional (multiple functions) complex black-box problems and can be extended to any other global criterion multi-objective optimization methods where prior knowledge of utopia is required.
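
For readers unfamiliar with the scalarization the abstract builds on, the sketch below shows the standard weighted Tchebycheff function, which collapses several objectives into one value as the largest weighted distance from the utopia point. The candidate designs, weights, and utopia values are hypothetical; in the paper the utopia is unknown and estimated by regression models, a step this sketch does not attempt.

```python
def weighted_tchebycheff(objectives, weights, utopia):
    """Largest weighted absolute deviation of the objectives from the utopia point."""
    return max(w * abs(f - u) for f, w, u in zip(objectives, weights, utopia))

# Hypothetical two-objective candidates (both objectives are minimized).
candidates = {
    "design_a": [3.0, 1.0],
    "design_b": [1.5, 1.5],
}
weights = [0.5, 0.5]   # assumed preference weights
utopia = [1.0, 0.0]    # assumed (here: hard-coded) utopia point

scores = {name: weighted_tchebycheff(f, weights, utopia)
          for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Because the score is a maximum rather than a sum, a design is penalized by its worst objective, which is why an error in the estimated utopia point shifts the ranking of candidates directly.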

[319]  arXiv:2110.11072 [pdf, other]
Title: Sequential Modeling with Multiple Attributes for Watchlist Recommendation in E-Commerce
Journal-ref: International Conference on Web Search and Data Mining (WSDM), 2022
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

In e-commerce, the watchlist enables users to track items over time and has emerged as a primary feature, playing an important role in users' shopping journey. Watchlist items typically have multiple attributes whose values may change over time (e.g., price, quantity). Since many users accumulate dozens of items on their watchlist, and since shopping intents change over time, recommending the top watchlist items in a given context can be valuable. In this work, we study the watchlist functionality in e-commerce and introduce a novel watchlist recommendation task. Our goal is to prioritize which watchlist items the user should pay attention to next by predicting the next items the user will click. We cast this task as a specialized sequential recommendation task and discuss its characteristics. Our proposed recommendation model, Trans2D, is built on top of the Transformer architecture, where we further suggest a novel extended attention mechanism (Attention2D) that allows learning complex item-item, attribute-attribute and item-attribute patterns from sequential data with multiple item attributes. Using a large-scale watchlist dataset from eBay, we evaluate our proposed model and demonstrate its superiority compared to multiple state-of-the-art baselines, many of which are adapted for this task.

[320]  arXiv:2110.11073 [pdf, other]
Title: RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System
Comments: First version
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Reinforcement learning based recommender systems (RL-based RS) aim at learning a good policy from a batch of collected data by casting sequential recommendation as a multi-step decision-making task. However, current RL-based RS benchmarks commonly have a large reality gap, because they involve artificial RL datasets or semi-simulated RS datasets, and the trained policy is directly evaluated in the simulation environment. In real-world situations, not all recommendation problems are suitable to be transformed into reinforcement learning problems. Unlike previous academic RL research, RL-based RS suffer from extrapolation error and the difficulty of being well validated before deployment. In this paper, we introduce the RL4RS (Reinforcement Learning for Recommender Systems) benchmark - a new resource fully collected from industrial applications to train and evaluate RL algorithms with special attention to the above issues. It contains two datasets, tuned simulation environments, related advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. The RL4RS suite can be found at https://github.com/fuxiAIlab/RL4RS. In addition to the RL-based recommender systems, we expect the resource to contribute to research in reinforcement learning and neural combinatorial optimization.

[321]  arXiv:2110.11075 [pdf, other]
Title: Enabling a Social Robot to Process Social Cues to Detect when to Help a User
Comments: Presented at AI-HRI symposium as part of AAAI-FSS 2021 (arXiv:2109.10836)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

It is important for socially assistive robots to be able to recognize when a user needs and wants help. Such robots need to be able to recognize human needs in a real-time manner so that they can provide timely assistance. We propose an architecture that uses social cues to determine when a robot should provide assistance. Based on a multimodal fusion approach upon eye gaze and language modalities, our architecture is trained and evaluated on data collected in a robot-assisted Lego building task. By focusing on social cues, our architecture has minimal dependencies on the specifics of a given task, enabling it to be applied in many different contexts. Enabling a social robot to recognize a user's needs through social cues can help it to adapt to user behaviors and preferences, which in turn will lead to improved user experiences.

[322]  arXiv:2110.11079 [pdf, other]
Title: Tagged Documents Co-Clustering
Comments: 15 pages, submitted and accepted to the 2021 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE'21) - track ICAI21
Subjects: Information Retrieval (cs.IR)

Tags are short sequences of words that describe textual and non-textual resources such as music, images or books. Tags could be used by machine information retrieval systems to quickly access a document. These tags can also be used to build recommender systems that suggest similar items to a user. However, the number of tags per document is limited, and often distributed according to a Zipf law. In this paper, we propose a methodology to cluster tags into conceptual groups. Data are preprocessed to remove power-law effects and enhance the context of low-frequency words. Then, a hierarchical agglomerative co-clustering algorithm is proposed to group together the most related tags into clusters. The capabilities were evaluated on a sparse synthetic dataset and a real-world tag collection associated with scientific papers. The task being unsupervised, we propose some stopping criteria for selecting an optimal partitioning.

[323]  arXiv:2110.11080 [pdf]
Title: Continuous Authentication Using Mouse Movements, Machine Learning, and Minecraft
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Mouse dynamics has grown in popularity as a novel irreproducible behavioral biometric. Datasets which contain general unrestricted mouse movements from users are sparse in the current literature. The Balabit mouse dynamics dataset produced in 2016 was made for a data science competition and, despite some of its shortcomings, is considered to be the first publicly available mouse dynamics dataset. Collecting mouse movements in a dull administrative manner as Balabit does may unintentionally homogenize data and is also not representative of real-world application scenarios. This paper presents a novel mouse dynamics dataset that has been collected while 10 users play the video game Minecraft on a desktop computer. Binary Random Forest (RF) classifiers are created for each user to detect differences between a specific user's movements and an imposter's movements. Two evaluation scenarios are proposed to evaluate the performance of these classifiers; one scenario outperformed previous works in all evaluation metrics, reaching average accuracy rates of 92%, while the other scenario successfully reported reduced instances of false authentications of imposters.

[324]  arXiv:2110.11084 [pdf, other]
Title: 3D-ANAS v2: Grafting Transformer Module on Automatically Designed ConvNet for Hyperspectral Image Classification
Comments: 15 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Hyperspectral image (HSI) classification has been a hot topic for decades, as hyperspectral images have rich spatial and spectral information, providing a strong basis for distinguishing different land-cover objects. Benefiting from the development of deep learning technologies, deep learning based HSI classification methods have achieved promising performance. Recently, several neural architecture search (NAS) algorithms have been proposed for HSI classification, which further improve the accuracy of HSI classification to a new level. In this paper, we revisit the search space designed in previous HSI classification NAS methods and propose a novel hybrid search space, where 3D convolution, 2D spatial convolution and 2D spectral convolution are employed. Compared with the search space proposed in previous works, the search space proposed in this paper is more aligned with the characteristics of HSI data, namely that HSIs have a relatively low spatial resolution and an extremely high spectral resolution. In addition, to further improve the classification accuracy, we attempt to graft the emerging transformer module on the automatically designed ConvNet to add global information to the local region focused features learned by the ConvNet. We carry out comparison experiments on three public HSI datasets which have different spectral characteristics to evaluate the proposed method. Experimental results show that the proposed method achieves much better performance than comparison approaches, and both adopting the proposed hybrid search space and grafting the transformer module improve classification accuracy. Especially on the most recently captured dataset Houston University, overall accuracy is improved by up to nearly 6 percentage points. Code will be available at: https://github.com/xmm/3D-ANAS-V2.

[325]  arXiv:2110.11088 [pdf, other]
Title: RoMA: a Method for Neural Network Robustness Measurement and Assessment
Authors: Natan Levy, Guy Katz
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Neural network models have become the leading solution for a large variety of tasks, such as classification, language processing, protein folding, and others. However, their reliability is heavily plagued by adversarial inputs: small input perturbations that cause the model to produce erroneous outputs. Adversarial inputs can occur naturally when the system's environment behaves randomly, even in the absence of a malicious adversary, and are a severe cause for concern when attempting to deploy neural networks within critical systems. In this paper, we present a new statistical method, called Robustness Measurement and Assessment (RoMA), which can measure the expected robustness of a neural network model. Specifically, RoMA determines the probability that a random input perturbation might cause misclassification. The method allows us to provide formal guarantees regarding the expected frequency of errors that a trained model will encounter after deployment. Our approach can be applied to large-scale, black-box neural networks, which is a significant advantage compared to recently proposed verification methods. We apply our approach in two ways: comparing the robustness of different models, and measuring how a model's robustness is affected by the magnitude of input perturbation. One interesting insight obtained through this work is that, in a classification network, different output labels can exhibit very different robustness levels. We term this phenomenon categorial robustness. Our ability to perform risk and robustness assessments on a categorial basis opens the door to risk mitigation, which may prove to be a significant step towards neural network certification in safety-critical applications.
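
The quantity described in the abstract above, the probability that a random input perturbation causes misclassification, can be illustrated with a plain Monte Carlo estimate. The sketch below is not the RoMA procedure itself (the abstract does not give its details); the stand-in model `toy_classifier`, the uniform perturbation model, and the parameter `epsilon` are assumptions introduced only for illustration.

```python
import random

def toy_classifier(x):
    """Hypothetical stand-in for a black-box model: label by the sign of the coordinate sum."""
    return 1 if sum(x) >= 0.0 else 0

def misclassification_rate(classify, x, epsilon, n_samples=10_000, seed=0):
    """Estimate P[classify(x + delta) != classify(x)] with delta uniform in [-epsilon, epsilon]^d."""
    rng = random.Random(seed)
    reference = classify(x)  # label of the unperturbed input
    flips = 0
    for _ in range(n_samples):
        perturbed = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
        if classify(perturbed) != reference:
            flips += 1
    return flips / n_samples

point = [0.3, 0.2]
# Small perturbations cannot cross the decision boundary of the toy model here.
small = misclassification_rate(toy_classifier, point, epsilon=0.1)
# Larger perturbations flip the label in a sizeable fraction of samples.
large = misclassification_rate(toy_classifier, point, epsilon=1.0)
```

Repeating the estimate for several values of `epsilon` gives a rough picture of how robustness degrades with perturbation magnitude, which is the second use case the abstract mentions.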

[326]  arXiv:2110.11090 [pdf, ps, other]
Title: Blockchain-based Result Verification for Computation Offloading
Journal-ref: 19th International Conference on Service Oriented Computing (ICSOC 2021), Springer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Offloading of computation, e.g., to the cloud, is today a major task in distributed systems. Usually, consumers which apply offloading have to trust that a particular functionality offered by a service provider is delivering correct results. While redundancy (i.e., offloading a task to more than one service provider) or (partial) reprocessing help to identify correct results, they also lead to significantly higher cost.
Hence, within this paper, we present an approach to verify the results of offchain computations via the blockchain. For this, we apply zero-knowledge proofs to provide evidence that results are correct. Using our approach, it is possible to establish trust between a service consumer and arbitrary service providers. We evaluate our approach using a very well-known example task, i.e., the Traveling Salesman Problem.

[327]  arXiv:2110.11091 [pdf, other]
Title: E-DPNCT: An Enhanced Attack Resilient Differential Privacy Model For Smart Grids Using Split Noise Cancellation
Comments: 10 pages, 12 figures, 4 tables
Subjects: Cryptography and Security (cs.CR)

High-frequency reporting of energy utilization data in smart grids can leak sensitive information regarding end users' lifestyles. We propose A Differential Private Noise Cancellation Model for Load Monitoring and Billing for Smart Meters (DPNCT) to protect the privacy of smart grid data using a noise cancellation protocol with a master smart meter to provide accurate billing and load monitoring. Next, we evaluate the performance of DPNCT under various privacy attacks such as the filtering attack, the negative noise cancellation attack and the collusion attack. The DPNCT model relies on trusted master smart meters and is vulnerable to a collusion attack in which an adversary colludes with malicious smart meters in order to get private information of other smart meters. In this paper, we propose an Enhanced DPNCT (E-DPNCT) where we use multiple master smart meters for split noise at each instant in time t for better protection against collusion attacks. We extensively compare our E-DPNCT model with state-of-the-art attack-resistant privacy-preserving models, such as EPIC for the collusion attack and the Barbosa Differentially Private (BDP) model for the filtering attack. We evaluate our E-DPNCT model with real-time data, which shows significant improvement in privacy attack scenarios without any compute-intensive operations.

[328]  arXiv:2110.11098 [pdf, other]
Title: Index Coded - NOMA in Vehicular Ad Hoc Networks
Comments: 13 pages, 5 figures and 9 tables
Subjects: Information Theory (cs.IT)

The demand for multimedia services is growing day by day in vehicular ad-hoc networks (VANETs), resulting in high spectral usage and network congestion. Non-orthogonal multiple access (NOMA) is a promising wireless communication technique to solve the problems related to spectral efficiency effectively. Index coding (IC) is a powerful method to improve spectral utilization, where a sender aims to satisfy the needs of multiple receivers with a minimum number of transmissions. By combining these two approaches, in this work, we propose a novel technique called index coded NOMA (IC-NOMA), where we apply NOMA techniques on index coded data to reduce the number of transmissions further. This work shows that the IC-NOMA system demands a specific design for index codes to reap the advantages of NOMA. We analyze the feasibility of the proposed method in a general scenario and propose an index code design to integrate IC over NOMA for the best efficiency. Detailed analytical studies validate that the proposed transmission system provides improved spectral efficiency and power saving compared to conventional IC systems.
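
The index coding idea mentioned in the abstract above, serving several receivers with fewer transmissions by exploiting what each already holds, can be illustrated with the textbook two-receiver case. This is a minimal sketch of plain index coding only, not of the IC-NOMA design; the packet contents and the helper `xor_bytes` are assumptions chosen for illustration.

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

# Receiver 1 wants packet_1 and already holds packet_2 as side information.
# Receiver 2 wants packet_2 and already holds packet_1 as side information.
packet_1 = b"\x0f\xa5"
packet_2 = b"\x33\x5a"

# One coded broadcast replaces two uncoded transmissions.
broadcast = xor_bytes(packet_1, packet_2)

# Each receiver cancels its side information to recover the packet it wants.
decoded_by_receiver_1 = xor_bytes(broadcast, packet_2)
decoded_by_receiver_2 = xor_bytes(broadcast, packet_1)
```

Here a single broadcast satisfies both receivers, where sending the packets uncoded would take two transmissions.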

[329]  arXiv:2110.11102 [pdf, other]
Title: Physical Layer Security in Relay Networks with Outdated Relay Selection
Subjects: Information Theory (cs.IT)

In this paper, the secrecy performance of a cooperative relay network with outdated relay selection is investigated where an eavesdropper intercepts the channels between the source and the destination. The best relay is chosen among N relays based on the opportunistic relay selection algorithm, which may not be the best relay at the time of transmission because of the outdated channel state information. We derive closed-form analytical expressions for the non-zero secrecy capacity, the secrecy outage probability, and the ergodic secrecy capacity. Finally, our theoretical analysis is validated by the numerical results, and detailed discussions and insights are given.
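
The effect of outdated channel state information on relay selection, as described in the abstract above, can be illustrated with a small simulation. This sketch makes its own assumptions: Rayleigh fading, a correlation coefficient `rho` between the outdated and current channel of each relay, and selection by largest outdated channel gain. It does not model the eavesdropper and does not reproduce the paper's closed-form results.

```python
import math
import random

def complex_gaussian(rng):
    """Draw a zero-mean, unit-variance circularly symmetric complex Gaussian sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def still_best_probability(n_relays, rho, n_trials=20_000, seed=0):
    """Estimate how often the relay chosen from outdated CSI is still the best at transmission time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        outdated = [complex_gaussian(rng) for _ in range(n_relays)]
        # Assumed correlation model: current = rho * outdated + sqrt(1 - rho^2) * innovation.
        current = [
            rho * h + math.sqrt(1.0 - rho ** 2) * complex_gaussian(rng)
            for h in outdated
        ]
        chosen = max(range(n_relays), key=lambda i: abs(outdated[i]))
        best_now = max(range(n_relays), key=lambda i: abs(current[i]))
        if chosen == best_now:
            hits += 1
    return hits / n_trials

perfect_csi = still_best_probability(n_relays=4, rho=1.0)
useless_csi = still_best_probability(n_relays=4, rho=0.0)
```

With `rho = 1.0` the selection is always correct, while with `rho = 0.0` the outdated estimate carries no information and the chosen relay is the best one only about once in `n_relays` trials.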

[330]  arXiv:2110.11104 [pdf, other]
Title: Intelligent Reflecting Surface for Multi-Path Beam Routing with Active/Passive Beam Splitting and Combining
Comments: 5 pages, 4 figures (9 subfigures)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Intelligent reflecting surface (IRS) can be densely deployed in wireless networks to significantly enhance the communication channels. In this letter, we consider the downlink transmission from a multi-antenna base station (BS) to a single-antenna user, by exploiting the cooperative passive beamforming (CPB) and line-of-sight (LoS) path diversity gains of multi-IRS signal reflection. Unlike existing works where only one single multi-IRS reflection path from the BS to user is selected, we propose a new and more general multi-path beam routing scheme. Specifically, the BS sends the user's information signal via multiple orthogonal active beams (termed as active beam splitting), which point towards different IRSs. Then, these beamed signals are subsequently reflected by selected IRSs via their CPB in different paths, and finally coherently combined at the user's receiver (thus named passive beam combining). For this scheme, we formulate a new multi-path beam routing design problem to jointly optimize the number of IRS reflection paths, the selected IRSs for each of the reflection paths, the active/passive beamforming at the BS/each selected IRS, as well as the BS's power allocation over different active beams, so as to maximize the received signal power at the user. To solve this challenging problem, we first derive the optimal BS/IRS beamforming and BS power allocation for a given set of reflection paths. The clique-based approach in graph theory is then applied to solve the remaining multi-path selection problem efficiently. Simulation results show that our proposed multi-path beam routing scheme significantly outperforms its conventional single-path beam routing special case.

[331]  arXiv:2110.11106 [pdf, other]
Title: Reinforcement Learning Based Optimal Camera Placement for Depth Observation of Indoor Scenes
Comments: Accepted to IEEE International Conference on Networking, Sensing and Control (ICNSC) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Exploring the most task-friendly camera setting -- optimal camera placement (OCP) problem -- in tasks that use multiple cameras is of great importance. However, few existing OCP solutions specialize in depth observation of indoor scenes, and most versatile solutions work offline. To address this problem, an online OCP solution for depth observation of indoor scenes based on reinforcement learning is proposed in this paper. The proposed solution comprises a simulation environment that implements scene observation and reward estimation using shadow maps and an agent network containing a soft actor-critic (SAC)-based reinforcement learning backbone and a feature extractor to extract features from the observed point cloud layer-by-layer. Comparative experiments with two state-of-the-art optimization-based offline methods are conducted. The experimental results indicate that the proposed system outperforms the baselines in seven out of ten test scenes by obtaining lower depth observation error. The total error in all test scenes is also less than 90% of the baseline ones. Therefore, the proposed system is more competent for depth camera placement in scenarios where there is no prior knowledge of the scenes or where a lower depth observation error is the main objective.

[332]  arXiv:2110.11107 [pdf, other]
Title: Extraction of Positional Player Data from Broadcast Soccer Videos
Comments: Accepted for publication at WACV'22; Preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computer-aided support and analysis are becoming increasingly important in the modern world of sports. The scouting of potential prospective players, performance as well as match analysis, and the monitoring of training programs rely more and more on data-driven technologies to ensure success. Therefore, many approaches require large amounts of data, which are, however, not easy to obtain in general. In this paper, we propose a pipeline for the fully-automated extraction of positional data from broadcast video recordings of soccer matches. In contrast to previous work, the system integrates all necessary sub-tasks like sports field registration, player detection, or team assignment that are crucial for player position estimation. The quality of the modules and the entire system is interdependent. A comprehensive experimental evaluation is presented for the individual modules as well as the entire pipeline to identify the influence of errors on subsequent modules and the overall result. In this context, we propose novel evaluation metrics to compare the output with ground-truth positional data.

[333]  arXiv:2110.11108 [pdf, other]
Title: Applying Second-Order Quantifier Elimination in Inspecting Gödel's Ontological Proof
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

In recent years, Gödel's ontological proof and variations of it were formalized and analyzed with automated tools in various ways. We supplement these analyses with a modeling in an automated environment based on first-order logic extended by predicate quantification. Formula macros are used to structure complex formulas and tasks. The analysis is presented as a generated type-set document where informal explanations are interspersed with pretty-printed formulas and outputs of reasoners for first-order theorem proving and second-order quantifier elimination. Previously unnoticed or obscured aspects and details of Gödel's proof become apparent. Practical application possibilities of second-order quantifier elimination are shown and the encountered elimination tasks may serve as benchmarks.

[334]  arXiv:2110.11109 [pdf, ps, other]
Title: On the Expressive Power of TeamLTL and First-Order Team Logic over Hyperproperties
Journal-ref: 27th International Workshop WoLLIC Proceedings (2021)
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

In this article we study linear temporal logics with team semantics (TeamLTL) that are novel logics for defining hyperproperties. We define Kamp-type translations of these logics into fragments of first-order team logic and second-order logic. We also characterize the expressive power and the complexity of model-checking and satisfiability of team logic and second-order logic by relating them to second- and third-order arithmetic. Our results set in a larger context the recent results of Lück showing that the extension of TeamLTL by the Boolean negation is highly undecidable under the so-called synchronous semantics. We also study stutter-invariant fragments of extensions of TeamLTL.

[335]  arXiv:2110.11110 [pdf, ps, other]
Title: A Secretive Coded Caching for Shared Cache Systems using PDAs
Comments: 10 pages and 4 figures
Subjects: Information Theory (cs.IT)

This paper considers the secretive coded caching problem with shared caches in which no user must have access to the files that it did not demand. In a shared cache network, the users are served by a smaller number of helper caches and each user is connected to exactly one helper cache. To ensure the secrecy constraint in shared cache networks, each user is required to have an individual cache of at least unit file size. For this setting, a secretive coded caching scheme was proposed recently in the literature ("Secretive Coded Caching with Shared Caches", in IEEE Communications Letters, 2021), and it requires a subpacketization level which is in the exponential order of the number of helper caches. By utilizing the PDA constructions, we propose a procedure to obtain new secretive coded caching schemes for shared caches with reduced subpacketization levels. We also show that the existing secretive coded caching scheme for shared caches can be recovered using our procedure. Furthermore, we derive a lower bound on the secretive transmission rate using cut-set arguments and demonstrate the order-optimality of the proposed secretive coded caching scheme.

[336]  arXiv:2110.11111 [pdf, other]
Title: A Deep Insight into Measuring Face Image Utility with General and Face-specific Image Quality Metrics
Comments: 8 pages, 5 figures, IEEE Winter Conf. on Applications of Computer Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Quality scores provide a measure to evaluate the utility of biometric samples for biometric recognition. Biometric recognition systems require high-quality samples to achieve optimal performance. This paper focuses on face images and the measurement of face image utility with general and face-specific image quality metrics. While face-specific metrics rely on features of aligned face images, general image quality metrics can be used on the global image and relate to human perceptions. In this paper, we analyze the gap between the general image quality metrics and the face image quality metrics. Our contribution lies in a thorough examination of how different image quality assessment algorithms relate to the utility for the face recognition task. The results of image quality assessment algorithms are further compared with those of dedicated face image quality assessment algorithms. In total, 25 different quality metrics are evaluated on three face image databases, BioSecure, LFW, and VGGFace2, using three open-source face recognition solutions, SphereFace, ArcFace, and FaceNet. Our results reveal a clear correlation between learned image metrics and face image utility even without being specifically trained as a face utility measure. Individual handcrafted features lack general stability and perform significantly worse than general face-specific quality metrics. We additionally provide a visual insight into the image areas contributing to the quality score of a selected set of quality assessment methods.

[337]  arXiv:2110.11115 [pdf, other]
Title: Improving Non-autoregressive Generation with Mixup Training
Subjects: Computation and Language (cs.CL)

While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them in non-autoregressive generation tasks remains a challenge. To solve this problem, we present a non-autoregressive generation model based on pre-trained transformer models. To bridge the gap between autoregressive and non-autoregressive models, we propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST). Unlike other iterative decoding methods, which sacrifice the inference speed to achieve better performance based on multiple decoding iterations, MIST works in the training stage and has no effect on inference time. Our experiments on three generation benchmarks, including question generation, summarization and paraphrase generation, show that the proposed framework achieves new state-of-the-art results for fully non-autoregressive models. We also demonstrate that our method can be applied to a variety of pre-trained models. For instance, MIST based on the small pre-trained model also obtains comparable performance with seq2seq models.

[338]  arXiv:2110.11121 [pdf, other]
Title: Joint Power Control, Channel Assignment and Cell Association in Heterogeneous Cellular Networks
Subjects: Information Theory (cs.IT)

Heterogeneous network (HetNet) is a promising concept for increasing capacity and alleviating spectrum scarcity. In HetNets, however, channel assignment and transmit power control affect the distribution of users among base stations. We present a novel scheme to maximize the uplink sum rate in two-tier HetNets with one macrocell and several femtocells, where the transmit power of each user is bounded and at least one channel is assigned to each user. We divide the problem into two sub-problems: one for channel assignment and one for transmit power control and solve them by iteratively alternating between the two. Our scheme is convergent and yields a transmission rate above 6 bps/Hz for almost 50% of users as compared to the same for 10% of users in SINR-based schemes. When users are in cell boundaries, the average transmission rate for fractional frequency reuse is up to 20% more compared to the conventional full frequency reuse. This is due to reduced transmit power resulting in less interference.

[339]  arXiv:2110.11128 [pdf, other]
Title: A Strong Baseline for Semi-Supervised Incremental Few-Shot Learning
Comments: Accepted by BMVC2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Few-shot learning (FSL) aims to learn models that generalize to novel classes with limited training samples. Recent works advance FSL towards a scenario where unlabeled examples are also available and propose semi-supervised FSL methods. Another line of methods also cares about the performance of base classes in addition to the novel ones and thus establishes the incremental FSL scenario. In this paper, we generalize the above two under a more realistic yet complex setting, named Semi-Supervised Incremental Few-Shot Learning (S2 I-FSL). To tackle the task, we propose a novel paradigm containing two parts: (1) a well-designed meta-training algorithm for mitigating ambiguity between base and novel classes caused by unreliable pseudo labels and (2) a model adaptation mechanism to learn discriminative features for novel classes while preserving base knowledge using few labeled and all the unlabeled data. Extensive experiments on standard FSL, semi-supervised FSL, incremental FSL, and the newly built S2 I-FSL benchmarks demonstrate the effectiveness of our proposed method.

[340]  arXiv:2110.11129 [pdf, other]
Title: Data-driven finite element method with RVE generated foam data
Subjects: Computational Engineering, Finance, and Science (cs.CE)

The data-driven finite element method proposed by Kirchdoerfer and Ortiz [1] makes it possible to bypass the material modeling step. Instead, a previously obtained data set is used directly in the algorithm to describe the material behavior under deformation. Usually, this data set is expected to be gained experimentally. The following empirical treatment is skipped in the data-driven framework and the data is implemented in the algorithm directly. The data-driven problem is rewritten as a minimization problem of a distance function subject to the conservation laws. This paper presents a computational approach to deduce a data set prior to the finite element computation. Representative volume element computations are conducted to deduce a macroscopic material behavior of a polyurethane foam structure. The typical linear load regime of the foam allows us to generate a large material database which can be used as an input for the data-driven finite element computation. Furthermore, we also work out how to proceed in the case of (non-)linear and (an-)isotropic material behavior in order to obtain suitable material data sets. The numerical example which is conducted with the foam data is a typical rubber sealing profile. In the data-driven computation itself, we use a numerical method proposed in [2] to decrease the computing and storage demands.

[341]  arXiv:2110.11130 [pdf, other]
Title: Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System
Comments: 24 pages, 11 figures, to be published at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

Computational level explanations based on optimal feedback control with signal-dependent noise have been able to account for a vast array of phenomena in human sensorimotor behavior. However, commonly a cost function needs to be assumed for a task and the optimality of human behavior is evaluated by comparing observed and predicted trajectories. Here, we introduce inverse optimal control with signal-dependent noise, which allows inferring the cost function from observed behavior. To do so, we formalize the problem as a partially observable Markov decision process and distinguish between the agent's and the experimenter's inference problems. Specifically, we derive a probabilistic formulation of the evolution of states and belief states and an approximation to the propagation equation in the linear-quadratic Gaussian problem with signal-dependent noise. We extend the model to the case of partial observability of state variables from the point of view of the experimenter. We show the feasibility of the approach through validation on synthetic data and application to experimental data. Our approach enables recovering the costs and benefits implicit in human sequential sensorimotor behavior, thereby reconciling normative and descriptive approaches in a computational framework.

[342]  arXiv:2110.11133 [pdf, ps, other]
Title: Newton-Type Methods For Simultaneous Matrix Diagonalization
Authors: Rima Khouja (CRISAM), Bernard Mourrain (CRISAM), Jean-Claude Yakoubsohn (IMT)
Subjects: Numerical Analysis (math.NA)

This paper proposes a Newton-type method to numerically solve the eigenproblem of several diagonalizable matrices which pairwise commute. A classical result states that these matrices are simultaneously diagonalizable. From a suitable system of equations associated to this problem, we construct a sequence which converges quadratically towards the solution. This construction is not based on the solution of a linear system, as is the case in the classical Newton method. Moreover, we provide a theoretical analysis of this construction to exhibit a condition to get quadratic convergence. We also propose numerical experiments, which illustrate the theoretical results. This shows that the classical QR method would gain in efficiency by incorporating the tests given by the theory.

[343]  arXiv:2110.11137 [pdf, other]
Title: Control of Humanoid in Multiple Fixed and Moving Unilateral Contacts
Authors: Julien Roux (LIRMM), Saeid Samadi (LIRMM), Eisoku Kuroiwa, Takahide Yoshiike, Abderrahmane Kheddar (CNRS-AIST JRL, LIRMM)
Journal-ref: 20th International Conference on ADVANCED ROBOTICS (ICAR 2021), In press
Subjects: Robotics (cs.RO)

Enforcing balance of multi-limbed robots in multiple non-coplanar unilateral contact settings is challenging when a subset of such contacts are also induced in motion tasks. The first contribution of this paper is in enhancing the computational performance of state-of-the-art geometric center-of-mass inclusion-based balance method to be integrated online as part of a task-space whole-body control framework. As a consequence, our second contribution lies in integrating such a balance region with relevant contact force distribution without pre-computing a target center-of-mass. This last feature is essential to leave the latter with freedom to better comply with other existing tasks that are not captured in classical two-level approaches. We assess the performance of our proposed method through experiments using the HRP-4 humanoid robot.

[344]  arXiv:2110.11140 [pdf, other]
Title: Dual Encoding U-Net for Spatio-Temporal Domain Shift Frame Prediction
Comments: 8 pages, 4 figures, 5 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The landscape of city-wide mobility behaviour has altered significantly over the past 18 months. The ability to make accurate and reliable predictions on such behaviour has likewise changed drastically with COVID-19 measures impacting how populations across the world interact with the different facets of mobility. This raises the question: "How does one use an abundance of pre-covid mobility data to make predictions on future behaviour in a present/post-covid environment?" This paper seeks to address this question by introducing an approach for traffic frame prediction using a lightweight Dual-Encoding U-Net built using only 12 Convolutional layers that incorporates a novel approach to skip-connections between Convolutional LSTM layers. This approach combined with an intuitive handling of training data can model both a temporal and spatio-temporal domain shift (gitlab.com/alchera/alchera-traffic4cast-2021).

[345]  arXiv:2110.11141 [pdf, other]
Title: DeepBND: a Machine Learning approach to enhance Multiscale Solid Mechanics
Comments: It has been submitted to Journal of Computational Physics
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

Effective properties of materials with random heterogeneous structures are typically determined by homogenising the mechanical quantity of interest in a window of observation. The entire problem setting encompasses the solution of a local PDE and some averaging formula for the quantity of interest in such domain. There are relatively standard methods in the literature to completely determine the formulation except for two choices: i) the local domain itself and ii) the boundary conditions. Hence, the modelling errors are governed by the quality of these two choices. The choice i) relates to the degree of representativeness of a microscale sample, i.e., it is essentially a statistical characteristic. Naturally, its reliability is higher as the size of the observation window becomes larger and/or the number of samples increases. On the other hand, except for a few special cases, there is no automatic guideline to handle ii). Although it is known that the overall effect of the boundary condition becomes less important with the size of the microscale domain, the computational cost to simulate such a large problem several times might be prohibitive even for relatively small accuracy requirements. Here we introduce a machine learning procedure to select the most suitable boundary conditions for multiscale problems, particularly those arising in solid mechanics. We propose the combination of Reduced-Order Models and Deep Neural Networks in an offline phase, whilst the online phase consists in the very same homogenisation procedure plus one (cheap) evaluation of the trained model for boundary conditions. Hence, the method allows an implementation with minimal changes in existing codes and the use of relatively small domains without losing accuracy, which reduces the computational cost by several orders of magnitude.

[346]  arXiv:2110.11147 [pdf, other]
Title: Unique Continuation on Quadratic Curves for Harmonic Functions
Authors: Yufei Ke, Yu Chen
Subjects: Numerical Analysis (math.NA)

The unique continuation on quadratic curves for harmonic functions is discussed in this paper. Using the complex extension method, the conditional stability of unique continuation along quadratic curves for harmonic functions is illustrated. A numerical algorithm is provided based on the collocation method and Tikhonov regularization. The stability estimates on parabolic and hyperbolic curves for harmonic functions are demonstrated by numerical examples.
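
Tikhonov regularization, which the abstract above names as part of its numerical algorithm, can be sketched on a generic ill-conditioned 2x2 least-squares problem. This is only an illustration of the regularization step under assumed data: the matrix `A`, the noisy right-hand side `b_noisy`, and the parameter `alpha` are hypothetical and are not taken from the paper's collocation scheme.

```python
def tikhonov_2x2(A, b, alpha):
    """Minimise ||A x - b||^2 + alpha * ||x||^2 for a 2x2 matrix A.

    Solves the regularised normal equations (A^T A + alpha I) x = A^T b by Cramer's rule.
    """
    (a11, a12), (a21, a22) = A
    # Entries of M = A^T A + alpha * I (symmetric).
    m11 = a11 * a11 + a21 * a21 + alpha
    m12 = a11 * a12 + a21 * a22
    m22 = a12 * a12 + a22 * a22 + alpha
    # Right-hand side r = A^T b.
    r1 = a11 * b[0] + a21 * b[1]
    r2 = a12 * b[0] + a22 * b[1]
    det = m11 * m22 - m12 * m12
    return [(r1 * m22 - m12 * r2) / det, (m11 * r2 - m12 * r1) / det]

# Nearly singular system whose noise-free solution is x = [1, 1].
A = [[1.0, 1.0], [1.0, 1.0001]]
b_noisy = [2.0, 2.0011]  # exact data [2.0, 2.0001] plus a 0.001 error in the second entry

x_naive = tikhonov_2x2(A, b_noisy, alpha=0.0)   # no regularisation: the error is amplified
x_reg = tikhonov_2x2(A, b_noisy, alpha=1e-3)    # regularised: stays close to [1, 1]
```

Without regularization the small data error moves the solution far from `[1, 1]`, while the regularized solve remains close to it; in practice the choice of `alpha` trades this stability against bias.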

[347]  arXiv:2110.11148 [pdf, other]
Title: HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification
Comments: accepted in BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Human beings learn and accumulate hierarchical knowledge over their lifetime. This knowledge is associated with previous concepts for consolidation and hierarchical construction. However, current incremental learning methods lack the ability to build a concept hierarchy by associating new concepts with old ones. A more realistic setting tackling this problem is referred to as Incremental Implicitly-Refined Classification (IIRC), which simulates the recognition process from coarse-grained categories to fine-grained categories. To overcome forgetting in this benchmark, we propose Hierarchy-Consistency Verification (HCV) as an enhancement to existing continual learning methods. Our method incrementally discovers the hierarchical relations between classes. We then show how this knowledge can be exploited during both training and inference. Experiments on three setups of varying difficulty demonstrate that our HCV module improves the performance of existing continual learning methods under this IIRC setting by a large margin. Code is available at https://github.com/wangkai930418/HCV_IIRC.

[348]  arXiv:2110.11150 [pdf, ps, other]
Title: Towards strong pruning for lottery tickets with non-zero biases
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus forgoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to non-zero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data sets, we further highlight the practical benefits of non-zero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.

[349]  arXiv:2110.11154 [pdf, other]
Title: Personalized Transfer of User Preferences for Cross-domain Recommendation
Comments: Accepted by WSDM 2022
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

The cold-start problem remains very challenging in recommender systems. Fortunately, the interactions of cold-start users in the auxiliary source domain can help cold-start recommendations in the target domain. How to transfer users' preferences from the source domain to the target domain is the key issue in Cross-domain Recommendation (CDR), which is a promising solution to the cold-start problem. Most existing methods model a common preference bridge to transfer preferences for all users. Intuitively, since preferences vary from user to user, the preference bridges of different users should be different. Along this line, we propose a novel framework named Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR). Specifically, a meta network fed with users' characteristic embeddings is learned to generate personalized bridge functions that achieve personalized transfer of preferences for each user. To learn the meta network stably, we employ a task-oriented optimization procedure. With the meta-generated personalized bridge function, the user's preference embedding in the source domain can be transformed into the target domain, and the transformed user preference embedding can be used as the initial embedding for the cold-start user in the target domain. Using large real-world datasets, we conduct extensive experiments to evaluate the effectiveness of PTUPCDR in both cold-start and warm-start stages. The code is available at https://github.com/easezyc/WSDM2022-PTUPCDR.

[350]  arXiv:2110.11155 [pdf, other]
Title: DeLag: Detecting Latency Degradation Patterns in Service-based Systems
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Performance (cs.PF)

Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance indices. In this paper we present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems. DeLag identifies subsets of requests that show, in the combination of their Remote Procedure Call execution times, symptoms of potentially relevant performance issues. We call such symptoms Latency Degradation Patterns. DeLag simultaneously searches for multiple latency degradation patterns while optimizing precision, recall and latency dissimilarity. Experimentation on 700 datasets of requests generated from two microservice-based systems shows that our approach provides better and more stable effectiveness than three state-of-the-art approaches and general-purpose machine learning clustering algorithms. Moreover, in terms of efficiency, DeLag outperforms the second and third most effective baseline techniques on the largest datasets used in our evaluation.

[351]  arXiv:2110.11159 [pdf, other]
Title: Each Attribute Matters: Contrastive Attention for Sentence-based Image Editing
Comments: Accepted by BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

Sentence-based Image Editing (SIE) aims to use natural language to edit an image. Offering the potential to reduce expensive manual editing, SIE has attracted much interest recently. However, existing methods can hardly produce accurate editing and even fail at attribute editing when the query sentence contains multiple editable attributes. To cope with this problem, by focusing on enhancing the difference between attributes, this paper proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN), which is inspired by contrastive training. Specifically, we first design a novel contrastive attention module to enlarge the editing difference between random combinations of attributes which are formed during training. We then construct an attribute discriminator to ensure effective editing on each attribute. A series of experiments shows that our method can generate very encouraging results in sentence-based image editing with multiple attributes on the CUB and COCO datasets. Our code is available at https://github.com/Zlq2021/CA-GAN

[352]  arXiv:2110.11160 [pdf, other]
Title: Self-Supervised Visual Representation Learning Using Lightweight Architectures
Comments: 8 pages, 4 figures, 1 table, submitted to Artificial Intelligence and Statistics 2022 (AISTATS 2022)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine. The objective is to transfer the trained weights to perform a downstream task in the target domain. We critically examine the most notable pretext tasks for extracting features from image data and go on to conduct experiments on resource-constrained networks, which aid faster experimentation and deployment. We study the performance of various self-supervised techniques while keeping all other parameters uniform. We study the patterns that emerge by varying the model type, size and amount of pre-training done for the backbone, and establish a standard to compare against for future research. We also conduct comprehensive studies to understand the quality of representations learned by different architectures.

[353]  arXiv:2110.11162 [pdf, other]
Title: Hierarchical Multi-robot Strategies Synthesis and Optimization under Individual and Collaborative Temporal Logic Specifications
Comments: 14 pages, 6 figures. arXiv admin note: text overlap with arXiv:2108.11597
Subjects: Robotics (cs.RO)

This paper presents a hierarchical framework to solve the multi-robot temporal task planning problem. We assume that each robot has its individual task specification and the robots have to jointly satisfy a global collaborative task specification, both described in linear temporal logic. Specifically, a central server first extracts and decomposes a collaborative task sequence from the automaton corresponding to the collaborative task specification, and allocates the subtasks in the sequence to robots. The robots can then synthesize their initial execution strategies based on locally constructed product automata, combining the assigned collaborative tasks and their individual task specifications. Furthermore, we propose a distributed mechanism for adjusting execution strategies that iteratively improves time efficiency by reducing the wait time in collaborations caused by potential synchronization constraints. We prove the completeness of the proposed framework under assumptions, and analyze its time complexity and optimality. Extensive simulation results verify the scalability and optimization efficiency of the proposed method.

[354]  arXiv:2110.11164 [pdf, other]
Title: Modeling Performance in Open-Domain Dialogue with PARADISE
Comments: The 12th International Workshop on Spoken Dialog System Technology, November 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

There has recently been an explosion of work on spoken dialogue systems, along with an increased interest in open-domain systems that engage in casual conversations on popular topics such as movies, books and music. These systems aim to socially engage, entertain, and even empathize with their users. Since the achievement of such social goals is hard to measure, recent research has used dialogue length or human ratings as evaluation metrics, and developed methods for automatically calculating novel metrics, such as coherence, consistency, relevance and engagement. Here we develop a PARADISE model for predicting the performance of Athena, a dialogue system that has participated in thousands of conversations with real users, while competing as a finalist in the Alexa Prize. We use both user ratings and dialogue length as metrics for dialogue quality, and experiment with predicting these metrics using automatic features that are both system dependent and independent. Our goal is to learn a general objective function that can be used to optimize the dialogue choices of any Alexa Prize system in real time and evaluate its performance. Our best model for predicting user ratings gets an R$^2$ of .136 with a DistilBert model, and the best model for predicting length with system independent features gets an R$^2$ of .865, suggesting that conversation length may be a more reliable measure for automatic training of dialogue systems.
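
For readers comparing the two reported R$^2$ values, the coefficient of determination can be computed as in the minimal sketch below. This is the standard definition, 1 minus the ratio of residual to total sum of squares; the function name `r_squared` and the toy inputs are assumptions for illustration and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings: a model that always predicts the mean explains no variance.
baseline = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this definition, an R$^2$ of .136 means the model removes about 14% of the variance that a mean-only predictor leaves unexplained, against about 87% for the length model.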

[355]  arXiv:2110.11166 [pdf, other]
Title: Driving the Herd: Search Engines as Content Influencers
Subjects: Information Retrieval (cs.IR)

In competitive search settings such as the Web, many documents' authors (publishers) opt to have their documents highly ranked for some queries. To this end, they modify the documents - specifically, their content - in response to induced rankings. Thus, the search engine affects the content in the corpus via its ranking decisions. We present a first study of the ability of search engines to drive pre-defined, targeted, content effects in the corpus using simple techniques. The first is based on the herding phenomenon - a celebrated result from the economics literature - and the second is based on biasing the relevance ranking function. The types of content effects we study are either topical or touch on specific document properties - length and inclusion of query terms. Analysis of ranking competitions we organized between incentivized publishers shows that the types of content effects we target can indeed be attained by applying our suggested techniques. These findings have important implications with regard to the role of search engines in shaping the corpus.

[356]  arXiv:2110.11167 [pdf]
Title: A guided journey through non-interactive automatic story generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present a literature survey on non-interactive computational story generation. The article starts with the presentation of requirements for creative systems, three types of models of creativity (computational, socio-cultural, and individual), and models of human creative writing. Then it reviews each class of story generation approach according to the technology used: story-schemas, analogy, rules, planning, evolutionary algorithms, implicit knowledge learning, and explicit knowledge learning. Before the concluding section, the article analyses the contributions of the reviewed work to improving the quality of the generated stories. This analysis addresses the description of the story characters, the use of narrative knowledge (including knowledge about character believability), and the possible lack of more comprehensive or more detailed knowledge or creativity models. Finally, the article presents concluding remarks in the form of suggestions for research topics that might have a significant impact on the advancement of the state of the art in autonomous non-interactive story generation systems. The article concludes that the autonomous generation and adoption of the main idea to be conveyed and the autonomous design of the creativity-ensuring criteria are possibly two of the most important topics for future research.

[357]  arXiv:2110.11168 [pdf, ps, other]
Title: A Survey on Methods and Metrics for the Assessment of Explainability under the Proposed AI Act
Comments: Accepted paper at JURIX 2021
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

This study discusses the interplay between metrics used to measure the explainability of the AI systems and the proposed EU Artificial Intelligence Act. A standardisation process is ongoing: several entities (e.g. ISO) and scholars are discussing how to design systems that are compliant with the forthcoming Act and explainability metrics play a significant role. This study identifies the requirements that such a metric should possess to ease compliance with the AI Act. It does so according to an interdisciplinary approach, i.e. by departing from the philosophical concept of explainability and discussing some metrics proposed by scholars and standardisation entities through the lenses of the explainability obligations set by the proposed AI Act. Our analysis proposes that metrics to measure the kind of explainability endorsed by the proposed AI Act shall be risk-focused, model-agnostic, goal-aware, intelligible & accessible. This is why we discuss the extent to which these requirements are met by the metrics currently under discussion.

[358]  arXiv:2110.11177 [pdf, other]
Title: Decentralised Trustworthy Collaborative Intrusion Detection System for IoT
Comments: 8 pages, 7 figures, accepted to IEEE Blockchain 2021
Subjects: Cryptography and Security (cs.CR)

Intrusion Detection Systems (IDS) have been the industry standard for securing IoT networks against known attacks. To increase the capability of an IDS, researchers proposed the concept of blockchain-based Collaborative-IDS (CIDS), wherein blockchain acts as a decentralised platform allowing collaboration between CIDS nodes to share intrusion-related information, such as intrusion alarms and detection rules. However, proposals in blockchain-based CIDS overlook the importance of continuous evaluation of the trustworthiness of each node and generally work based on the assumption that the nodes are always honest. In this paper, we propose a decentralised CIDS that emphasises the importance of building trust between CIDS nodes. In our proposed solution, each CIDS node exchanges detection rules to help other nodes detect new types of intrusion. Our architecture offloads the trust computation to the blockchain and utilises decentralised storage to host the shared trustworthy detection rules, ensuring scalability. Our implementation in a lab-scale testbed shows that our solution is feasible and performs within the expected benchmarks of the Ethereum platform.

[359]  arXiv:2110.11179 [pdf, other]
Title: A hyper-reduced MAC scheme for the parametric Stokes and Navier-Stokes equations
Subjects: Numerical Analysis (math.NA)

The need for accelerating the repeated solving of certain parametrized systems motivates the development of more efficient reduced order methods. The classical reduced basis method is popular due to an offline-online decomposition and a mathematically rigorous a posteriori error estimator which guides a greedy algorithm offline. For nonlinear and nonaffine problems, hyper-reduction techniques have been introduced to make this decomposition efficient. However, they may be tricky to implement and often degrade the online computation efficiency.
To avoid this degradation, reduced residual reduced over-collocation (R2-ROC) was invented, integrating empirical interpolation techniques on the solution snapshots and well-chosen residuals, the collocation philosophy, and the simplicity of evaluating the hyper-reduced well-chosen residuals. In this paper, we introduce an adaptive enrichment strategy for R2-ROC, rendering it capable of handling parametric fluid flow problems. Built on top of an underlying Marker and Cell (MAC) scheme, a novel hyper-reduced MAC scheme is therefore presented and tested on the Stokes and Navier-Stokes equations, demonstrating its high efficiency, stability and accuracy.

[360]  arXiv:2110.11181 [pdf, other]
Title: Sensing Cox Processes via Posterior Sampling and Positive Bases
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study adaptive sensing of Cox point processes, a widely used model from spatial statistics. We introduce three tasks: maximization of captured events, search for the maximum of the intensity function and learning level sets of the intensity function. We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis. In this basis, the positivity constraint on the intensity function has a simple form. We show how a minimal-description positive basis can be adapted to the covariance kernel and to non-stationarity, and make connections to common positive bases from prior works. Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (Cox-Thompson) and top-two posterior sampling (Top2) principles. With the latter, the difference between samples serves as a surrogate for the uncertainty. We demonstrate the approach using examples from environmental monitoring and crime rate modeling, and compare it to the classical Bayesian experimental design approach.

[361]  arXiv:2110.11182 [pdf, other]
Title: SLURP: Side Learning Uncertainty for Regression Problems
Subjects: Computer Vision and Pattern Recognition (cs.CV)

It has become critical for deep learning algorithms to quantify their output uncertainties to satisfy reliability constraints and provide accurate results. Uncertainty estimation for regression has received less attention than classification due to the more straightforward standardized output of the latter class of tasks and their high importance. However, regression problems are encountered in a wide range of applications in computer vision. We propose SLURP, a generic approach for regression uncertainty estimation via a side learner that exploits the output and the intermediate representations generated by the main task model. We test SLURP on two critical regression tasks in computer vision: monocular depth and optical flow estimation. In addition, we conduct exhaustive benchmarks comprising transfer to different datasets and the addition of aleatoric noise. The results show that our proposal is generic and readily applicable to various regression problems and has a low computational cost with respect to existing solutions.

[362]  arXiv:2110.11187 [pdf, other]
Title: Heritability in Morphological Robot Evolution
Subjects: Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

In the field of evolutionary robotics, choosing the correct encoding is very complicated, especially when robots evolve both behaviours and morphologies at the same time. With the objective of improving our understanding of the mapping process from encodings to functional robots, we introduce the biological notion of heritability, which captures the amount of phenotypic variation caused by genotypic variation. In our analysis we measure the heritability on the first generation of robots evolved from two different encodings, a direct encoding and an indirect encoding. In addition we investigate the interplay between heritability and phenotypic diversity through the course of an entire evolutionary process. In particular, we investigate how direct and indirect genotypes can exhibit preferences for exploration or exploitation throughout the course of evolution. We observe how an exploration or exploitation tradeoff can be more easily understood by examining patterns in heritability and phenotypic diversity. In conclusion, we show how heritability can be a useful tool to better understand the relationship between genotypes and phenotypes, especially helpful when designing more complicated systems where complex individuals and environments can adapt and influence each other.

[363]  arXiv:2110.11188 [pdf, other]
Title: Classification of Encrypted IoT Traffic Despite Padding and Shaping
Comments: 13 pages, 11 figures, 7 tables
Subjects: Cryptography and Security (cs.CR)

It is well known that when IoT traffic is unencrypted it is possible to identify the active devices based on their TCP/IP headers. When traffic is encrypted, packet sizes and timings can still be used to do so. To defend against such fingerprinting, traffic padding and shaping were introduced. In this paper we demonstrate that the packet-size distribution can still be used to successfully fingerprint the active IoT devices when shaping and padding are used, as long as the adversary is aware that these mitigations are deployed, and even if the values of the padding and shaping parameters are unknown. The main tool we use in our analysis is the full distribution of packet sizes, as opposed to commonly used statistics such as mean and variance. We further show how an external adversary who only sees the padded and shaped traffic as aggregated and hidden behind a NAT middlebox can accurately identify the subset of active devices with Recall and Precision of at least 96%. We also show that the adversary can distinguish time windows containing only bogus cover packets from windows with real device activity, at a granularity of 1-second time windows, with 81% accuracy. Using a similar methodology, but now on the defender's side, we are also able to detect anomalous activities in IoT traffic due to the Mirai worm.

[364]  arXiv:2110.11190 [pdf, other]
Title: On Hard Episodes in Meta-Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Existing meta-learners primarily focus on improving the average task accuracy across multiple episodes. Different episodes, however, may vary in hardness and quality leading to a wide gap in the meta-learner's performance across episodes. Understanding this issue is particularly critical in industrial few-shot settings, where there is limited control over test episodes as they are typically uploaded by end-users. In this paper, we empirically analyse the behaviour of meta-learners on episodes of varying hardness across three standard benchmark datasets: CIFAR-FS, mini-ImageNet, and tiered-ImageNet. Surprisingly, we observe a wide gap in accuracy of around 50% between the hardest and easiest episodes across all the standard benchmarks and meta-learners. We additionally investigate various properties of hard episodes and highlight their connection to catastrophic forgetting during meta-training. To address the issue of sub-par performance on hard episodes, we investigate and benchmark different meta-training strategies based on adversarial training and curriculum learning. We find that adversarial training strategies are much more powerful than curriculum learning in improving the prediction performance on hard episodes.

[365]  arXiv:2110.11191 [pdf, other]
Title: Generative Adversarial Graph Convolutional Networks for Human Action Synthesis
Comments: Published as a conference paper at WACV 2022. Code and pretrained models available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Synthesising the spatial and temporal dynamics of the human body skeleton remains a challenging task, not only in terms of the quality of the generated shapes, but also of their diversity, particularly when synthesising realistic body movements of a specific action (action conditioning). In this paper, we propose Kinetic-GAN, a novel architecture that leverages the benefits of Generative Adversarial Networks and Graph Convolutional Networks to synthesise the kinetics of the human body. The proposed adversarial architecture can condition on up to 120 different actions over local and global body movements while improving sample quality and diversity through latent space disentanglement and stochastic variations. Our experiments were carried out on three well-known datasets, where Kinetic-GAN notably surpasses the state-of-the-art methods in terms of distribution quality metrics while being able to synthesise more than an order of magnitude more distinct actions. Our code and models are publicly available at https://github.com/DegardinBruno/Kinetic-GAN.

[366]  arXiv:2110.11198 [pdf, other]
Title: Temporal Motifs in Patent Opposition and Collaboration Networks
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

Patents are intellectual properties that reflect the innovative activities of companies and organizations. The literature is rich with studies that analyze the citations among patents and the collaboration relations among the companies that own them. However, the adversarial relations between patent owners are not as well investigated. One proxy for modeling such relations is the patent opposition, which is a legal activity in which a company challenges the validity of a patent. Characterizing patent oppositions, collaborations, and the interplay between them can help better understand the companies' business strategies. Temporality matters in this context, as the order and frequency of oppositions and collaborations characterize their interplay. In this study, we construct a two-layer temporal network to model the patent oppositions and collaborations among the companies. We utilize temporal motifs to analyze the oppositions and collaborations from structural and temporal perspectives. We first characterize the frequent motifs in patent oppositions and investigate how often companies of different sizes attack other companies. We show that large companies tend to engage in opposition with multiple companies. Then we analyze the temporal interplay between collaborations and oppositions. We find that two adversarial companies are more likely to collaborate in the future than two collaborating companies are to oppose each other in the future.

[367]  arXiv:2110.11199 [pdf, other]
Title: Asynchronous Decentralized Distributed Training of Acoustic Models
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing
Subjects: Computation and Language (cs.CL)

Large-scale distributed training of deep acoustic models plays an important role in today's high-performance automatic speech recognition (ASR). In this paper we investigate a variety of asynchronous decentralized distributed training strategies based on data parallel stochastic gradient descent (SGD) to show their superior performance over the commonly-used synchronous distributed training via allreduce, especially when dealing with large batch sizes. Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme. We introduce a mathematical model of ADPSGD, give its theoretical convergence rate, and compare the empirical convergence behavior and straggler resilience properties of the three variants. Experiments are carried out on an IBM supercomputer for training deep long short-term memory (LSTM) acoustic models on the 2000-hour Switchboard dataset. Recognition and speedup performance of the proposed strategies are evaluated under various training configurations. We show that ADPSGD with fixed and randomized communication patterns cope well with slow learners. When learners are equally fast, ADPSGD with the delay-by-one strategy has the fastest convergence with large batches. In particular, using the delay-by-one strategy, we can train the acoustic model in less than 2 hours using 128 V100 GPUs with competitive word error rates.

[368]  arXiv:2110.11202 [pdf, other]
Title: Anti-Concentrated Confidence Bonuses for Scalable Exploration
Subjects: Machine Learning (cs.LG)

Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning. The LinUCB algorithm, a centerpiece of the stochastic linear bandits literature, prescribes an elliptical bonus which addresses the challenge of leveraging shared information in large action spaces. This bonus scheme cannot be directly transferred to high-dimensional exploration problems, however, due to the computational cost of maintaining the inverse covariance matrix of action features. We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus, using an ensemble of regressors trained to predict random noise from policy network-derived features. Using this approximation, we obtain stochastic linear bandit algorithms which achieve $\tilde O(d \sqrt{T})$ regret bounds for $\mathrm{poly}(d)$ fixed actions. We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic reward heuristics on Atari benchmarks.
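
To make the cost issue concrete, here is a minimal sketch of the classical elliptical bonus that LinUCB-style methods use, beta * sqrt(x^T A^{-1} x) with A = lam * I + sum of x x^T over past actions. It shows the exact bonus that the paper sets out to approximate, not the proposed anti-concentrated estimator; the class name `EllipticalBonus` and the parameters `lam` and `beta` are assumptions for illustration.

```python
import math


class EllipticalBonus:
    """Exact elliptical bonus with a rank-one (Sherman-Morrison) inverse update."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.beta = beta
        # Start from the inverse of lam * I.
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]

    def _apply_inverse(self, x):
        # Matrix-vector product A^{-1} x, costing O(d^2).
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def bonus(self, x):
        ax = self._apply_inverse(x)
        return self.beta * math.sqrt(sum(xi * ai for xi, ai in zip(x, ax)))

    def update(self, x):
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        ax = self._apply_inverse(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, ax))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= ax[i] * ax[j] / denom


tracker = EllipticalBonus(dim=2)
tracker.update([1.0, 0.0])  # the bonus along the first axis shrinks after this
```

Every update and every bonus query touches all d^2 entries of the stored inverse, which is the cost the abstract identifies as the obstacle in high-dimensional feature spaces.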

[369]  arXiv:2110.11205 [pdf, other]
Title: DAIR: Data Augmented Invariant Regularization
Comments: 15 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

While deep learning through empirical risk minimization (ERM) has succeeded at achieving human-level performance at a variety of complex tasks, ERM generalizes poorly to distribution shift. This is partly explained by overfitting to spurious features such as background in images or named entities in natural language. Synthetic data augmentation followed by empirical risk minimization (DA-ERM) is a simple yet powerful solution to remedy this problem. In this paper, we propose data augmented invariant regularization (DAIR). The idea of DAIR is based on the observation that the model's performance (loss) should be consistent on the augmented sample and the original one. DAIR introduces a regularizer on DA-ERM to penalize such loss inconsistency. Both theoretically and through empirical experiments, we show that a particular form of the DAIR regularizer consistently performs well in a variety of settings. We apply it to multiple real-world learning problems involving domain shift, namely robust regression, visual question answering, robust deep neural network training, and task-oriented dialog modeling. Our experiments show that DAIR consistently outperforms ERM and DA-ERM at little marginal cost and sets new state-of-the-art results on several benchmarks.
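
The idea of adding a loss-consistency penalty on top of DA-ERM can be sketched per sample as below. The abstract does not state which form of the regularizer the authors favour, so the squared difference of square-rooted losses used here is an illustrative assumption, as are the function name `dair_style_objective` and the weight `lam`.

```python
import math


def dair_style_objective(loss_orig, loss_aug, lam=1.0):
    """Per-sample objective: DA-ERM average plus a loss-consistency penalty.

    loss_orig and loss_aug are non-negative losses on the original sample and
    on its augmented counterpart. The penalty form is an assumed example.
    """
    # DA-ERM term: treat the original and augmented samples equally.
    erm_term = 0.5 * (loss_orig + loss_aug)
    # Consistency term: zero when both losses agree, growing with their gap.
    penalty = (math.sqrt(loss_orig) - math.sqrt(loss_aug)) ** 2
    return erm_term + lam * penalty


# Equal losses incur no penalty; a gap between them is penalised.
consistent = dair_style_objective(1.0, 1.0)
inconsistent = dair_style_objective(4.0, 1.0)
```

Setting `lam` to zero recovers plain DA-ERM, so the penalty weight directly controls how strongly inconsistent predictions on augmented pairs are discouraged.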

[370]  arXiv:2110.11207 [pdf, other]
Title: Topic-Guided Abstractive Multi-Document Summarization
Authors: Peng Cui, Le Hu
Comments: accepted at findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

A critical point of multi-document summarization (MDS) is to learn the relations among various documents. In this paper, we propose a novel abstractive MDS model, in which we represent multiple documents as a heterogeneous graph, taking semantic nodes of different granularities into account, and then apply a graph-to-sequence framework to generate summaries. Moreover, we employ a neural topic model to jointly discover latent topics that can act as cross-document semantic units to bridge different documents and provide global information to guide the summary generation. Since topic extraction can be viewed as a special type of summarization that "summarizes" texts into a more abstract format, i.e., a topic distribution, we adopt a multi-task learning strategy to jointly train the topic and summarization modules so that each benefits the other. Experimental results on the Multi-News dataset demonstrate that our model outperforms previous state-of-the-art MDS models on both ROUGE metrics and human evaluation, while also learning high-quality topics.

[371]  arXiv:2110.11208 [pdf, ps, other]
Title: User-Level Private Learning via Correlated Sampling
Comments: To appear in NeurIPS 2021
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)

Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample. In this work, we consider the setting where each user holds $m$ samples and the privacy protection is enforced at the level of each user's data. We show that, in this setting, we may learn with far fewer users. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an $(\epsilon, \delta)$-DP algorithm using only $O(\log(1/\delta)/\epsilon)$ users. For $\epsilon$-DP algorithms, we show that we can learn using only $O_{\epsilon}(d)$ users even in the local model, where $d$ is the probabilistic representation dimension. In both cases, we show a nearly-matching lower bound on the number of users required.
A crucial component of our results is a generalization of global stability [Bun et al., FOCS 2020] that allows the use of public randomness. Under this relaxed notion, we employ a correlated sampling strategy to show that the global stability can be boosted to be arbitrarily close to one, at a polynomial expense in the number of samples.

[372]  arXiv:2110.11211 [pdf, other]
Title: A dimension-oblivious domain decomposition method based on space-filling curves
Comments: 24 pages, 9 figures, 1 table. arXiv admin note: substantial text overlap with arXiv:2103.03315
Subjects: Numerical Analysis (math.NA)

In this paper we present an algebraic dimension-oblivious two-level domain decomposition solver for discretizations of elliptic partial differential equations. The proposed parallel solver is based on a space-filling curve partitioning approach that is applicable to any discretization, i.e. it directly operates on the assembled matrix equations. Moreover, it allows for the effective use of arbitrary processor numbers independent of the dimension of the underlying partial differential equation while maintaining optimal convergence behavior. This is the core property required to attain a sparse grid based combination method with extreme scalability which can utilize exascale parallel systems efficiently. Moreover, this approach provides a basis for the development of a fault-tolerant solver for the numerical treatment of high-dimensional problems. To achieve the required data redundancy we are therefore concerned with large overlaps of our domain decomposition which we construct via space-filling curves. In this paper, we propose our space-filling curve based domain decomposition solver and present its convergence properties and scaling behavior. The results of numerical experiments clearly show that our approach provides optimal convergence and scaling behavior in arbitrary dimension utilizing arbitrary processor numbers.

[373]  arXiv:2110.11219 [pdf, other]
Title: PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image
Comments: accepted to BMVC 2021, code opensource: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios. Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures but overlooked the dual characteristics of piece-wise planes as objects and geometric models. Different from other existing approaches, we start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet, which integrates a single-stage instance segmentation network for piece-wise planar segmentation and a depth decoder to reconstruct the scene from a single RGB image. To achieve this, we introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation. Meanwhile, a novel Plane Prior Attention module is used to guide depth estimation with the awareness of plane instances. Exhaustive experiments are conducted in this work to validate the effectiveness and efficiency of our method.

[374]  arXiv:2110.11222 [pdf, other]
Title: Is High Variance Unavoidable in RL? A Case Study in Continuous Control
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a specifically popular setup with high variance -- continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that variance mostly arises early in training as a result of poor "outlier" runs, but that weight initialization and initial exploration are not to blame. We show that one cause for early variance is numerical instability which leads to saturating nonlinearities. We investigate several fixes to this issue and find that one particular method is surprisingly effective and simple -- normalizing penultimate features. Addressing the learning instability allows for larger learning rates, and significantly decreases the variance of outcomes. This demonstrates that the perceived variance in RL is not necessarily inherent to the problem definition and may be addressed through simple architectural modifications.
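
The saturation effect and the kind of fix described above can be sketched on a toy output head. The abstract does not say which normalization is used, so the L2 normalization, the tanh head, and all names below are assumptions for illustration only.

```python
import math


def l2_normalize(features, eps=1e-8):
    """Scale a feature vector to (approximately) unit Euclidean norm."""
    norm = math.sqrt(sum(f * f for f in features))
    return [f / (norm + eps) for f in features]


def tanh_head(features, weights, bias=0.0):
    """Toy output layer: a linear map followed by a saturating nonlinearity."""
    pre_activation = sum(w * f for w, f in zip(weights, features)) + bias
    return math.tanh(pre_activation)


raw = [300.0, 400.0]      # hypothetical penultimate features with large magnitude
weights = [1.0, 1.0]

saturated = tanh_head(raw, weights)               # pre-activation 700: tanh saturates
bounded = tanh_head(l2_normalize(raw), weights)   # pre-activation about 1.4
```

With unnormalized features the output sits on the flat part of tanh, where gradients vanish. After normalization the pre-activation is bounded by the norm of the weights, so the head stays in a range where it can still learn.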

[375]  arXiv:2110.11223 [pdf, other]
Title: Detection of Driver Drowsiness by Calculating the Speed of Eye Blinking
Comments: This paper has been accepted at the Upper-Rhine Artificial Intelligence Symposium 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many road accidents are caused by drowsiness of the driver. While there are methods to detect closed eyes, it is a non-trivial task to detect the gradual process of a driver becoming drowsy. We consider a simple real-time detection system for drowsiness merely based on the eye blinking rate derived from the eye aspect ratio. For the eye detection we use HOG and a linear SVM. If the speed of the eye blinking drops below some empirically determined threshold, the system triggers an alarm, hence preventing the driver from falling into microsleep. In this paper, we extensively evaluate the minimal requirements for the proposed system. We find that this system works well if the face is directed to the camera, but it becomes less reliable once the head is tilted significantly. The results of our evaluations provide the foundation for further developments of our drowsiness detection system.
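
A blink-rate check of this kind can be sketched as follows. The eye aspect ratio (EAR) uses the commonly cited six-landmark definition; the thresholds, the interpretation of "speed of eye blinking" as blinks per minute, and the function names are assumptions, and the landmark detection step (HOG plus linear SVM in the abstract) is not shown.

```python
import math


def eye_aspect_ratio(p):
    """EAR from six (x, y) eye landmarks p1..p6 in the usual order:
    (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)."""
    vertical = math.dist(p[1], p[5]) + math.dist(p[2], p[4])
    horizontal = math.dist(p[0], p[3])
    return vertical / (2.0 * horizontal)


def count_blinks(ear_values, closed_threshold=0.2):
    """Count closed-then-reopened transitions in a per-frame EAR sequence."""
    blinks = 0
    closed = False
    for ear in ear_values:
        if ear < closed_threshold and not closed:
            closed = True
        elif ear >= closed_threshold and closed:
            closed = False
            blinks += 1
    return blinks


def is_drowsy(ear_values, fps, min_blinks_per_minute=8.0, closed_threshold=0.2):
    """Flag drowsiness when the blink rate drops below a hypothetical threshold."""
    minutes = len(ear_values) / (fps * 60.0)
    rate = count_blinks(ear_values, closed_threshold) / minutes
    return rate < min_blinks_per_minute


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
ear_open = eye_aspect_ratio(open_eye)

# One minute of frames at 1 frame per second containing two blinks.
frames = [0.3, 0.1, 0.3] * 2 + [0.3] * 54
alarm = is_drowsy(frames, fps=1)
```

In a real system the thresholds would be determined empirically, as the abstract notes, and the EAR would typically be averaged over both eyes.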

[376]  arXiv:2110.11225 [pdf, other]
Title: Player Dominance Adjustment in Games
Authors: Junjie Xu
Comments: 65 pages
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Video games are boring when they are too easy and frustrating when they are too hard. One way to provide enjoyment is to match game difficulty to player ability. We assume that implementing dynamic difficulty adjustment (DDA) to match overall game difficulty to player ability has limitations, especially in modern games, in terms of increased computational cost and the complexity of modeling difficulty. To overcome the limitations of both static difficulty changes and DDA, we propose a novel idea, Player Dominance Adjustment (PDA). The idea is to control the AI's actions based on the player's inputs so as to adjust the player's dominant power (e.g. the AI recognizes the player's attack but defends on the wrong side, letting the player inflict damage on it), which our work shows promotes game-related self-efficacy. Studies conducted on a social deduction game and a fighting game show that the proposed idea has the potential to promote user experience (UX). In another study, PDA outperforms DDA in two experiments in terms of health promotion.

[377]  arXiv:2110.11226 [pdf, other]
Title: Accelerating Genetic Programming using GPUs
Authors: Vimarsh Sathia (1), Venkataramana Ganesh (2), Shankara Rao Thejaswi Nanditale (2) ((1) Indian Institute of Technology Madras, (2) NVIDIA Corporation)
Comments: 10 pages, 4 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Genetic Programming (GP), an evolutionary learning technique, has multiple applications in machine learning such as curve fitting, data modelling, feature selection, and classification. GP has several inherently parallel steps, making it an ideal candidate for GPU-based parallelization. This paper describes a GPU-accelerated stack-based variant of the generational GP algorithm which can be used for symbolic regression and binary classification. The selection and evaluation steps of the generational GP algorithm are parallelized using CUDA. We represent candidate solution expressions as prefix lists, which enables evaluation using a fixed-length stack in GPU memory. CUDA-based matrix-vector operations are also used to compute the fitness of population programs. We evaluate our algorithm on synthetic datasets for the Pagie Polynomial (ranging in size from $4096$ to $16$ million points), comparing the training times of our algorithm with those of other standard symbolic regression libraries, viz. gplearn, TensorGP and KarooGP. In addition, using $6$ large-scale regression and classification datasets usually used for comparing gradient boosting algorithms, we run performance benchmarks on our algorithm and gplearn, profiling the training time, test accuracy, and loss. On an NVIDIA DGX-A100 GPU, our algorithm outperforms all the previously listed frameworks, and in particular, achieves average speedups of $119\times$ and $40\times$ against gplearn on the synthetic and large-scale datasets respectively.
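
The prefix-list representation lends itself to a simple stack evaluator, sketched here on the CPU. The token encoding (strings for operators, integers for feature indices, floats for constants), the operator set, and the stack size are assumptions made for the example, not the paper's CUDA implementation.

```python
import operator

# Hypothetical operator set; a real system would support more functions.
BINARY_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate_prefix(program, row, stack_size=16):
    """Evaluate a prefix-encoded expression on one data row.

    The program is scanned right to left. Operands are pushed onto a
    preallocated, fixed-length stack; each operator pops its two arguments
    and pushes the result.
    """
    stack = [0.0] * stack_size
    top = 0  # number of values currently on the stack
    for token in reversed(program):
        if isinstance(token, str):
            left = stack[top - 1]
            right = stack[top - 2]
            top -= 2
            value = BINARY_OPS[token](left, right)
        elif isinstance(token, int):
            value = row[token]      # feature lookup by column index
        else:
            value = float(token)    # numeric constant
        if top >= stack_size:
            raise OverflowError("expression needs a deeper stack")
        stack[top] = value
        top += 1
    return stack[0]


# x0 * x0 + 1.0 written in prefix form.
program = ["add", "mul", 0, 0, 1.0]
result = evaluate_prefix(program, [3.0])
difference = evaluate_prefix(["sub", 0, 1], [5.0, 3.0])
```

Because the stack has a fixed size and the loop has no recursion, the same pattern maps naturally onto one thread per program and data row, which is presumably what makes the representation attractive on a GPU.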

[378]  arXiv:2110.11227 [pdf, other]
Title: Towards Automatic Grading of D3.js Visualizations
Comments: Accepted to IEEE VIS'21. For a demo video, see this https URL
Subjects: Human-Computer Interaction (cs.HC)

Manually grading D3 data visualizations is a challenging endeavor, and is especially difficult for large classes with hundreds of students. Grading an interactive visualization requires a combination of interactive, quantitative, and qualitative evaluations that are conventionally done manually and are difficult to scale up as the visualization complexity, data size, and number of students increase. We present a first-of-its-kind automatic grading method for D3 visualizations that scalably and precisely evaluates the data bindings, visual encodings, interactions, and design specifications used in a visualization. Our method has shown potential to enhance students' learning experience, enabling them to submit their code frequently and receive rapid feedback to better inform iteration and improvement to their code and visualization design. Our method promotes consistent grading and enables instructors to dedicate more focus to assisting students in gaining visualization knowledge and experience. We have successfully deployed our method and auto-graded D3 submissions from more than 1000 undergraduate and graduate students in Georgia Tech's CSE6242 Data and Visual Analytics course, and received positive feedback and encouragement for expanding its adoption.

[379]  arXiv:2110.11236 [pdf, other]
Title: Variational Predictive Routing with Nested Subjective Timescales
Comments: 18 pages, 13 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Discovery and learning of an underlying spatiotemporal hierarchy in sequential data is an important topic for machine learning. Despite this, little work has been done to explore hierarchical generative models that can flexibly adapt their layerwise representations in response to datasets with different temporal dynamics. Here, we present Variational Predictive Routing (VPR) - a neural probabilistic inference system that organizes latent representations of video features in a temporal hierarchy, based on their rates of change, thus modeling continuous data as a hierarchical renewal process. By employing an event detection mechanism that relies solely on the system's latent representations (without the need of a separate model), VPR is able to dynamically adjust its internal state following changes in the observed features, promoting an optimal organisation of representations across the levels of the model's latent hierarchy. Using several video datasets, we show that VPR is able to detect event boundaries, disentangle spatiotemporal features across its hierarchy, adapt to the dynamics of the data, and produce accurate time-agnostic rollouts of the future. Our approach integrates insights from neuroscience and introduces a framework with high potential for applications in model-based reinforcement learning, where flexible and informative state-space rollouts are of particular interest.

[380]  arXiv:2110.11238 [pdf, other]
Title: One Representative-Shot Learning Using a Population-Driven Template with Application to Brain Connectivity Classification and Evolution Prediction
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Few-shot learning presents a challenging paradigm for training discriminative models on a few training samples representing the target classes to discriminate. However, classification methods based on deep learning are ill-suited for such learning as they need large amounts of training data --let alone one-shot learning. Recently, graph neural networks (GNNs) have been introduced to the field of network neuroscience, where the brain connectivity is encoded in a graph. However, with scarce neuroimaging datasets particularly for rare diseases and low-resource clinical facilities, such data-devouring architectures might fail in learning the target task. In this paper, we take a very different approach in training GNNs, where we aim to learn with one sample and achieve the best performance --a formidable challenge to tackle. Specifically, we present the first one-shot paradigm where a GNN is trained on a single population-driven template --namely a connectional brain template (CBT). A CBT is a compact representation of a population of brain graphs capturing the unique connectivity patterns shared across individuals. It is analogous to brain image atlases for neuroimaging datasets. Using a one-representative CBT as a training sample, we alleviate the training load of GNN models while boosting their performance across a variety of classification and regression tasks. We demonstrate that our method significantly outperformed benchmark one-shot learning methods with downstream classification and time-dependent brain graph data forecasting tasks while competing with the train-on-all conventional training strategy. Our source code can be found at https://github.com/basiralab/one-representative-shot-learning.

[381]  arXiv:2110.11239 [pdf, other]
Title: Improving the Search by Encoding Multiple Solutions in a Chromosome
Authors: Mihai Oltean
Comments: 7 figures
Journal-ref: Evolutionary Machine Design, Nova Science Publisher, New-York, edited by Nadia Nedjah (et al.), pp. 81-107, 2004
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

We investigate the possibility of encoding multiple solutions of a problem in a single chromosome. The best solution encoded in an individual will represent (will provide the fitness of) that individual. In order to obtain some benefits the chromosome decoding process must have the same complexity as in the case of a single solution in a chromosome. Three Genetic Programming techniques are analyzed for this purpose: Multi Expression Programming, Linear Genetic Programming, and Infix Form Genetic Programming. Numerical experiments show that encoding multiple solutions in a chromosome greatly improves the search process.
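
The idea of taking a chromosome's fitness from its best encoded solution can be sketched in the style of Multi Expression Programming, where every gene is itself an expression built from earlier genes. The gene encoding, the absolute-error fitness, and the names below are assumptions for illustration.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate_genes(chromosome, x):
    """Evaluate every gene in one left-to-right pass.

    A gene is either the terminal "x" or a tuple (op, i, j) that refers to
    the values of two earlier genes, so each gene encodes its own expression.
    """
    values = []
    for gene in chromosome:
        if gene == "x":
            values.append(x)
        else:
            op, i, j = gene
            values.append(OPS[op](values[i], values[j]))
    return values


def fitness(chromosome, samples):
    """Return (error, index) of the best expression encoded in the chromosome."""
    errors = [0.0] * len(chromosome)
    for x, target in samples:
        for k, value in enumerate(evaluate_genes(chromosome, x)):
            errors[k] += abs(value - target)
    best = min(range(len(errors)), key=errors.__getitem__)
    return errors[best], best


# Genes encode x, x*x and x*x + x; the target function is x*x + x.
chromosome = ["x", ("mul", 0, 0), ("add", 1, 0)]
samples = [(1.0, 2.0), (2.0, 6.0), (3.0, 12.0)]
best_error, best_gene = fitness(chromosome, samples)
```

A single pass over the genes yields the value of every encoded expression, so scoring all of them costs about the same as scoring only the last one, which is the complexity condition the abstract mentions.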

[382]  arXiv:2110.11240 [pdf, other]
Title: A Systematic Review on the Detection of Fake News Articles
Comments: 22 Pages, 16 Figures, Currently submitted to ACM TIST - Awaiting Peer-Review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

It has been argued that fake news and the spread of false information pose a threat to societies throughout the world, from influencing the results of elections to hindering the efforts to manage the COVID-19 pandemic. To combat this threat, a number of Natural Language Processing (NLP) approaches have been developed. These leverage a number of datasets, feature extraction/selection techniques and machine learning (ML) algorithms to detect fake news before it spreads. While these methods are well-documented, there is less evidence regarding their efficacy in this domain. By systematically reviewing the literature, this paper aims to delineate the approaches for fake news detection that are most performant, identify limitations with existing approaches, and suggest ways these can be mitigated. The analysis of the results indicates that Ensemble Methods using a combination of news content and socially-based features are currently the most effective. Finally, it is proposed that future research should focus on developing approaches that address generalisability issues (which, in part, arise from limitations with current datasets), explainability and bias.

[383]  arXiv:2110.11241 [pdf, other]
Title: Be Daring to Push your Ads Forward: Measuring the (Over)use of Service Workers for Advertising Purposes
Subjects: Networking and Internet Architecture (cs.NI)

Rich offline experience, periodic background sync, push notification functionality, network request control, and improved performance via request caching are only a few of the functionalities provided by the Service Workers API. This new technology, supported by all major browsers, can significantly improve users' experience by providing the publisher with technical foundations that would normally require a native application. Despite the capabilities of this new technique and its important role in the ecosystem of Progressive Web Apps (PWAs), it is still unclear what Service Workers are actually used for on the web and how publishers leverage the provided functionality in their web applications. In this study, we shed light on the real-world deployment of Service Workers by conducting the first large-scale analysis of their prevalence in the wild. We see that Service Workers are becoming increasingly popular, with adoption increasing by 26% within the last 5 months alone. Surprisingly, despite their rich capabilities, we see that Service Workers are mostly used for Push Advertising, in 65.08% of the Service Workers that connect with 3rd parties. We highlight that this is a relatively new way for advertisers to bypass ad-blockers and render ads natively on the user's display.

[384]  arXiv:2110.11242 [pdf]
Title: Analysis of the first Genetic Engineering Attribution Challenge
Comments: Main text: 11 pages, 4 figures, 37 references. Supplementary materials: 29 pages, 2 supplementary tables, 21 supplementary figures
Subjects: Neural and Evolutionary Computing (cs.NE)

The ability to identify the designer of engineered biological sequences -- termed genetic engineering attribution (GEA) -- would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model's ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.

[385]  arXiv:2110.11246 [pdf, other]
Title: Motion Planning for Connected Automated Vehicles at Occluded Intersections With Infrastructure Sensors
Comments: 12 pages, 8 figures
Subjects: Robotics (cs.RO)

Motion planning at urban intersections that accounts for the situation context, handles occlusions, and deals with measurement and prediction uncertainty is a major challenge on the way to urban automated driving. In this work, we address this challenge with a sampling-based optimization approach. For this, we formulate an optimal control problem that optimizes for low risk and high passenger comfort. The risk is calculated on the basis of the perception information and the respective uncertainty using a risk model. The risk model combines set-based methods and probabilistic approaches. Thus, the approach provides safety guarantees in a probabilistic sense, while for a vanishing risk, the formal safety guarantees of the set-based methods are inherited. By exploring all available behavior options, our approach solves decision making and longitudinal trajectory planning in one step. The available behavior options are provided by a formal representation of the situation context, which is also used to reduce calculation efforts. Occlusions are resolved using the external perception of infrastructure-mounted sensors. Yet, instead of merging external and ego perception with track-to-track fusion, the information is used in parallel. The motion planning scheme is validated through real-world experiments.

[386]  arXiv:2110.11248 [pdf, other]
Title: Learning to Recommend Using Non-Uniform Data
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Learning user preferences for products based on their past purchases or reviews is a cornerstone of modern recommendation engines. One complication in this learning task is that some users are more likely to purchase products or review them, and some products are more likely to be purchased or reviewed by the users. This non-uniform pattern degrades the power of many existing recommendation algorithms, as they assume that the observed data is sampled uniformly at random among user-product pairs. In addition, existing literature on modeling non-uniformity either assumes user interests are independent of the products, or lacks theoretical understanding. In this paper, we first model the user-product preferences as a partially observed matrix with a non-uniform observation pattern. Next, building on the literature about low-rank matrix estimation, we introduce a new weighted trace-norm penalized regression to predict unobserved values of the matrix. We then prove an upper bound for the prediction error of our proposed approach. Our upper bound is a function of a number of parameters that are based on a certain weight matrix that depends on the joint distribution of users and products. Utilizing this observation, we introduce a new optimization problem to select a weight matrix that minimizes the upper bound on the prediction error. The final product is a new estimator, NU-Recommend, that outperforms existing methods on both synthetic and real datasets.

[387]  arXiv:2110.11253 [pdf, ps, other]
Title: Multimode Diagnosis for Switched Affine Systems with Noisy Measurement
Comments: 24 pages, 8 figures
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

We study a diagnosis scheme to reliably detect the active mode of discrete-time, switched affine systems in the presence of measurement noise and asynchronous switching. The proposed scheme consists of two parts: (i) the construction of a bank of filters, and (ii) the introduction of a residual/threshold-based diagnosis rule. We develop an exact finite optimization-based framework to numerically solve for an optimal bank of filters in which the contribution of the measurement noise to the residual is minimized. The design problem is safely approximated through linear matrix inequalities and thus becomes tractable. We further propose a thresholding policy along with probabilistic false-alarm guarantees to estimate the active system mode in real-time. In comparison with the existing results, the guarantees improve from a polynomial dependency on the probability of false alarm to a logarithmic form. This improvement is achieved under the additional assumption of sub-Gaussianity, which is expected in many applications. The performance of the proposed diagnosis filters is validated through a synthetic numerical example and an application to a building radiant system.

[388]  arXiv:2110.11255 [pdf, other]
Title: On the properties of some low-parameter models for color reproduction in terms of spectrum transformations and coverage of a color triangle
Comments: 23 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

One of the classical approaches to solving color reproduction problems, such as color adaptation or color space transform, is the use of low-parameter spectral models. The strength of this approach is the ability to choose a set of properties that the model should have, be it a large coverage area of a color triangle or an accurate description of the addition or multiplication of spectra, knowing only the tristimulus values corresponding to them. The disadvantage is that some of the properties of the mentioned spectral models are confirmed only experimentally. This work is devoted to the theoretical substantiation of various properties of spectral models. In particular, we prove that the banded model is the only model that simultaneously possesses the properties of closure under addition and multiplication. We also show that the Gaussian model is the limiting case of the von Mises model and prove that the set of protomers of the von Mises model unambiguously covers the color triangle in both the case of convex and non-convex spectral locus.

[389]  arXiv:2110.11256 [pdf, other]
Title: Multi-Category Mesh Reconstruction From Image Collections
Comments: Accepted at 3DV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, learning frameworks have shown the capability of inferring the accurate shape, pose, and texture of an object from a single RGB image. However, current methods are trained on image collections of a single category in order to exploit specific priors, and they often make use of category-specific 3D templates. In this paper, we present an alternative approach that infers the textured mesh of objects by combining a series of deformable 3D models with a set of instance-specific deformation, pose, and texture predictions. Unlike previous work, our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision. Without specific 3D templates, the framework learns category-level models which are deformed to recover the 3D shape of the depicted object. The instance-specific deformations are predicted independently for each vertex of the learned 3D mesh, enabling the dynamic subdivision of the mesh during the training process. Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner. Predicted shapes are smooth and can benefit from multiple steps of subdivision during the training process, obtaining comparable or state-of-the-art results on two public datasets. Models and code are publicly released.

[390]  arXiv:2110.11259 [pdf, other]
Title: A scale invariant ranking function for learning-to-rank: a real-world use case
Comments: 14 pages, 1 figure, 1 table
Subjects: Information Retrieval (cs.IR)

Nowadays, Online Travel Agencies provide the main service for booking holidays, business trips, accommodations, etc. As in many e-commerce services where users, items, and preferences are involved, the use of a Recommender System facilitates the navigation of the marketplaces. One of the main challenges when productizing machine learning models (and in this case, Learning-to-Rank models) is the need for not only consistent pre-processing transformations but also input features that maintain a similar scale at both training and prediction time. However, the features' scale does not necessarily stay the same in the real-world production environment, which could lead to an unexpected ranking order. Normalization techniques such as feature standardization, batch normalization and layer normalization are commonly used to tackle the scaling issue. However, these techniques. To address this issue, in this paper we propose a novel scale-invariant ranking function (dubbed SIR) which is obtained by combining a deep and a wide neural network. We incorporate SIR into five state-of-the-art Learning-to-Rank models and compare the performance of the combined models with the classic algorithms on a large data set containing 56 million booked searches from the Hotels.com website. In addition, we simulate four real-world scenarios where the features' scale in the test set is inconsistent with that in the training set. The results reveal that when the features' scale is inconsistent at prediction time, Learning-to-Rank methods incorporating SIR outperform their original counterparts in all scenarios (with a performance difference of up to 14.7%), while when the features' scales in the training and test sets are consistent, our proposal achieves accuracy comparable to the classic algorithms.

[391]  arXiv:2110.11261 [pdf, other]
Title: Principal Component Analysis versus Factor Analysis
Comments: 54 pages, 13 figures, 35 tables
Journal-ref: Zeszyty Naukowe WWSI, No 24, Vol. 15, 2021, pp. 35-88
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

The article discusses selected problems related to both principal component analysis (PCA) and factor analysis (FA). In particular, both types of analysis were compared. A vector interpretation for both PCA and FA has also been proposed. The problem of determining the number of principal components in PCA and factors in FA was discussed in detail. A new criterion for determining the number of factors and principal components is discussed, which allows most of the variance of each of the analyzed primary variables to be represented. An efficient algorithm for determining the number of factors in FA, which complies with this criterion, was also proposed. This algorithm was adapted to find the number of principal components in PCA. It was also proposed to modify the PCA algorithm using a new method of determining the number of principal components. The obtained results were discussed.

[392]  arXiv:2110.11262 [pdf, other]
Title: Detecting Important Patterns Using Conceptual Relevance Interestingness Measure
Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Discovering meaningful conceptual structures is a substantial task in data mining and knowledge discovery applications. While off-the-shelf interestingness indices defined in Formal Concept Analysis may provide an effective relevance evaluation in several situations, they frequently give inadequate results when faced with massive formal contexts (and concept lattices), and in the presence of irrelevant concepts. In this paper, we introduce the Conceptual Relevance (CR) score, a new scalable interestingness measurement for the identification of actionable concepts. From a conceptual perspective, the minimal generators provide key information about their associated concept intent. Furthermore, the relevant attributes of a concept are those that maintain the satisfaction of its closure condition. Thus, the guiding idea of CR exploits the fact that minimal generators and relevant attributes can be efficiently used to assess concept relevance. As such, the CR index quantifies both the amount of conceptually relevant attributes and the number of the minimal generators per concept intent. Our experiments on synthetic and real-world datasets show the efficiency of this measure over the well-known stability index.

[393]  arXiv:2110.11264 [pdf, ps, other]
Title: MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize the images of the same identity between the visible modality and the infrared modality. Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space, which ignores the single space of each modality in the shallow layers. To address this, we present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space. Firstly, based on the observation that edge information is modality-invariant, we propose an edge features enhancement module to enhance the modality-sharable features in each single-modality space. Specifically, we design a perceptual edge features (PEF) loss after the edge fusion strategy analysis. To the best of our knowledge, this is the first work that proposes explicit optimization in the single-modality feature space for the cross-modality ReID task. Moreover, to increase the difference between cross-modality distance and class distance, we introduce a novel cross-modality contrastive-center (CMCC) loss into the modality-joint constraints in the common feature space. The PEF loss and CMCC loss jointly optimize the model in an end-to-end manner, which markedly improves the network's performance. Extensive experiments demonstrate that the proposed model significantly outperforms state-of-the-art methods on both the SYSU-MM01 and RegDB datasets.

[394]  arXiv:2110.11265 [pdf, other]
Title: Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)

In many areas, such as the physical sciences, life sciences, and finance, control approaches are used to achieve a desired goal in complex dynamical systems governed by differential equations. In this work we formulate the problem of controlling stochastic partial differential equations (SPDEs) as a reinforcement learning problem. We present a learning-based, distributed control approach for online control of a system of SPDEs with a high-dimensional state-action space using the deep deterministic policy gradient method. We tested the performance of our method on the problem of controlling the stochastic Burgers' equation, describing a turbulent fluid flow in an infinitely large domain.

[395]  arXiv:2110.11269 [pdf, other]
Title: Modeling the AC Power Flow Equations with Optimally Compact Neural Networks: Application to Unit Commitment
Comments: first two authors equally contributed, 8 pages, 3 figures, 1 table
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)

Nonlinear power flow constraints render a variety of power system optimization problems computationally intractable. Emerging research shows, however, that the nonlinear AC power flow equations can be successfully modeled using Neural Networks (NNs). These NNs can be exactly transformed into Mixed Integer Linear Programs (MILPs) and embedded inside challenging optimization problems, thus replacing nonlinearities that are intractable for many applications with tractable piecewise linear approximations. Such approaches, though, suffer from an explosion of the number of binary variables needed to represent the NN. Accordingly, this paper develops a technique for training an "optimally compact" NN, i.e., one that can represent the power flow equations with a sufficiently high degree of accuracy while still maintaining a tractable number of binary variables. We show that the resulting NN model is more expressive than both the DC and linearized power flow approximations when embedded inside of a challenging optimization problem (i.e., the AC unit commitment problem).
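
For readers unfamiliar with how a trained NN becomes an MILP, the usual source of binary variables is the ReLU activation. The following is a standard big-M style encoding of a single ReLU unit $y = \max(0, x)$, given here as a general illustration rather than the specific formulation used in the paper. The pre-activation bounds $L < 0 < U$ (with $L \le x \le U$) and the binary indicator $z$ are assumptions introduced for the sketch.

```latex
\[
y \ge 0, \qquad
y \ge x, \qquad
y \le x - L\,(1 - z), \qquad
y \le U z, \qquad
z \in \{0, 1\}
\]
```

With $z = 1$ the constraints force $y = x \ge 0$, and with $z = 0$ they force $y = 0$ and $x \le 0$. Each ReLU unit therefore contributes one binary variable, which is why the number of binaries grows with the size of the network.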

[396]  arXiv:2110.11271 [pdf, other]
Title: Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance. However, such observations have never been made formal or quantitative. In fact, it is not even clear whether the difficulties arising from a poorly chosen noise distribution are statistical or algorithmic in nature. In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used. Namely, we prove these challenges arise due to an ill-behaved (more precisely, flat) loss landscape. To address this, we introduce a variant of NCE called "eNCE" which uses an exponential loss and for which normalized gradient descent provably addresses the landscape issues when the target and noise distributions are in a given exponential family.

[397]  arXiv:2110.11275 [pdf, other]
Title: Self-Supervised Monocular Scene Decomposition and Depth Estimation
Comments: 3DV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Self-supervised monocular depth estimation approaches either ignore independently moving objects in the scene or need a separate segmentation step to identify them. We propose MonoDepthSeg to jointly estimate depth and segment moving objects from monocular video without using any ground-truth labels. We decompose the scene into a fixed number of components where each component corresponds to a region on the image with its own transformation matrix representing its motion. We estimate both the mask and the motion of each component efficiently with a shared encoder. We evaluate our method on three driving datasets and show that our model clearly improves depth estimation while decomposing the scene into separately moving components.

[398]  arXiv:2110.11280 [pdf, ps, other]
Title: Actor-critic is implicitly biased towards high entropy optimal policies
Subjects: Machine Learning (cs.LG)

We show that the simplest actor-critic method -- a linear softmax policy updated with TD through interaction with a linear MDP, but featuring no explicit regularization or exploration -- does not merely find an optimal policy, but moreover prefers high entropy optimal policies. To demonstrate the strength of this bias, the algorithm not only has no regularization, no projections, and no exploration like $\epsilon$-greedy, but is moreover trained on a single trajectory with no resets. The key consequence of the high entropy bias is that uniform mixing assumptions on the MDP, which exist in some form in all prior work, can be dropped: the implicit regularization of the high entropy bias is enough to ensure that all chains mix and an optimal policy is reached with high probability. As auxiliary contributions, this work decouples concerns between the actor and critic by writing the actor update as an explicit mirror descent, provides tools to uniformly bound mixing times within KL balls of policy space, and provides a projection-free TD analysis with its own implicit bias which can be run from an unmixed starting distribution.

[399]  arXiv:2110.11281 [pdf, other]
Title: Super-resolution of multiphase materials by combining complementary 2D and 3D image data using generative adversarial networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Modelling the impact of a material's mesostructure on device level performance typically requires access to 3D image data containing all the relevant information to define the geometry of the simulation domain. This image data must include sufficient contrast between phases to distinguish each material, be of high enough resolution to capture the key details, but also have a large enough field-of-view to be representative of the material in general. It is rarely possible to obtain data with all of these properties from a single imaging technique. In this paper, we present a method for combining information from pairs of distinct but complementary imaging techniques in order to accurately reconstruct the desired multi-phase, high resolution, representative, 3D images. Specifically, we use deep convolutional generative adversarial networks to implement super-resolution, style transfer and dimensionality expansion. To demonstrate the widespread applicability of this tool, two pairs of datasets are used to validate the quality of the volumes generated by fusing the information from paired imaging techniques. Three key mesostructural metrics are calculated in each case to show the accuracy of this method. Having confidence in the accuracy of our method, we then demonstrate its power by applying to a real data pair from a lithium ion battery electrode, where the required 3D high resolution image data is not available anywhere in the literature. We believe this approach is superior to previously reported statistical material reconstruction methods both in terms of its fidelity and ease of use. Furthermore, much of the data required to train this algorithm already exists in the literature, waiting to be combined. As such, our open-access code could precipitate a step change by generating the hard to obtain high quality image volumes necessary to simulate behaviour at the mesoscale.

[400]  arXiv:2110.11283 [pdf, other]
Title: The Effect of Wearing a Face Mask on Face Image Quality
Comments: 8 pages, 6 figures, 16th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Due to the COVID-19 situation, face masks have become a main part of our daily life. Wearing mouth-and-nose protection has been mandated in many public places to prevent the spread of the COVID-19 virus. However, face masks affect the performance of face recognition, since a large area of the face is covered. The effect of wearing a face mask on the different components of the face recognition system in a collaborative environment is a problem that is still to be fully studied. This work studies, for the first time, the effect of wearing a face mask on face image quality by utilising state-of-the-art face image quality assessment methods of different natures. This aims at providing a better understanding of the effect of face masks on the operation of face recognition as a whole system. In addition, we further studied the effect of simulated masks on face image utility in comparison to real face masks. We discuss the correlation between the mask effect on face image quality and that on the face verification performance by automatic systems and human experts, indicating a consistent trend between both factors. The evaluation is conducted on a database containing (1) no-masked faces, (2) real face masks, and (3) simulated face masks, produced by synthetically generating digital facial masks on no-masked faces according to the NIST protocols [1, 23]. Finally, a visual interpretation of the face areas contributing to the quality score of a selected set of quality assessment methods is provided to give a deeper insight into the difference of network decisions in masked and non-masked faces, among other variations.

[401]  arXiv:2110.11284 [pdf, other]
Title: Multi-Object Tracking and Segmentation with a Space-Time Memory Network
Comments: arXiv admin note: text overlap with arXiv:2107.07067
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a method for multi-object tracking and segmentation that does not require fine-tuning or per-benchmark hyper-parameter selection. The proposed tracker, MeNToS, particularly addresses the data association problem. Indeed, the recently introduced HOTA metric, which aligns better with human visual assessment by evenly balancing detection and association quality, has shown that improvements are still needed for data association. After creating tracklets using instance segmentation and optical flow, the proposed method relies on a space-time memory network developed for one-shot video object segmentation to improve the association of tracklets with temporal gaps. We evaluated our tracker on KITTIMOTS and MOTSChallenge and show the benefit of our data association strategy with the HOTA metric. The project page is www.mehdimiah.com/mentos+.

[402]  arXiv:2110.11285 [pdf, ps, other]
Title: How to Fairly Allocate Easy and Difficult Chores
Comments: 27 pages
Subjects: Computer Science and Game Theory (cs.GT)

A major open question in fair allocation of indivisible items is whether there always exists an allocation of chores that is Pareto optimal (PO) and envy-free up to one item (EF1). We answer this question affirmatively for the natural class of bivalued utilities, where each agent partitions the chores into easy and difficult ones, and has cost $p > 1$ for chores that are difficult for her and cost $1$ for chores that are easy for her. Such an allocation can be found in polynomial time using an algorithm based on the Fisher market.
We also show that for a slightly broader class of utilities, where each agent $i$ can have a potentially different integer $p_i$, an allocation that is maximin share fair (MMS) always exists and one that is both PO and MMS can be computed in polynomial time, provided that each $p_i$ is an integer. Our MMS arguments also hold when allocating goods instead of chores, and extend to another natural class of utilities, namely weakly lexicographic utilities.
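
To make the EF1 notion for chores concrete, the sketch below checks whether a given allocation is envy-free up to one item under the standard definition: agent i does not envy agent j if removing some single chore from i's own bundle brings i's cost down to at most i's cost for j's bundle. It only verifies a candidate allocation; it is not the Fisher-market algorithm from the paper. The function names, the example instance, and the choice p = 3 are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def bundle_cost(agent_costs: Dict[str, int], bundle: Set[str]) -> int:
    """Additive cost of a bundle of chores for one agent."""
    return sum(agent_costs[chore] for chore in bundle)


def is_ef1_for_chores(costs: List[Dict[str, int]],
                      allocation: List[Set[str]]) -> bool:
    """Return True if the allocation is envy-free up to one chore (EF1).

    costs[i][c] is agent i's cost for chore c; allocation[i] is agent i's bundle.
    """
    n = len(costs)
    for i in range(n):
        own = allocation[i]
        own_cost = bundle_cost(costs[i], own)
        for j in range(n):
            if i == j:
                continue
            other_cost = bundle_cost(costs[i], allocation[j])
            if own_cost <= other_cost:
                continue  # no envy at all
            # Dropping the chore that is most costly for agent i is the
            # most favourable single removal, so it suffices to test that one.
            worst = max(costs[i][chore] for chore in own)
            if own_cost - worst > other_cost:
                return False
    return True


# Hypothetical bivalued instance with p = 3: easy chores cost 1, difficult cost 3.
costs = [
    {"a": 1, "b": 1, "c": 3, "d": 3},  # agent 0 finds c and d difficult
    {"a": 3, "b": 3, "c": 1, "d": 1},  # agent 1 finds a and b difficult
]
balanced = [{"a", "b"}, {"c", "d"}]
lopsided = [{"a", "b", "c", "d"}, set()]
print(is_ef1_for_chores(costs, balanced))
print(is_ef1_for_chores(costs, lopsided))
```

In the balanced allocation each agent holds only chores that are easy for her, so nobody envies anyone. In the lopsided one, agent 0 still carries a positive cost after dropping any single chore while agent 1 carries none, so EF1 fails.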

[403]  arXiv:2110.11286 [pdf, other]
Title: One-Shot Transfer Learning of Physics-Informed Neural Networks
Comments: [under review]
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Solving differential equations efficiently and accurately sits at the heart of progress in many areas of scientific research, from classical dynamical systems to quantum mechanics. There is a surge of interest in using Physics-Informed Neural Networks (PINNs) to tackle such problems as they provide numerous benefits over traditional numerical approaches. Despite the potential benefits of PINNs for solving differential equations, transfer learning has been underexplored. In this study, we present a general framework for transfer learning PINNs that results in one-shot inference for linear systems of both ordinary and partial differential equations. This means that highly accurate solutions to many unknown differential equations can be obtained instantaneously without retraining an entire network. We demonstrate the efficacy of the proposed deep learning approach by solving several real-world problems, such as first- and second-order linear ordinary equations, the Poisson equation, and the time-dependent Schrödinger complex-valued partial differential equation.

[404]  arXiv:2110.11290 [pdf, other]
Title: Physical Side-Channel Attacks on Embedded Neural Networks: A Survey
Comments: 25 pages, 7 figures
Journal-ref: M. Méndez Real and R. Salvador, "Physical Side-Channel Attacks on Embedded Neural Networks: A Survey," Applied Sciences, vol. 11, no. 15, p. 6790, Jul. 2021
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

During the last decade, Deep Neural Networks (DNN) have progressively been integrated on all types of platforms, from data centers to embedded systems including low-power processors and, recently, FPGAs. Neural Networks (NN) are expected to become ubiquitous in IoT systems by transforming all sorts of real-world applications, including applications in the safety-critical and security-sensitive domains. However, the underlying hardware security vulnerabilities of embedded NN implementations remain unaddressed. In particular, embedded DNN implementations are vulnerable to Side-Channel Analysis (SCA) attacks, which are especially important in the IoT and edge computing contexts where an attacker can usually gain physical access to the targeted device. A research field has therefore emerged and is rapidly growing around the use of SCA, including timing, electromagnetic and power attacks, to target embedded NN implementations. Since 2018, research papers have shown that SCA enables an attacker to recover inference model architectures and parameters, to expose industrial IP, and to endanger data confidentiality and privacy. Since no complete review of this emerging field exists in the literature so far, this paper surveys state-of-the-art physical SCA attacks on the implementation of embedded DNNs on micro-controllers and FPGAs in order to provide a thorough analysis of the current landscape. It provides a taxonomy and a detailed classification of current attacks. It first discusses mitigation techniques and then provides insights for future research leads.

[405]  arXiv:2110.11292 [pdf, other]
Title: OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Systems and Control (eess.SY)

Logic synthesis is a challenging and widely-researched combinatorial optimization problem during integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D, a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool, and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs, plus labels such as the optimized node counts and delay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The code related to dataset creation and benchmark models is available at https://github.com/NYU-MLDA/OpenABC.git. The dataset generated is available at https://archive.nyu.edu/handle/2451/63311.

[406]  arXiv:2110.11293 [pdf, other]
Title: An Empirical Study on GANs with Margin Cosine Loss and Relativistic Discriminator
Comments: 16 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generative Adversarial Networks (GANs) have emerged as useful generative models, which are capable of implicitly learning data distributions of arbitrarily complex dimensions. However, the training of GANs is empirically well-known for being highly unstable and sensitive. The loss functions of both the discriminator and generator concerning their parameters tend to oscillate wildly during training. Different loss functions have been proposed to stabilize the training and improve the quality of images generated. In this paper, we perform an empirical study on the impact of several loss functions on the performance of standard GAN models, Deep Convolutional Generative Adversarial Networks (DCGANs). We introduce a new improvement that employs a relativistic discriminator to replace the classical deterministic discriminator in DCGANs and implement a margin cosine loss function for both the generator and discriminator. This results in a novel loss function, namely Relativistic Margin Cosine Loss (RMCosGAN). We carry out extensive experiments with four datasets: CIFAR-10, MNIST, STL-10, and CAT. We compare RMCosGAN performance with existing loss functions based on two metrics: Fréchet inception distance and inception score. The experimental results show that RMCosGAN outperforms the existing ones and significantly improves the quality of images generated.
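
As background for the relativistic discriminator mentioned above, the sketch below computes the widely used relativistic standard GAN losses, in which the discriminator scores a real sample relative to a fake one instead of scoring each in isolation. It does not include the margin cosine component and should not be read as the paper's RMCosGAN loss. The function names and the assumption that real and fake critic scores arrive as equally long lists of paired floats are illustrative choices.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def relativistic_d_loss(real_scores: Sequence[float],
                        fake_scores: Sequence[float]) -> float:
    """Discriminator loss: -mean log sigmoid(C(x_real) - C(x_fake))."""
    pairs = list(zip(real_scores, fake_scores))
    return -sum(log_sigmoid(r - f) for r, f in pairs) / len(pairs)


def relativistic_g_loss(real_scores: Sequence[float],
                        fake_scores: Sequence[float]) -> float:
    """Generator loss: -mean log sigmoid(C(x_fake) - C(x_real))."""
    pairs = list(zip(real_scores, fake_scores))
    return -sum(log_sigmoid(f - r) for r, f in pairs) / len(pairs)


# Hypothetical critic outputs for two real/fake pairs.
real = [2.0, 1.5]
fake = [-2.0, -0.5]
print(relativistic_d_loss(real, fake), relativistic_g_loss(real, fake))
```

When the critic cannot separate real from fake (equal scores), both losses equal log 2. When real samples score much higher than fake ones, the discriminator loss is small and the generator loss is large.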

[407]  arXiv:2110.11298 [pdf, other]
Title: Video and Text Matching with Conditioned Embeddings
Journal-ref: WACV 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

We present a method for matching a text sentence from a given corpus to a given video clip and vice versa. Traditionally video and text matching is done by learning a shared embedding space and the encoding of one modality is independent of the other. In this work, we encode the dataset data in a way that takes into account the query's relevant information. The power of the method is demonstrated to arise from pooling the interaction data between words and frames. Since the encoding of the video clip depends on the sentence compared to it, the representation needs to be recomputed for each potential match. To this end, we propose an efficient shallow neural network. Its training employs a hierarchical triplet loss that is extendable to paragraph/video matching. The method is simple, provides explainability, and achieves state-of-the-art results for both sentence-clip and video-text by a sizable margin across five different datasets: ActivityNet, DiDeMo, YouCook2, MSR-VTT, and LSMDC. We also show that our conditioned representation can be transferred to video-guided machine translation, where we improved the current results on VATEX. Source code is available at https://github.com/AmeenAli/VideoMatch.

[408]  arXiv:2110.11299 [pdf, other]
Title: Transformer Acceleration with Dynamic Sparse Attention
Subjects: Machine Learning (cs.LG)

Transformers are the mainstream of NLP applications and are becoming increasingly popular in other domains such as Computer Vision. Despite the improvements in model quality, the enormous computation costs make Transformers difficult to deploy, especially when the sequence length is large in emerging applications. The attention mechanism, the essential component of the Transformer, is the execution bottleneck due to its quadratic complexity. Prior art explores sparse patterns in attention to support long sequence modeling, but that work relies on static or fixed patterns. We demonstrate that the sparse patterns are dynamic, depending on input sequences. Thus, we propose Dynamic Sparse Attention (DSA), which can efficiently exploit the dynamic sparsity in the attention of Transformers. Compared with other methods, our approach can achieve better trade-offs between accuracy and model complexity. Moving forward, we identify challenges and provide solutions to implement DSA on existing hardware (GPUs) and specialized hardware in order to achieve practical speedup and efficiency improvements for Transformer execution.

[409]  arXiv:2110.11303 [pdf, other]
Title: Survival-oriented embeddings for improving accessibility to complex data structures
Comments: NeurIPS 2021 Workshop, Bridging the Gap: From Machine Learning Research to Clinical Practice
Subjects: Machine Learning (cs.LG)

Deep learning excels in the analysis of unstructured data, and recent advancements allow these techniques to be extended to survival analysis. In the context of clinical radiology, this makes it possible, e.g., to relate unstructured volumetric images to a risk score or a prognosis of life expectancy and to support clinical decision making. Medical applications are, however, associated with high criticality and, consequently, neither medical personnel nor patients usually accept black box models as a reason or basis for decisions. Apart from averseness to new technologies, this is due to the missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.

[410]  arXiv:2110.11305 [pdf, other]
Title: On games and simulators as a platform for development of artificial intelligence for command and control
Comments: Preprint submitted to the Journal of Defense Modeling and Simulation (JDMS) for peer review
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Games and simulators can be a valuable platform to execute complex multi-agent, multiplayer, imperfect information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of artificial intelligence algorithms in real-time strategy games such as StarCraft II has also attracted the attention of the military research community, which aims to explore similar techniques in military counterpart scenarios. Aiming to bridge the connection between games and military applications, this work discusses past and current efforts on how games and simulators, together with the artificial intelligence algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.

[411]  arXiv:2110.11309 [pdf, other]
Title: Fast Model Editing at Scale
Comments: View implementation and additional project info at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is desirable. However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult. If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models. To enable easy post-hoc editing at scale, we propose Model Editor Networks with Gradient Decomposition (MEND), a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model. MEND learns to transform the gradient obtained by standard fine-tuning, using a low-rank decomposition of the gradient to make the parameterization of this transformation tractable. MEND can be trained on a single GPU in less than a day even for 10 billion+ parameter models; once trained MEND enables rapid application of new edits to the pre-trained model. Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that produces effective edits for models with tens of millions to over 10 billion parameters. Implementation available at https://sites.google.com/view/mend-editing.

[412]  arXiv:2110.11312 [pdf, other]
Title: Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation
Comments: NeurIPS 2021 Workshop, Deep Generative Models and Downstream Applications
Subjects: Machine Learning (cs.LG)

The application of deep learning in survival analysis (SA) gives the opportunity to utilize unstructured and high-dimensional data types uncommon in traditional survival methods. This allows methods to advance in fields such as digital health, predictive maintenance and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that allows hazard factors to be modelled in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.

[413]  arXiv:2110.11314 [pdf, other]
Title: Center Loss Regularization for Continual Learning
Comments: 16 pages, 9 figures, Submitted to the ICLR 2022 conference
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The ability to learn different tasks sequentially is essential to the development of artificial intelligence. In general, neural networks lack this capability, the major obstacle being catastrophic forgetting. It occurs when the incrementally available information from non-stationary data distributions is continually acquired, disrupting what the model has already learned. Our approach remembers old tasks by projecting the representations of new tasks close to those of old tasks while keeping the decision boundaries unchanged. We employ the center loss as a regularization penalty that enforces new tasks' features to have the same class centers as old tasks and makes the features highly discriminative. This, in turn, leads to the least forgetting of already learned information. This method is easy to implement, requires minimal computational and memory overhead, and allows the neural network to maintain high performance across many sequentially encountered tasks. We also demonstrate that using the center loss in conjunction with memory replay outperforms other replay-based strategies. Along with standard MNIST variants for continual learning, we apply our method to continual domain adaptation scenarios with the Digits and PACS datasets. We demonstrate that our approach is scalable, effective, and gives competitive performance compared to state-of-the-art continual learning methods.
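
The center loss used as a penalty above can be sketched as follows: it is half the squared Euclidean distance between each feature vector and the center of its class, averaged here over the batch (the original center loss formulation sums rather than averages). Keeping the centers fixed at values learned on old tasks, and the weighting factor `lam`, are assumptions made for this illustration and not details taken from the paper.

```python
from typing import Dict, List, Sequence


def center_loss(features: Sequence[Sequence[float]],
                labels: Sequence[int],
                centers: Dict[int, Sequence[float]]) -> float:
    """Mean over the batch of 0.5 * ||x_i - c_{y_i}||^2."""
    total = 0.0
    for x, y in zip(features, labels):
        center = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return 0.5 * total / len(features)


def regularized_loss(task_loss: float,
                     features: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     old_centers: Dict[int, Sequence[float]],
                     lam: float = 0.1) -> float:
    """New-task loss plus a center-loss penalty towards old-task class centers.

    `lam` is a hypothetical weighting hyper-parameter.
    """
    return task_loss + lam * center_loss(features, labels, old_centers)


# Hypothetical 2-D features for a new task and class centers from an old task.
feats: List[List[float]] = [[1.0, 0.0], [0.0, 2.0]]
labs = [0, 1]
old_centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}
print(center_loss(feats, labs, old_centers))
print(regularized_loss(0.7, feats, labs, old_centers, lam=0.1))
```

The penalty is zero only when every new-task feature sits exactly on the stored center of its class, which is the sense in which the regularizer pulls new representations towards old ones.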

[414]  arXiv:2110.11316 [pdf, other]
Title: CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
Comments: 14 pages (+ appendix); Blog: this https URL GitHub: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Contrastive learning with the InfoNCE objective is exceptionally successful in various self-supervised learning tasks. Recently, the CLIP model yielded impressive results on zero-shot transfer learning when using InfoNCE for learning visual representations from natural language supervision. However, InfoNCE as a lower bound on the mutual information has been shown to perform poorly for high mutual information. In contrast, the InfoLOOB upper bound (leave one out bound) works well for high mutual information but suffers from large variance and instabilities. We introduce "Contrastive Leave One Out Boost" (CLOOB), where modern Hopfield networks boost learning with the InfoLOOB objective. Modern Hopfield networks replace the original embeddings by retrieved embeddings in the InfoLOOB objective. The retrieved embeddings give InfoLOOB two assets. Firstly, the retrieved embeddings stabilize InfoLOOB, since they are less noisy and more similar to one another than the original embeddings. Secondly, they are enriched by correlations, since the covariance structure of embeddings is reinforced through retrievals. We compare CLOOB to CLIP after learning on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.
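
The difference between the two objectives named in this abstract can be sketched in a few lines. The example below is a simplified, one-directional version that works on a plain similarity matrix; it omits the Hopfield retrieval step and the symmetric image/text terms, and the function names, temperature, and similarity values are assumptions for illustration only.

```python
import math


def info_nce(sim, i, tau=0.1):
    """InfoNCE loss for anchor i: the positive pair also appears in the denominator."""
    logits = [s / tau for s in sim[i]]
    log_denominator = math.log(sum(math.exp(l) for l in logits))
    return -(logits[i] - log_denominator)


def info_loob(sim, i, tau=0.1):
    """InfoLOOB loss for anchor i: the positive pair is left out of the denominator."""
    logits = [s / tau for s in sim[i]]
    log_denominator = math.log(
        sum(math.exp(l) for j, l in enumerate(logits) if j != i)
    )
    return -(logits[i] - log_denominator)


# Hypothetical similarity matrix: sim[i][j] compares anchor i with candidate j,
# and the diagonal holds the matching (positive) pairs.
sim = [
    [0.9, 0.1, 0.2],
    [0.0, 0.8, 0.3],
    [0.1, 0.2, 0.7],
]

for i in range(len(sim)):
    print(i, info_nce(sim, i), info_loob(sim, i))
```

Because the InfoNCE denominator contains the positive term, its loss is bounded below by zero and flattens once the positive pair dominates, whereas the leave-one-out form keeps decreasing as the positive similarity grows.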

[415]  arXiv:2110.11320 [pdf, other]
Title: Deep Curriculum Learning in Task Space for Multi-Class Based Mammography Diagnosis
Comments: 4-page abstract. Full paper to appear at SPIE Medical Imaging 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Mammography is used as a standard screening procedure for potential breast cancer patients. Over the past decade, it has been shown that deep learning techniques have succeeded in reaching near-human performance in a number of tasks, and their application in mammography is one of the topics that medical researchers most concentrate on. In this work, we propose an end-to-end Curriculum Learning (CL) strategy in task space for classifying the three categories of Full-Field Digital Mammography (FFDM), namely Malignant, Negative, and False recall. Specifically, our method treats this three-class classification as a "harder" task in terms of CL, and creates an "easier" sub-task of classifying False recall against the combined group of Negative and Malignant. We introduce a loss scheduler to dynamically weight the contribution of the losses from the two tasks throughout the entire training process. We conduct experiments on an FFDM dataset of 1,709 images using 5-fold cross validation. The results show that our curriculum learning strategy can boost the performance for classifying the three categories of FFDM compared to the baseline strategies for model training.

[416]  arXiv:2110.11323 [pdf, other]
Title: StyleAlign: Analysis and Applications of Aligned StyleGAN Models
Comments: 39 pages, 33 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

In this paper, we perform an in-depth study of the properties and applications of aligned generative models. We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model's latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.

[417]  arXiv:2110.11325 [pdf, other]
Title: Learning 3D Semantic Segmentation with only 2D Image Supervision
Comments: Accepted to 3DV 2021 (Oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras. However, due to high labeling costs, ground-truth 3D semantic segmentation annotations are limited in both quantity and geographic diversity, while also being difficult to transfer across sensors. In contrast, large image collections with ground-truth semantic segmentations are readily available for diverse sets of scenes. In this paper, we investigate how to use only those labeled 2D image collections to supervise training 3D semantic segmentation models. Our approach is to train a 3D model from pseudo-labels derived from 2D semantic image segmentations using multiview fusion. We address several novel issues with this approach, including how to select trusted pseudo-labels, how to sample 3D scenes with rare object categories, and how to decouple input features from 2D images from pseudo-labels during training. The proposed network architecture, 2D3DNet, achieves significantly better performance (+6.2-11.4 mIoU) than baselines during experiments on a new urban dataset with lidar and images captured in 20 cities across 5 continents.

[418]  arXiv:2110.11328 [pdf, other]
Title: A Fine-Grained Analysis on Distribution Shift
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Robustness to distribution shifts is critical for deploying machine learning models in the real world. Despite this necessity, there has been little work in defining the underlying mechanisms that cause these shifts and evaluating the robustness of algorithms across multiple, different distribution shifts. To this end, we introduce a framework that enables fine-grained analysis of various distribution shifts. We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods grouped into five categories across both synthetic and real-world datasets. Overall, we train more than 85K models. Our experimental framework can be easily extended to include new methods, shifts, and datasets. We find, unlike previous work (Gulrajani20), that progress has been made over a standard ERM baseline; in particular, pretraining and augmentations (learned or heuristic) offer large gains in many cases. However, the best methods are not consistent over different datasets and shifts.

[419]  arXiv:2110.11331 [pdf, other]
Title: RoQNN: Noise-Aware Training for Robust Quantum Neural Networks
Comments: 19 pages, 10 figures, open-source at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)

Quantum Neural Network (QNN) is a promising application towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of QNN models degrades severely on real quantum devices. For example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones that do not leverage the unique characteristics of QNN and are only applicable to inference; on the other hand, existing QNN work does not consider the effect of noise. To this end, we present RoQNN, a QNN-specific framework to perform noise-aware optimizations in both training and inference stages to improve robustness. We analytically deduce and experimentally observe that the effect of quantum noise on the QNN measurement outcome is a linear map from the noise-free outcome with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to QNN according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving the denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that RoQNN improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class MNIST classification accuracy measured on real quantum computers. We also open-source our PyTorch library for construction and noise-aware training of QNN at https://github.com/mit-han-lab/pytorch-quantum .
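
The reasoning behind post-measurement normalization can be checked with a small numeric sketch: if noise acts on the measurement outcomes as a scaling plus a shift, then standardizing the outcomes over a batch cancels both factors. The scaling and shift values below and the choice of per-batch standardization are assumptions for illustration, not the library's API or the authors' exact procedure.

```python
import statistics


def normalize(values):
    """Standardize a batch of outcomes to zero mean and unit (population) std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical noise-free expectation values of one qubit over a batch.
clean = [-0.8, -0.1, 0.3, 0.9]

# Hypothetical noise modeled as the linear map described in the abstract:
# noisy = gamma * clean + beta, with a positive scaling factor gamma.
gamma, beta = 0.4, 0.15
noisy = [gamma * v + beta for v in clean]

# After normalization the two batches agree up to floating-point error.
print(normalize(clean))
print(normalize(noisy))
```

The cancellation holds for any positive scaling factor; a negative factor would flip the sign of the normalized values.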

[420]  arXiv:2110.11333 [pdf, other]
Title: A Python Package to Detect Anti-Vaccine Users on Twitter
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

Vaccine hesitancy has a long history but has been recently driven by the anti-vaccine narratives shared online, which significantly degrade the efficacy of vaccination strategies, such as those for COVID-19. Despite broad agreement in the medical community about the safety and efficacy of available vaccines, a large number of social media users continue to be inundated with false information about vaccines and, partly because of this, have become indecisive or unwilling to be vaccinated. The goal of this study is to better understand anti-vaccine sentiment, and work to reduce its impact, by developing a system capable of automatically identifying the users responsible for spreading anti-vaccine narratives. We introduce a publicly available Python package capable of analyzing a Twitter profile to assess how likely that profile is to spread anti-vaccine sentiment in the future. The software package is built using text embedding methods, neural networks, and automated dataset generation. It is trained on over one hundred thousand accounts and several million tweets. This model will help researchers and policy-makers understand anti-vaccine discussion and misinformation strategies, which can further help tailor targeted campaigns seeking to inform and debunk the harmful anti-vaccination myths currently being spread. Additionally, we leverage the data on such users to understand the moral and emotional characteristics of anti-vaccine spreaders.

[421]  arXiv:2110.11334 [pdf, other]
Title: Generalized Out-of-Distribution Detection: A Survey
Comments: Issues, comments, and questions are all welcomed in this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision. This problem first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.

[422]  arXiv:2110.11335 [pdf, other]
Title: Convex Joint Graph Matching and Clustering via Semidefinite Relaxations
Comments: 12 pages, 8 figures; source code available; project webpage: this https URL
Journal-ref: International Conference on 3D Vision (3DV) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a new algorithm for simultaneous graph matching and clustering. For the first time in the literature, these two problems are solved jointly and synergetically without relying on any training data, which brings advantages for identifying similar arbitrary objects in compound 3D scenes and matching them. For joint reasoning, we first rephrase graph matching as a rigid point set registration problem operating on spectral graph embeddings. Consequently, we utilise efficient convex semidefinite program relaxations for aligning points in Hilbert spaces and add coupling constraints to model the mutual dependency and exploit synergies between both tasks. We outperform state of the art in challenging cases with non-perfectly matching and noisy graphs, and we show successful applications on real compound scenes with multiple 3D elements. Our source code and data are publicly available.

Cross-lists for Fri, 22 Oct 21

[423]  arXiv:1610.06631 (cross-list from eess.SY) [pdf, other]
Title: Inverse Power Flow Problem
Comments: working paper
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper formulates the inverse power flow problem which is to infer the nodal admittance matrix (hence the network structure of the power system) from voltage and current phasors measured at a number of buses. We show that the admittance matrix can be uniquely identified from a sequence of measurements corresponding to different steady states when every node in the system is equipped with a measurement device, and a Kron-reduced admittance matrix can be determined even if some nodes in the system are not monitored (hidden nodes). Furthermore, we propose effective algorithms based on graph theory to uncover the actual admittance matrix of radial systems with hidden nodes. We provide theoretical guarantees for the recovered admittance matrix and demonstrate that the actual admittance matrix can be fully recovered even from the Kron-reduced admittance matrix under some mild assumptions. Simulations on standard test systems confirm that these algorithms are capable of providing accurate estimates of the admittance matrix from noisy sensor data.
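
For readers unfamiliar with the Kron-reduced admittance matrix mentioned in this abstract, the sketch below eliminates a single hidden node from a small admittance matrix using the standard Kron reduction (Schur complement) step. The three-bus network and the function name are made up for this example, and the paper's own recovery algorithms are not reproduced here.

```python
def eliminate_node(Y, k):
    """Kron-reduce the admittance matrix Y by eliminating node k,
    assuming no current is injected at that node."""
    keep = [i for i in range(len(Y)) if i != k]
    return [
        [Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k] for j in keep]
        for i in keep
    ]


# Hypothetical 3-bus line 0 - 1 - 2 with unit admittance on each branch.
Y = [
    [1.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 1.0],
]

# Hiding the middle bus leaves two buses joined by the series
# combination of the two branches, i.e. an admittance of 0.5.
Y_reduced = eliminate_node(Y, 1)
print(Y_reduced)
```

The same function works with complex admittances; eliminating several hidden nodes amounts to applying the step repeatedly.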

[424]  arXiv:2107.06013 (cross-list from cond-mat.dis-nn) [pdf, other]
Title: Barriers and Dynamical Paths in Alternating Gibbs Sampling of Restricted Boltzmann Machines
Journal-ref: Physical Review E : Statistical, Nonlinear, and Soft Matter Physics, American Physical Society, 2021
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)

Restricted Boltzmann Machines (RBM) are bi-layer neural networks used for the unsupervised learning of model distributions from data. The bipartite architecture of RBM naturally defines an elegant sampling procedure, called Alternating Gibbs Sampling (AGS), where the configurations of the latent-variable layer are sampled conditional to the data-variable layer, and vice versa. We study here the performance of AGS on several analytically tractable models borrowed from statistical mechanics. We show that standard AGS is not more efficient than classical Metropolis-Hastings (MH) sampling of the effective energy landscape defined on the data layer. However, RBM can identify meaningful representations of training data in their latent space. Furthermore, using these representations and combining Gibbs sampling with the MH algorithm in the latent space can enhance the sampling performance of the RBM when the hidden units encode weakly dependent features of the data. We illustrate our findings on three datasets: Bars and Stripes and MNIST, well known in machine learning, and the so-called Lattice Proteins, introduced in theoretical biology to study the sequence-to-structure mapping in proteins.

[425]  arXiv:2110.10156 (cross-list from eess.IV) [pdf, other]
Title: Cross-Sim-NGF: FFT-Based Global Rigid Multimodal Alignment of Image Volumes using Normalized Gradient Fields
Comments: 5 pages, 3 figures, 3 tables. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Multimodal image alignment involves finding spatial correspondences between volumes varying in appearance and structure. Automated alignment methods are often based on local optimization that can be highly sensitive to their initialization. We propose a global optimization method for rigid multimodal 3D image alignment, based on a novel efficient algorithm for computing similarity of normalized gradient fields (NGF) in the frequency domain. We validate the method experimentally on a dataset comprised of 20 brain volumes acquired in four modalities (T1w, Flair, CT, [18F] FDG PET), synthetically displaced with known transformations. The proposed method exhibits excellent performance on all six possible modality combinations, and outperforms all four reference methods by a large margin. The method is fast; a 3.4Mvoxel global rigid alignment requires approximately 40 seconds of computation, and the proposed algorithm outperforms a direct algorithm for the same task by more than three orders of magnitude. Open-source implementation is provided.

[426]  arXiv:2110.10199 (cross-list from math.OC) [pdf, other]
Title: Theoretical Advances in Current Estimation and Navigation from a Glider-Based Acoustic Doppler Current Profiler (ADCP)
Comments: Submitted to Journal of Atmospheric and Oceanic Technology. 15 pages main text. 10 pages figures, tables, bibliography, appendices
Subjects: Optimization and Control (math.OC); Robotics (cs.RO); Systems and Control (eess.SY)

We examine acoustic Doppler current profiler (ADCP) measurements from underwater gliders to determine glider position, glider velocity, and subsurface current. ADCPs, however, do not directly observe the quantities of interest; instead, they measure the relative motion of the vehicle and the water column. We examine the lineage of mathematical innovations that have previously been applied to this problem, discovering an unstated but incorrect assumption of independence. We reframe a recent method to form a joint probability model of current and vehicle navigation, which allows us to correct this assumption and extend the classic Kalman smoothing method. Detailed simulations affirm the efficacy of our approach for computing estimates and their uncertainty. The joint model developed here sets the stage for future work to incorporate constraints, range measurements, and robust statistical modeling.

[427]  arXiv:2110.10210 (cross-list from math.PR) [pdf, other]
Title: Long Random Matrices and Tensor Unfolding
Comments: 29 pages, 4 figures
Subjects: Probability (math.PR); Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we consider the singular values and singular vectors of low rank perturbations of large rectangular random matrices, in the regime where the matrix is "long": we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove that there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and that the extreme singular values and singular vectors exhibit a BBP type phase transition. As a main application, we investigate the tensor unfolding algorithm for the asymmetric rank-one spiked tensor model, and obtain an exact threshold, which is independent of the procedure of tensor unfolding. If the signal-to-noise ratio is above the threshold, tensor unfolding detects the signals; otherwise, it fails to capture the signals.
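
To make the link between tensors and "long" matrices concrete, the following sketch unfolds a small rank-one third-order tensor into a matrix with far more columns than rows. Only the reshaping step is shown; the noise term and the spectral analysis from the abstract are left out, and the helper names and vector values are assumptions for illustration.

```python
def rank_one_tensor(u, v, w):
    """Build the rank-one tensor T[i][j][k] = u[i] * v[j] * w[k]."""
    return [[[ui * vj * wk for wk in w] for vj in v] for ui in u]


def unfold_mode1(tensor):
    """Flatten every first-index slice into one row, giving a
    matrix of shape (n1, n2 * n3)."""
    return [[entry for row in slab for entry in row] for slab in tensor]


# Hypothetical signal vectors.
u, v, w = [1.0, 2.0], [1.0, -1.0, 0.5], [2.0, 3.0]

matrix = unfold_mode1(rank_one_tensor(u, v, w))
for row in matrix:
    print(row)
```

Each row of the unfolded matrix is a multiple of the flattened outer product of `v` and `w`, so the unfolding of a rank-one tensor is a rank-one matrix whose left singular vector is aligned with `u`.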

[428]  arXiv:2110.10219 (cross-list from eess.SP) [pdf, other]
Title: Power Line Communication Based Smart Grid Asset Monitoring Using Time Series Forecasting
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

Monitoring grid assets continuously is critical in ensuring the reliable operation of the electricity grid system and improving its resilience in case of a defect. In light of several asset monitoring techniques in use, power line communication (PLC) enables a low-cost cable diagnostics solution by re-using smart grid data communication modems to also infer the cable health using the inherently estimated communication channel state information. Traditional PLC-based cable diagnostics solutions are dependent on prior knowledge of the cable type, network topology, and/or characteristics of the anomalies. In contrast, we develop an asset monitoring technique in this paper that can detect various types of anomalies in the grid without any prior domain knowledge. To this end, we design a solution that first uses time-series forecasting to predict the PLC channel state information at any given point in time based on its historical data. Under the assumption that the prediction error follows a Gaussian distribution, we then perform a chi-squared statistical test to determine the significance level of the resultant Mahalanobis distance to build our anomaly detector. We demonstrate the effectiveness and universality of our solution via evaluations conducted using both synthetic and real-world data extracted from low- and medium-voltage distribution networks.
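
The detection step described in this abstract can be sketched as follows: under a Gaussian error model, the squared Mahalanobis distance of the forecast error follows a chi-squared distribution, so a small tail probability flags an anomaly. To stay within the standard library, this sketch assumes a diagonal error covariance and an even number of features (for which the chi-squared tail has a closed form); these simplifications, the function names, and the threshold are assumptions of this example, not the authors' implementation.

```python
import math


def mahalanobis_sq(error, variances):
    """Squared Mahalanobis distance for a diagonal covariance (assumed here)."""
    return sum(e * e / v for e, v in zip(error, variances))


def chi2_sf_even(x, dof):
    """Chi-squared survival function P(X > x), closed form for even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch only handles positive even dof")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(dof // 2):
        total += term          # adds (x/2)^i / i!
        term *= half / (i + 1)
    return math.exp(-half) * total


def is_anomaly(predicted, observed, variances, alpha=0.01):
    """Flag an observation whose forecast error is unlikely under the Gaussian model."""
    error = [o - p for o, p in zip(observed, predicted)]
    p_value = chi2_sf_even(mahalanobis_sq(error, variances), len(error))
    return p_value < alpha


# Hypothetical two-feature channel state: forecast, observations, error variances.
print(is_anomaly([0.0, 0.0], [0.1, -0.1], [1.0, 1.0]))  # small error
print(is_anomaly([0.0, 0.0], [5.0, 5.0], [1.0, 1.0]))   # large error
```

With a full covariance matrix, the distance would use the inverse covariance instead of per-feature variances, and a general chi-squared routine would replace the even-dof shortcut.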

[429]  arXiv:2110.10220 (cross-list from eess.SP) [pdf, other]
Title: Patch Based Transformation for Minimum Variance Beamformer Image Approximation Using Delay and Sum Pipeline
Comments: 6 pages, 3 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

In the recent past, there have been several efforts in accelerating computationally heavy beamforming algorithms such as minimum variance distortionless response (MVDR) beamforming to achieve real-time performance comparable to the popular delay and sum (DAS) beamforming. This has been achieved using a variety of neural network architectures ranging from fully connected neural networks (FCNNs) and convolutional neural networks (CNNs) to generative adversarial networks (GANs). However, most of these approaches optimize image-level losses and hence require a significant amount of data to ensure that the process of beamforming is learned. In this work, a patch-level U-Net based neural network is proposed, where the delay-compensated radio frequency (RF) patch for a fixed region in space (e.g. 32x32) is transformed through a U-Net architecture, multiplied with DAS apodization weights, and optimized for similarity with the MVDR image of the patch. Instead of framing the beamforming problem as a regression problem to estimate the apodization weights, the proposed approach learns a non-linear transformation of the RF data space, so that the data-driven weight adaptation done by the MVDR approach is accounted for in the parameters of the network. In this way, it is also observed that by restricting the input to a patch, the model learns the beamforming pipeline as an image non-linear transformation problem.

[430]  arXiv:2110.10242 (cross-list from eess.IV) [pdf]
Title: A New Automatic Change Detection Framework Based on Region Growing and Weighted Local Mutual Information: Analysis of Breast Tumor Response to Chemotherapy in Serial MR Images
Comments: 18 pages, 16 figures, 14 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The automatic analysis of subtle changes between longitudinal MR images is an important and still challenging task in breast medical image processing. In this paper, we propose an effective automatic change detection framework composed of two phases, motivated by the low distinctive power of the features used by previous methods. First, in the preprocessing phase, an intensity normalization method based on Hierarchical Histogram Matching (HHM) is suggested that is more robust to noise than previous methods. To eliminate undesirable changes and extract the regions containing significant changes, the proposed Extraction Region of Changes (EROC) method is applied, based on the intensity distribution and the Hill-Climbing algorithm. Second, in the detection phase, a region growing-based approach is suggested to differentiate significant changes from unreal ones. By using the proposed Weighted Local Mutual Information (WLMI) method to extract high-level features and by utilizing the principle of the local consistency of changes, the proposed approach achieves reasonable performance. The experimental results on both simulated and real longitudinal breast MR images confirm the effectiveness of the proposed framework. In some cases, the framework also outperforms the human expert, detecting many lesion evolutions that the expert missed.

[431]  arXiv:2110.10267 (cross-list from math.DS) [pdf, ps, other]
Title: Recognizability of morphisms
Subjects: Dynamical Systems (math.DS); Formal Languages and Automata Theory (cs.FL)

We investigate several questions related to the notion of recognizable morphism. The main result is a new proof of Mosse's theorem and actually of a generalization to non primitive morphisms due to Berthé et al. We actually prove the result of Berthé et al. for the most general class of morphisms, including ones with erasable letters. It is derived from a result concerning elementary morphisms for which we also provide a new proof. We also show how to decide whether an injective morphism is recognizable on the full shift for aperiodic points.

[432]  arXiv:2110.10279 (cross-list from math.OC) [pdf, other]
Title: Factorization Approach for Low-complexity Matrix Completion Problems: Exponential Number of Spurious Solutions and Failure of Gradient Methods
Comments: 21 pages, 1 figure
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

It is well-known that the Burer-Monteiro (B-M) factorization approach can efficiently solve low-rank matrix optimization problems under the RIP condition. It is natural to ask whether B-M factorization-based methods can succeed on any low-rank matrix optimization problems with a low information-theoretic complexity, i.e., polynomial-time solvable problems that have a unique solution. In this work, we provide a negative answer to the above question. We investigate the landscape of B-M factorized polynomial-time solvable matrix completion (MC) problems, which are the most popular subclass of low-rank matrix optimization problems without the RIP condition. We construct an instance of polynomial-time solvable MC problems with exponentially many spurious local minima, which leads to the failure of most gradient-based methods. Based on those results, we define a new complexity metric that potentially measures the solvability of low-rank matrix optimization problems based on the B-M factorization approach. In addition, we show that more measurements of the ground truth matrix can deteriorate the landscape, which further reveals the unfavorable behavior of the B-M factorization on general low-rank matrix optimization problems.

[433]  arXiv:2110.10281 (cross-list from stat.ME) [pdf, other]
Title: Joint Gaussian Graphical Model Estimation: A Survey
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Graphs from complex systems often share a partial underlying structure across domains while retaining individual features. Thus, identifying common structures can shed light on the underlying signal, for instance, when applied to scientific discoveries or clinical diagnoses. Furthermore, growing evidence shows that the shared structure across domains boosts the estimation power of graphs, particularly for high-dimensional data. However, building a joint estimator to extract the common structure may be more complicated than it seems, most often due to data heterogeneity across sources. This manuscript surveys recent work on statistical inference of joint Gaussian graphical models, identifying model structures that fit various data generation processes. Simulations under different data generation processes are implemented with detailed discussions on the choice of models.

[434]  arXiv:2110.10323 (cross-list from stat.ML) [pdf, other]
Title: Computational Graph Completion
Authors: Houman Owhadi
Comments: 31 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)

We introduce a framework for generating, organizing, and reasoning with computational knowledge. It is motivated by the observation that most problems in Computational Sciences and Engineering (CSE) can be described as that of completing (from data) a computational graph representing dependencies between functions and variables. Functions and variables may be known, unknown, or random. Data comes in the form of observations of distinct values of a finite number of subsets of the variables of the graph. The underlying problem combines a regression problem (approximating unknown functions) with a matrix completion problem (recovering unobserved variables in the data). Replacing unknown functions by Gaussian Processes (GPs) and conditioning on observed data provides a simple but efficient approach to completing such graphs. Since the proposed framework is highly expressive, it has a vast potential application scope. Since the completion process can be automatized, as one solves $\sqrt{\sqrt{2}+\sqrt{3}}$ on a pocket calculator without thinking about it, one could, with the proposed framework, solve a complex CSE problem by drawing a diagram. Compared to traditional kriging, the proposed framework can be used to recover unknown functions with much scarcer data by exploiting interdependencies between multiple functions and variables. The Computational Graph Completion (CGC) problem addressed by the proposed framework could therefore also be interpreted as a generalization of that of solving linear systems of equations to that of approximating unknown variables and functions with noisy, incomplete, and nonlinear dependencies. Numerous examples illustrate the flexibility, scope, efficacy, and robustness of the CGC framework and show how it can be used as a pathway to identifying simple solutions to classical CSE problems (digital twin modeling, dimension reduction, mode decomposition, etc.).

[435]  arXiv:2110.10326 (cross-list from eess.AS) [pdf, other]
Title: Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity
Comments: Submitted to ICASSP2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and speaker-dependent emotion style. Due to the hierarchical structure of speech emotion, it is challenging to disentangle the speaker-dependent emotional style for expressive voice conversion. Motivated by the recent success in speaker disentanglement with the variational autoencoder (VAE), we propose an expressive voice conversion framework which can effectively disentangle linguistic content, speaker identity, pitch, and emotional style information. We study the use of an emotion encoder to model emotional style explicitly, and introduce mutual information (MI) losses to reduce the irrelevant information in the disentangled emotion representations. At run-time, our proposed framework can convert both speaker identity and speaker-dependent emotional style without the need for parallel data. Experimental results validate the effectiveness of our proposed framework in both objective and subjective evaluations.

[436]  arXiv:2110.10330 (cross-list from eess.AS) [pdf, other]
Title: One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

With the recent surge in the use of video conferencing tools, providing high-quality speech signals and accurate captions has become essential for conducting day-to-day business or connecting with friends and family. Single-channel personalized speech enhancement (PSE) methods show promising results compared with the unconditional speech enhancement (SE) methods in these scenarios due to their ability to remove interfering speech in addition to the environmental noise. In this work, we leverage spatial information afforded by microphone arrays to improve such systems' performance further. We investigate the relative importance of speaker embeddings and spatial features. Moreover, we propose a new causal array-geometry-agnostic multi-channel PSE model, which can generate a high-quality enhanced signal from arbitrary microphone geometry. Experimental results show that the proposed geometry-agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy. We also demonstrate the effectiveness of the proposed approach for unseen array geometries.

[437]  arXiv:2110.10332 (cross-list from physics.med-ph) [pdf]
Title: Artificial Intelligence-Based Detection, Classification and Prediction/Prognosis in PET Imaging: Towards Radiophenomics
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Artificial intelligence (AI) techniques have significant potential to enable effective, robust, and automated image phenotyping including identification of subtle patterns. AI-based detection searches the image space to find the regions of interest based on patterns and features. There is a spectrum of tumor histologies from benign to malignant that can be identified by AI-based classification approaches using image features. The extraction of minable information from images gives way to the field of radiomics and can be explored via explicit (handcrafted/engineered) and deep radiomics frameworks. Radiomics analysis has the potential to be utilized as a noninvasive technique for the accurate characterization of tumors to improve diagnosis and treatment monitoring. This work reviews AI-based techniques, with a special focus on oncological PET and PET/CT imaging, for different detection, classification, and prediction/prognosis tasks. We also discuss needed efforts to enable the translation of AI techniques to routine clinical workflows, and potential improvements and complementary techniques such as the use of natural language processing on electronic health records and neuro-symbolic AI techniques.

[438]  arXiv:2110.10351 (cross-list from math.OC) [pdf, other]
Title: Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process
Comments: The paper was initially submitted for publication in January 2021
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is proposed with a novel integration of three ingredients: entropy regularized policy optimizer, dual variable regularizer, and Nesterov's accelerated gradient descent dual optimizer, all of which are critical to achieving faster convergence. The finite-time error bound of the proposed approach is characterized. Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural,paternain2019constrained}. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal O(1/\epsilon)$ for convex optimization subject to convex constraints. Our primal-dual approach and non-asymptotic analysis are agnostic to the RL optimizer used, and thus are more flexible for practical applications. More generally, our approach also serves as the first algorithm that provably accelerates constrained nonconvex optimization with zero duality gap by exploiting geometries such as the gradient dominance condition, for which the existing acceleration methods for constrained convex optimization are not applicable.

[439]  arXiv:2110.10362 (cross-list from math.AP) [pdf, other]
Title: The Bleeps, the Sweeps, and the Creeps: Convergence Rates for Dynamic Observer Patterns via Data Assimilation for the 2D Navier-Stokes Equations
Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Optimization and Control (math.OC)

We adapt a continuous data assimilation scheme, known as the Azouani-Olson-Titi (AOT) algorithm, to the case of moving observers for the 2D incompressible Navier-Stokes equations. We propose and test computationally several movement patterns (which we refer to as "the bleeps, the sweeps and the creeps"), as well as Lagrangian motion and combinations of these patterns, in comparison with static (i.e. non-moving) observers. In several cases, order-of-magnitude improvements in terms of the time-to-convergence are observed. We end with a discussion of possible applications to real-world data collection strategies that may lead to substantial improvements in predictive capabilities.

[440]  arXiv:2110.10381 (cross-list from eess.IV) [pdf, other]
Title: Medical Knowledge-Guided Deep Curriculum Learning for Elbow Fracture Diagnosis from X-Ray Images
Comments: SPIE Medical Imaging 2021. DOI: this https URL URL: this https URL
Journal-ref: SPIE Medical Imaging 2021: Computer-Aided Diagnosis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Elbow fractures are one of the most common fracture types. Diagnosing elbow fractures often requires radiographic images to be read and analyzed by a specialized radiologist with years of training. Thanks to recent advances in deep learning, a model that can classify and detect different types of bone fractures needs only hours of training and has shown promising results. However, most existing deep learning models are purely data-driven, lacking incorporation of known domain knowledge from human experts. In this work, we propose a novel deep learning method to diagnose elbow fracture from elbow X-ray images by integrating domain-specific medical knowledge into a curriculum learning framework. In our method, the training data are permuted by sampling without replacement at the beginning of each training epoch. The sampling probability of each training sample is guided by a scoring criterion constructed from the clinical knowledge of human experts, where the score indicates the diagnostic difficulty of different elbow fracture subtypes. We also propose an algorithm that updates the sampling probabilities at each epoch, which is applicable to other sampling-based curriculum learning frameworks. We design an experiment with 1865 elbow X-ray images for a fracture/normal binary classification task and compare our proposed method to a baseline method and a previous method using multiple metrics. Our results show that the proposed method achieves the highest classification performance. Also, our proposed probability update algorithm boosts the performance of the previous method.
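
The permutation step described above (ordering the training data by sampling without replacement under per-sample probabilities) can be sketched as follows. This is only a minimal illustration: the function name `weighted_permutation` and the example weights are hypothetical, and the paper's scoring criterion and per-epoch probability update are not reproduced here.

```python
import random

def weighted_permutation(weights, rng):
    """Order indices 0..n-1 by repeated weighted draws without replacement.

    Samples with larger weight tend to appear earlier in the epoch.
    All weights are assumed to be strictly positive.
    """
    remaining = list(range(len(weights)))
    order = []
    while remaining:
        current = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=current, k=1)[0]
        remaining.remove(pick)
        order.append(pick)
    return order

# Hypothetical weights: higher means "easier", so more likely to be seen early.
example_weights = [0.5, 0.3, 0.1, 0.1]
rng = random.Random(0)
print(weighted_permutation(example_weights, rng))
```

Every sample still appears exactly once per epoch; only the order is biased by the weights.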

[441]  arXiv:2110.10383 (cross-list from eess.IV) [pdf, other]
Title: Knowledge-Guided Multiview Deep Curriculum Learning for Elbow Fracture Classification
Comments: MICCAI 2021 workshop. DOI: this https URL URL: this https URL
Journal-ref: In International Workshop on Machine Learning in Medical Imaging (pp. 555-564) at MICCAI 2021. Springer, Cham
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Elbow fracture diagnosis often requires patients to take both frontal and lateral views of elbow X-ray radiographs. In this paper, we propose a multiview deep learning method for an elbow fracture subtype classification task. Our strategy leverages transfer learning by first training two single-view models, one for frontal view and the other for lateral view, and then transferring the weights to the corresponding layers in the proposed multiview network architecture. Meanwhile, quantitative medical knowledge was integrated into the training process through a curriculum learning framework, which enables the model to first learn from "easier" samples and then transition to "harder" samples to reach better performance. In addition, our multiview network can work both in a dual-view setting and with a single view as input. We evaluate our method through extensive experiments on a classification task of elbow fracture with a dataset of 1,964 images. Results show that our method outperforms two related methods on bone fracture study in multiple settings, and our technique is able to boost the performance of the compared methods. The code is available at https://github.com/ljaiverson/multiview-curriculum.

[442]  arXiv:2110.10394 (cross-list from eess.IV) [pdf, other]
Title: Deep Learning for HDR Imaging: State-of-the-Art and Future Trends
Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

High dynamic range (HDR) imaging is a technique that allows an extensive dynamic range of exposures, which is important in image processing, computer graphics, and computer vision. In recent years, there has been a significant advancement in HDR imaging using deep learning (DL). This study conducts a comprehensive and insightful survey and analysis of recent developments in deep HDR imaging methodologies. We hierarchically and structurally group existing deep HDR imaging methods into five categories based on (1) number/domain of input exposures, (2) number of learning tasks, (3) novel sensor data, (4) novel learning strategies, and (5) applications. Importantly, we provide a constructive discussion on each category regarding its potential and challenges. Moreover, we review some crucial aspects of deep HDR imaging, such as datasets and evaluation metrics. Finally, we highlight some open problems and point out future research directions.

[443]  arXiv:2110.10398 (cross-list from cond-mat.mtrl-sci) [pdf]
Title: Modelling of microstructures during in-situ alloying in additive manufacturing for efficient material qualification processes
Comments: 12 pages, 4 figures, submitted to ASIM Simulation in Produktion und Logistik 2021, Erlangen
Subjects: Materials Science (cond-mat.mtrl-sci); Numerical Analysis (math.NA)

In this work, a numerical simulation framework is presented based on the Phase Field Method that is able to capture the evolution of heterogeneous metallic microstructures during solidification. The physics involved can prove especially useful when studying not only systems undergoing thermal gradients, such as homogeneous systems, but also conditions that exhibit stark spatial gradients, i.e. when these inhomogeneities are present even on a mesoscopic scale. To illustrate the capabilities of the model, in-situ alloying of a High Entropy Alloy during Laser Powder Bed Fusion is investigated as an exemplary use case. The resulting digital twin is expected to shorten development times of new materials as well as cut down on experimental resource needs considerably, therefore contributing to efficient material qualification processes.

[444]  arXiv:2110.10403 (cross-list from eess.IV) [pdf, other]
Title: AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation, under both 2D and 3D settings. Current 2D-based methods either directly replace convolutional layers with pure transformers or consider a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider the attention encoding within one single slice and do not utilize the axial-axis information naturally provided by a 3D volume. In the 3D setting, convolution on volumetric data and transformers both consume large GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits performance. In this paper, we propose Axial Fusion Transformer UNet (AFTer-UNet), which takes advantage of both convolutional layers' capability of extracting detailed features and transformers' strength in long-sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than the previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.

[445]  arXiv:2110.10411 (cross-list from stat.ME) [pdf, other]
Title: Hyperspherical Dirac Mixture Reapproximation
Comments: 21 pages
Subjects: Methodology (stat.ME); Systems and Control (eess.SY)

We propose a novel scheme for efficient Dirac mixture modeling of distributions on unit hyperspheres. A so-called hyperspherical localized cumulative distribution (HLCD) is introduced as a local and smooth characterization of the underlying continuous density in hyperspherical domains. Based on HLCD, a manifold-adapted modification of the Cramér-von Mises distance (HCvMD) is established to measure the statistical divergence between two Dirac mixtures of arbitrary dimensions. Given a (source) Dirac mixture with many components representing an unknown hyperspherical distribution, a (target) Dirac mixture with fewer components is obtained via matching the source in the sense of least HCvMD. As the number of target Dirac components is configurable, the underlying distribution is represented in a more efficient and informative way. Based upon this hyperspherical Dirac mixture reapproximation (HDMR), we derive a density estimation method and a recursive filter. For density estimation, a maximum likelihood method is provided to reconstruct the underlying continuous distribution in the form of a von Mises-Fisher mixture. For recursive filtering, we introduce the hyperspherical reapproximation discrete filter (HRDF) for nonlinear hyperspherical estimation of dynamic systems under unknown system noise of arbitrary form. Simulations show that the HRDF delivers superior tracking performance over filters using sequential Monte Carlo and parametric modeling.

[446]  arXiv:2110.10441 (cross-list from math.OC) [pdf, other]
Title: Feedback Linearization of Car Dynamics for Racing via Reinforcement Learning
Comments: Final research paper for Berkeley's CS 285 (Deep Reinforcement Learning) in Fall 2020
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Robotics (cs.RO)

Through the method of Learning Feedback Linearization, we seek to learn a linearizing controller to simplify the process of controlling a car to race autonomously. A soft actor-critic approach is used to learn a decoupling matrix and drift vector that effectively correct for errors in a hand-designed linearizing controller. The result is an exactly linearizing controller that can be used to enable the well-developed theory of linear systems to design path planning and tracking schemes that are easy to implement and significantly less computationally demanding. To demonstrate the method of feedback linearization, it is first used to learn a simulated model whose exact structure is known, but varied from the initial controller, so as to introduce error. We further seek to apply this method to a system that introduces even more error in the form of a gym environment specifically designed for modeling the dynamics of car racing. To do so, we posit an extension to the method of learning feedback linearization: a neural network that is trained using supervised learning to convert the output of our linearizing controller to the required input for the racing environment. Our progress towards these goals is reported and the next steps in their accomplishment are discussed.

[447]  arXiv:2110.10489 (cross-list from eess.IV) [pdf, other]
Title: Evaluation of augmentation methods in classifying autism spectrum disorders from fMRI data with 3D convolutional neural networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Classifying subjects as healthy or diseased using neuroimaging data has gained a lot of attention during the last 10 years. Here we apply deep learning to derivatives from resting state fMRI data, and investigate how different 3D augmentation techniques affect the test accuracy. Specifically, we use resting state derivatives from 1,112 subjects in ABIDE preprocessed to train a 3D convolutional neural network (CNN) to perform the classification. Our results show that augmentation provides only minor improvements to the test accuracy.

[448]  arXiv:2110.10498 (cross-list from math.OC) [pdf, other]
Title: Differential Privacy in Multi-Party Resource Sharing
Subjects: Optimization and Control (math.OC); Cryptography and Security (cs.CR)

This study examines a resource-sharing problem involving multiple parties that agree to use a set of capacities together. We start with modeling the whole problem as a mathematical program, where all parties are required to exchange information to obtain the optimal objective function value. This information bears private data from each party in terms of coefficients used in the mathematical program. Moreover, the parties also consider the individual optimal solutions as private. In this setting, the concern for the parties is the privacy of their data and their optimal allocations. We propose a two-step approach to meet the privacy requirements of the parties. In the first step, we obtain a reformulated model that is amenable to a decomposition scheme. Although this scheme eliminates almost all data exchange, it does not provide a formal privacy guarantee. In the second step, we provide this guarantee with a differentially private algorithm at the expense of deviating slightly from the optimality. We provide bounds on this deviation and discuss the consequences of these theoretical results. The study ends with a simulation study on a planning problem that demonstrates an application of the proposed approach. Our work provides a new optimization model and a solution approach for optimal allocation of a set of shared resources among multiple parties who expect privacy of their data. The proposed approach is based on the decomposition of the shared resources and the randomization of the optimization iterations. With our analysis, we show that the resulting randomized algorithm does give a guarantee for the privacy of each party's data. As we work with a general optimization model, our analysis and discussion can be used in different application areas including production planning, logistics, and network revenue management.

[449]  arXiv:2110.10518 (cross-list from stat.ML) [pdf, other]
Title: Online non-parametric change-point detection for heterogeneous data streams observed over graph nodes
Comments: 11 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Consider a heterogeneous data stream being generated by the nodes of a graph. The data stream is in essence composed of multiple streams, possibly of a different nature that depends on each node. At a given moment $\tau$, a change-point occurs for a subset of nodes $C$, signifying the change in the probability distribution of their associated streams. In this paper we propose an online non-parametric method to infer $\tau$ based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distribution associated with the data stream of each node. We propose a kernel-based method, under the hypothesis that connected nodes of the graph are expected to have similar likelihood-ratio estimates when there is no change-point. We demonstrate the quality of our method on synthetic experiments and real-world applications.

[450]  arXiv:2110.10520 (cross-list from eess.IV) [pdf, other]
Title: Development and accuracy evaluation of Coded Phase-shift 3D scanner
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we provide an overview of the development of a structured-light 3D scanner based on a combination of binary-coded patterns and sinusoidal phase-shifted fringe patterns, called the Coded Phase-shift technique. Further, we describe the experiments performed to evaluate the measurement accuracy and precision of the developed system. A study of this kind is expected to be helpful in understanding the basic working of current structured-light 3D scanners and the approaches followed for their performance assessment.

[451]  arXiv:2110.10530 (cross-list from math.CO) [pdf, ps, other]
Title: Improved pyrotechnics : Closer to the burning graph conjecture
Comments: 8 pages
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Can every connected graph burn in $\lceil \sqrt{n} \rceil $ steps? While this conjecture remains open, we prove that it is asymptotically true when the graph is much larger than its \emph{growth}, which is the maximal distance of a vertex to a well-chosen path in the graph. In fact, we prove that the conjecture for graphs of bounded growth boils down to a finite number of cases. Through an improved (but still weaker) bound for all trees, we argue that the conjecture almost holds for all graphs with minimum degree at least $3$ and holds for all large enough graphs with minimum degree at least $4$. The previous best lower bound was $23$.

[452]  arXiv:2110.10557 (cross-list from astro-ph.CO) [pdf, other]
Title: Analytic Correlation of Inflationary Potential to Power Spectrum Shape: Limits of Validity, and `No-Go' for Small Field Model Analytics
Authors: Ira Wolfson
Comments: 20 pages, 13 figures, 2 tables. Code package INSANE will be made public shortly
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Numerical Analysis (math.NA)

The primordial power spectrum informs the possible inflationary histories of our universe. Given a power spectrum, the ensuing cosmic microwave background is calculated and compared to the observed one. Thus, one focus of modern cosmology is building well-motivated inflationary models that predict the primordial power spectrum observables. The common practice uses analytic terms for the scalar spectral index $n_s$ and the index running $\alpha$, forgoing the effort required to evaluate the model numerically. However, the validity of these terms has never been rigorously probed and relies on perturbative methods, which may lose their efficacy for large perturbations. The requirement for more accurate theoretical predictions becomes crucial with the advent of highly sensitive measuring instruments. This paper probes the limits of the perturbative treatment that connects inflationary potential parameters to primordial power spectrum observables. We show that the validity of analytic approximations of the scalar index roughly respects the large-field/small-field dichotomy. We supply an easily calculated measure for relative perturbation amplitude and show that, for large-field models, the validity of analytical terms extends to $\sim 3\%$ perturbation relative to a power-law inflation model. Conversely, the analytical treatment loses its validity for small-field models with as little as $0.1\%$ perturbation relative to the small-field test-case. By employing the most general artificial neural networks and multinomial functions up to the twentieth degree and demonstrating their shortcomings, we show that no reasonable analytic expressions correlating small-field models to the observables they yield exist. Finally, we discuss the possible implications of this work and supply the validity heuristic for large and small-field models.

[453]  arXiv:2110.10587 (cross-list from quant-ph) [pdf, other]
Title: Quantum networks theory
Comments: 41 pages, 10 figures
Journal-ref: 2021 https://youtu.be/wqwLE8aDRTU
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); General Relativity and Quantum Cosmology (gr-qc); Mathematical Physics (math-ph); Dynamical Systems (math.DS)

The formalism of quantum theory over discrete systems is extended in two significant ways. First, tensors and traceouts are generalized, so that systems can be partitioned according to almost arbitrary logical predicates. Second, quantum evolutions are generalized to act over network configurations, in such a way that nodes be allowed to merge, split and reconnect coherently in a superposition. The hereby presented mathematical framework is anchored on solid grounds through numerous lemmas. Indeed, one might have feared that the familiar interrelations between the notions of unitarity, complete positivity, trace-preservation, non-signalling causality, locality and localizability that are standard in quantum theory be jeopardized as the partitioning of systems becomes both logical and dynamical. Such interrelations in fact carry through, albeit two new notions become instrumental: consistency and comprehension.

[454]  arXiv:2110.10640 (cross-list from eess.IV) [pdf, other]
Title: OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data
Comments: BMVC 2021 (accepted), this https URL (code)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Convolutional neural networks (CNNs) are the current state-of-the-art meta-algorithm for volumetric segmentation of medical data, for example, to localize COVID-19 infected tissue on computed tomography scans or to detect tumour volumes in magnetic resonance imaging. A key limitation of 3D CNNs on voxelised data is that the memory consumption grows cubically with the training data resolution. Occupancy networks (O-Nets) are an alternative for which the data is represented continuously in a function space and 3D shapes are learned as a continuous decision boundary. While O-Nets are significantly more memory efficient than 3D CNNs, they are limited to simple shapes, are relatively slow at inference, and have not yet been adapted for 3D semantic segmentation of medical data. Here, we propose Occupancy Networks for Semantic Segmentation (OSS-Nets) to accurately and memory-efficiently segment 3D medical data. We build upon the original O-Net with modifications for increased expressiveness leading to improved segmentation performance comparable to 3D CNNs, as well as modifications for faster inference. We leverage local observations to represent complex shapes and prior encoder predictions to expedite inference. We showcase OSS-Net's performance on 3D brain tumour and liver segmentation against a function space baseline (O-Net), a performance baseline (3D residual U-Net), and an efficiency baseline (2D residual U-Net). OSS-Net yields segmentation results similar to the performance baseline and superior to the function space and efficiency baselines. In terms of memory efficiency, OSS-Net consumes an amount of memory comparable to the function space baseline, somewhat more than the efficiency baseline, and significantly less than the performance baseline. As such, OSS-Net enables memory-efficient and accurate 3D semantic segmentation that can scale to high resolutions.

[455]  arXiv:2110.10645 (cross-list from eess.IV) [pdf, other]
Title: Combining Different V1 Brain Model Variants to Improve Robustness to Image Corruptions in CNNs
Comments: 15 pages with supplementary material, 3 main figures, 2 supplementary figures, 4 supplementary tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

While some convolutional neural networks (CNNs) have surpassed human visual abilities in object classification, they often struggle to recognize objects in images corrupted with different types of common noise patterns, highlighting a major limitation of this family of models. Recently, it has been shown that simulating a primary visual cortex (V1) at the front of CNNs leads to small improvements in robustness to these image perturbations. In this study, we start with the observation that different variants of the V1 model show gains for specific corruption types. We then build a new model using an ensembling technique, which combines multiple individual models with different V1 front-end variants. The model ensemble leverages the strengths of each individual model, leading to significant improvements in robustness across all corruption categories and outperforming the base model by 38% on average. Finally, we show that using distillation, it is possible to partially compress the knowledge in the ensemble model into a single model with a V1 front-end. While the ensembling and distillation techniques used here are hardly biologically-plausible, the results presented here demonstrate that by combining the specific strengths of different neuronal circuits in V1 it is possible to improve the robustness of CNNs for a wide range of perturbations.

[456]  arXiv:2110.10671 (cross-list from math.OC) [pdf, ps, other]
Title: Adaptive Gradient Descent for Optimal Control of Parabolic Equations with Random Parameters
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)

In this paper we extend the adaptive gradient descent (AdaGrad) algorithm to the optimal distributed control of parabolic partial differential equations with uncertain parameters. This stochastic optimization method achieves an improved convergence rate through adaptive scaling of the gradient step size. We prove the convergence of the algorithm for this infinite dimensional problem under suitable regularity, convexity, and finite variance conditions, and relate these to verifiable properties of the underlying system parameters. Finally, we apply our algorithm to the optimal thermal regulation of lithium battery systems under uncertain loads.
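
For orientation, the adaptive scaling of the step size referred to above can be sketched in its standard finite-dimensional, deterministic form. This is the textbook per-coordinate AdaGrad update, not the infinite-dimensional stochastic method of the paper; the function name `adagrad`, the test objective, and all parameter values are assumptions chosen for the example.

```python
import math

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=200):
    """Textbook AdaGrad: each coordinate's step is scaled by the root of
    its accumulated squared gradients."""
    x = list(x0)
    accum = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x

# Illustrative objective f(x) = (x0 - 1)^2 + 10 * (x1 + 2)^2 with minimizer (1, -2).
def quad_grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

print(adagrad(quad_grad, [0.0, 0.0]))
```

Because each coordinate is normalized by its own gradient history, the poorly scaled second coordinate does not force a smaller step on the first.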

[457]  arXiv:2110.10685 (cross-list from quant-ph) [pdf, other]
Title: Predicting parameters for the Quantum Approximate Optimization Algorithm for MAX-CUT from the infinite-size limit
Comments: 59 pages, 8 figures
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)

Combinatorial optimization is regarded as a potentially promising application of near and long-term quantum computers. The best-known heurist