Computer Science

New submissions

New submissions for Thu, 2 Dec 21

[1]  arXiv:2112.00004 [pdf, other]
Title: cliquematch: Finding correspondence via cliques in large graphs
Comments: 11 pages, 3 figures, 1 table; Code available at this https URL
Subjects: Data Structures and Algorithms (cs.DS); Mathematical Software (cs.MS)

The maximum clique problem finds applications in computer vision, bioinformatics, and network analysis, many of which involve the construction of correspondence graphs to find similarities between two given objects. cliquematch is a Python package designed for this purpose: it provides a simple framework to construct correspondence graphs, and implements an algorithm to find and enumerate maximum cliques in C++, that can process graphs of a few million edges on consumer hardware, with comparable performance to publicly available methods.
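
The correspondence-graph idea in this abstract can be illustrated with a small, pure-Python sketch. This is not the cliquematch API: the function names, the distance-preservation rule, and the tolerance `eps` are assumptions chosen for illustration, and the clique search is a plain Bron-Kerbosch variant with pivoting rather than the package's C++ algorithm.

```python
from itertools import combinations
from math import dist

def correspondence_graph(A, B, eps=1e-9):
    """Nodes are index pairs (i, j) meaning "A[i] corresponds to B[j]".
    Two nodes are adjacent when the pairwise distance is preserved."""
    nodes = [(i, j) for i in range(len(A)) for j in range(len(B))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i != k and j != l and abs(dist(A[i], A[k]) - dist(B[j], B[l])) <= eps:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Return one maximum clique (Bron-Kerbosch with pivoting and a size bound)."""
    best = []

    def expand(R, P, X):
        nonlocal best
        if not P and not X:
            if len(R) > len(best):
                best = list(R)
            return
        if len(R) + len(P) <= len(best):
            return  # this branch cannot beat the best clique found so far
        pivot = max(P | X, key=lambda v: len(adj[v] & P))
        for v in list(P - adj[pivot]):
            expand(R + [v], P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand([], set(adj), set())
    return best

# B contains a translated copy of A plus one unrelated point.
A = [(0, 0), (1, 0), (0, 2)]
B = [(5, 5), (6, 5), (5, 7), (9, 9)]
matches = sorted(max_clique(correspondence_graph(A, B)))
```

Each pair in `matches` reads as "point `i` of `A` corresponds to point `j` of `B`"; a maximum clique is the largest set of mutually consistent correspondences.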

[2]  arXiv:2112.00006 [pdf]
Title: Towards algorithm-free physical equilibrium model of computing
Authors: Seyed Mousavi
Subjects: Computational Complexity (cs.CC); Computation and Language (cs.CL)

Our computers today, from sophisticated servers to small smartphones, operate based on the same computing model, which requires running a sequence of discrete instructions, specified as an algorithm. This sequential computing paradigm has not yet led to a fast algorithm for an NP-complete problem despite numerous attempts over the past half a century. Unfortunately, even after the introduction of quantum mechanics to the world of computing, we still followed a similar sequential paradigm, which has not yet helped us obtain such an algorithm either. Here a completely different model of computing is proposed to replace the sequential paradigm of algorithms with inherent parallelism of physical processes. Using the proposed model, instead of writing algorithms to solve NP-complete problems, we construct physical systems whose equilibrium states correspond to the desired solutions and let them evolve to search for the solutions. The main requirements of the model are identified and quantum circuits are proposed for its potential implementation.

[3]  arXiv:2112.00007 [pdf, other]
Title: Sound-Guided Semantic Image Manipulation
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The recent success of generative models shows that leveraging the multi-modal embedding space makes it possible to manipulate an image using text information. However, manipulating an image with sources other than text, such as sound, is not easy due to the dynamic characteristics of the sources. In particular, sound can convey vivid emotions and dynamic expressions of the real world. Here, we propose a framework that directly encodes sound into the multi-modal (image-text) embedding space and manipulates an image from the space. Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space. We use a direct latent optimization method based on aligned embeddings for sound-guided image manipulation. We verify the effectiveness of our sound-guided image manipulation quantitatively and qualitatively. We also show that our method can mix different modalities, i.e., text and audio, which enriches the variety of the image modification. The experiments on zero-shot audio classification and semantic-level image classification show that our proposed model outperforms other text and sound-guided state-of-the-art methods.

[4]  arXiv:2112.00011 [pdf, other]
Title: Predicting Poverty Level from Satellite Imagery using Deep Neural Networks
Comments: 14 pages, 5 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Determining the poverty levels of various regions throughout the world is crucial in identifying interventions for poverty reduction initiatives and directing resources fairly. However, reliable data on global economic livelihoods is hard to come by, especially for areas in the developing world, hampering efforts to both deploy services and monitor/evaluate progress. This is largely due to the fact that this data is obtained from traditional door-to-door surveys, which are time-consuming and expensive. Overhead satellite imagery contains characteristics that make it possible to estimate a region's poverty level. In this work, I develop deep learning computer vision methods that can predict a region's poverty level from an overhead satellite image. I experiment with both daytime and nighttime imagery. Furthermore, because data limitations are often the barrier to entry in poverty prediction from satellite imagery, I explore the impact that data quantity and data augmentation have on the representational power and overall accuracy of the networks. Lastly, to evaluate the robustness of the networks, I evaluate them on data from continents that were absent in the development set.

[5]  arXiv:2112.00029 [pdf, other]
Title: Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models
Subjects: Machine Learning (cs.LG)

Overparameterized neural networks generalize well but are expensive to train. Ideally, one would like to reduce their computational cost while retaining their generalization benefits. Sparse model training is a simple and promising approach to achieve this, but there remain challenges as existing methods struggle with accuracy loss, slow training runtime, or difficulty in sparsifying all model components. The core problem is that searching for a sparsity mask over a discrete set of sparse matrices is difficult and expensive. To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices. As butterfly matrices are not hardware efficient, we propose simple variants of butterfly (block and flat) to take advantage of modern hardware. Our method (Pixelated Butterfly) uses a simple fixed sparsity pattern based on flat block butterfly and low-rank matrices to sparsify most network layers (e.g., attention, MLP). We empirically validate that Pixelated Butterfly is 3x faster than butterfly and speeds up training to achieve favorable accuracy--efficiency tradeoffs. On the ImageNet classification and WikiText-103 language modeling tasks, our sparse models train up to 2.5x faster than the dense MLP-Mixer, Vision Transformer, and GPT-2 medium with no drop in accuracy.
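
A fixed butterfly-style sparsity pattern can be sketched as a 0/1 mask. The construction below is an assumption, not taken from the abstract: it reads "flat" as the union of the supports of the individual butterfly factors (entry (i, j) is kept when i and j are equal or differ in exactly one bit), and "block" as expanding every mask entry into a b-by-b tile. The function names are hypothetical, and the low-rank term mentioned in the abstract is omitted.

```python
def flat_butterfly_mask(n):
    """0/1 mask keeping (i, j) when i == j or i XOR j is a power of two.
    Assumed reading of a 'flat butterfly' pattern; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    strides = {1 << k for k in range(n.bit_length() - 1)}
    return [[1 if i == j or (i ^ j) in strides else 0 for j in range(n)]
            for i in range(n)]

def block_expand(mask, b):
    """Replace every mask entry by a b x b tile with the same value."""
    size = len(mask) * b
    return [[mask[i // b][j // b] for j in range(size)] for i in range(size)]

mask = flat_butterfly_mask(8)
nnz = sum(map(sum, mask))                 # n * (1 + log2(n)) nonzeros
block_mask = block_expand(mask, 2)        # 16 x 16 mask made of 2 x 2 tiles
```

Under this reading the mask has n(1 + log2 n) nonzeros instead of n^2, and the block version keeps the same density while aligning nonzeros into dense tiles.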

[6]  arXiv:2112.00038 [pdf, other]
Title: Robust and Provably Monotonic Networks
Comments: 7 pages, 3 figures, accepted to Machine Learning and the Physical Sciences Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS) December 13, 2021
Subjects: Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex)

The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain knowledge dictates such dependence. Examples can be found in algorithmic fairness requirements or, as presented here, in the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider. Our normalization is minimally constraining and allows the underlying architecture to maintain higher expressiveness compared to other techniques which aim to either control the Lipschitz constant of the model or ensure its monotonicity. We show how the algorithm was used to train a powerful, robust, and interpretable discriminator for heavy-flavor decays in the LHCb realtime data-processing system.
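
The two ingredients named in this abstract, per-layer weight normalization and a residual term for monotonicity, can be sketched in a few lines. The details below are assumptions for illustration: the abstract does not state which norm is constrained, so this sketch bounds the maximum absolute row sum (the induced infinity norm) of each weight matrix, uses ReLU activations, and omits biases. All function names are hypothetical.

```python
import random

def normalize_rows(W, limit):
    """Rescale W so its max absolute row sum (induced inf-norm) is <= limit."""
    norm = max(sum(abs(w) for w in row) for row in W)
    scale = min(1.0, limit / norm) if norm > 0 else 1.0
    return [[w * scale for w in row] for row in W]

def forward(layers, x):
    """Dense network with ReLU on hidden layers and a scalar linear output."""
    for depth, W in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        if depth < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]

def monotonic_model(layers, x, lam, mono_idx):
    """g(x) + lam * sum of selected inputs: non-decreasing in those inputs
    whenever g changes by at most lam * |h| when one input moves by h."""
    return forward(layers, x) + lam * sum(x[i] for i in mono_idx)

random.seed(0)
lam = 2.0
shapes = [(4, 3), (4, 4), (1, 4)]                 # (outputs, inputs) per layer
per_layer = lam ** (1.0 / len(shapes))            # per-layer bounds multiply to lam
layers = [
    normalize_rows([[random.uniform(-3, 3) for _ in range(n_in)]
                    for _ in range(n_out)], per_layer)
    for n_out, n_in in shapes
]
```

Because ReLU is 1-Lipschitz, the product of the per-layer bounds bounds the Lipschitz constant of the whole network, and the added linear term then dominates any decrease of the network along the selected inputs.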

[7]  arXiv:2112.00050 [pdf, other]
Title: Pattern-Aware Data Augmentation for LiDAR 3D Object Detection
Comments: Published paper in the IEEE Intelligent Transportation Systems Conference - ITSC 2021
Journal-ref: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 2703-2710
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Autonomous driving datasets are often skewed and in particular, lack training data for objects at farther distances from the ego vehicle. The imbalance of data causes a performance degradation as the distance of the detected objects increases. In this paper, we propose pattern-aware ground truth sampling, a data augmentation technique that downsamples an object's point cloud based on the LiDAR's characteristics. Specifically, we mimic the natural diverging point pattern variation that occurs for objects at depth to simulate samples at farther distances. Thus, the network has more diverse training examples and can generalize to detecting farther objects more effectively. We evaluate against existing data augmentation techniques that use point removal or perturbation methods and find that our method outperforms all of them. Additionally, we propose using equal element AP bins to evaluate the performance of 3D object detectors across distance. We improve the performance of PV-RCNN on the car class by more than 0.7 percent on the KITTI validation split at distances greater than 25 m.

[8]  arXiv:2112.00053 [pdf]
Title: Task Assignment in Distributed Systems based on PSO Approach
Comments: 8 pages, 8 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In a distributed system, the Task Assignment Problem (TAP) is a key factor for obtaining efficiency. TAP concerns the appropriate allocation of tasks to the processor of each computer. The methods proposed so far for this problem try to minimize makespan and maximize CPU utilization. Since this problem is NP-complete, many genetic algorithms have been proposed to search for optimal solutions in the entire solution space. Disregarding the techniques which can reduce the complexity of optimization, the existing approaches scan the entire solution space. This makes scheduling time-consuming, which is considered a shortcoming. Therefore, in this paper, a hybrid genetic algorithm is proposed to overcome this shortcoming. Particle Swarm Optimization (PSO) is applied as the local search in the proposed genetic algorithm. The simulation results indicate that, in terms of CPU utilization and makespan, the proposed approach outperforms the GA-based approach.

[9]  arXiv:2112.00054 [pdf, other]
Title: Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data
Comments: Pre-print
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Pre-training models on Imagenet or other massive datasets of real images has led to major advances in computer vision, albeit accompanied by shortcomings related to curation cost, privacy, usage rights, and ethical issues. In this paper, for the first time, we study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks from very different domains. In using such synthetic data for pre-training, we find that downstream performance on different tasks is favored by different configurations of simulation parameters (e.g. lighting, object pose, backgrounds, etc.), and that there is no one-size-fits-all solution. It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance. We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters to generate synthetic pre-training data for them. Task2Sim learns this mapping by training to find the set of best parameters on a set of "seen" tasks. Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot, without requiring additional training. Given a budget in number of images per class, our extensive experiments with 20 diverse downstream tasks show Task2Sim's task-adaptive pre-training data results in significantly better downstream performance than non-adaptively choosing simulation parameters on both seen and unseen tasks. It is even competitive with pre-training on real images from Imagenet.

[10]  arXiv:2112.00057 [pdf, ps, other]
Title: Successive Syndrome-Check Decoding of Polar Codes
Comments: 2021 Asilomar Conference on Signals, Systems, and Computers
Subjects: Information Theory (cs.IT)

A two-part successive syndrome-check decoding of polar codes is proposed with the first part successively refining the received codeword and the second part checking its syndrome. A new formulation of the successive-cancellation (SC) decoding algorithm is presented that allows for successively refining the received codeword by comparing the log-likelihood ratio value of a frozen bit with its predefined value. The syndrome of the refined received codeword is then checked for possible errors. In case there are no errors, the decoding process is terminated. Otherwise, the decoder continues to refine the received codeword. The proposed method is extended to the case of SC list (SCL) decoding by terminating the decoding process when the syndrome of the best candidate in the list indicates no errors. Simulation results show that the proposed method reduces the time-complexity of SC and SCL decoders and their fast variants, especially at high signal-to-noise ratios.
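
The syndrome check on its own can be sketched as follows. This is not the decoder proposed in the paper: it only tests whether a hard-decision word is a valid polar codeword. The sketch assumes the natural-order polar transform (no bit-reversal permutation), frozen bits fixed to zero, and an illustrative frozen set for length 8; the function names are hypothetical. It relies on the fact that this transform is its own inverse over GF(2), so re-applying it to a codeword recovers the input vector.

```python
def polar_transform(bits):
    """Multiply a length-2^m bit vector by the m-fold Kronecker power of
    [[1, 0], [1, 1]] over GF(2), using in-place butterfly stages."""
    x = list(bits)
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x

def syndrome_is_zero(hard_bits, frozen):
    """True when the recovered input vector is zero on every frozen position."""
    u_hat = polar_transform(hard_bits)
    return all(u_hat[i] == 0 for i in frozen)

FROZEN = {0, 1, 2, 4}                    # illustrative choice for N = 8
u = [0, 0, 0, 1, 0, 1, 1, 0]             # information bits at positions 3, 5, 6, 7
codeword = polar_transform(u)
```

A decoder using this check could stop as soon as `syndrome_is_zero` returns True for its current hard decisions and continue refining otherwise.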

[11]  arXiv:2112.00059 [pdf, other]
Title: Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
Journal-ref: NeurIPS 2021
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Gradient inversion attack (or input recovery from gradient) is an emerging threat to the security and privacy preservation of Federated learning, whereby malicious eavesdroppers or participants in the protocol can recover (partially) the clients' private data. This paper evaluates existing attacks and defenses. We find that some attacks make strong assumptions about the setup. Relaxing such assumptions can substantially weaken these attacks. We then evaluate the benefits of three proposed defense mechanisms against gradient inversion attacks. We show the trade-offs of privacy leakage and data utility of these defense methods, and find that combining them in an appropriate manner makes the attack less effective, even under the original strong assumptions. We also estimate the computation cost of end-to-end recovery of a single image under each evaluated defense. Our findings suggest that the state-of-the-art attacks can currently be defended against with minor data utility loss, as summarized in a list of potential strategies. Our code is available at: https://github.com/Princeton-SysML/GradAttack.

[12]  arXiv:2112.00061 [pdf, other]
Title: Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)

Misinformation is now a major problem due to its potential high risks to our core democratic and societal values and orders. Out-of-context misinformation is one of the easiest and most effective ways used by adversaries to spread viral false stories. In this threat, a real image is re-purposed to support other narratives by misrepresenting its context and/or elements. The internet is being used as the go-to way to verify information using different sources and modalities. Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-caption pairing using Web evidence. To integrate evidence and cues from both modalities, we introduce the concept of 'multi-modal cycle-consistency check'; starting from the image/caption, we gather textual/visual evidence, which will be compared against the other paired caption/image, respectively. Moreover, we propose a novel architecture, Consistency-Checking Network (CCN), that mimics the layered human reasoning across the same and different modalities: the caption vs. textual evidence, the image vs. visual evidence, and the image vs. caption. Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking, and significantly outperforms previous baselines that did not leverage external evidence.

[13]  arXiv:2112.00064 [pdf, other]
Title: Acute Tours in the Plane
Authors: Ahmad Biniaz
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)

We confirm the following conjecture of Fekete and Woeginger from 1997: for any sufficiently large even number $n$, every set of $n$ points in the plane can be connected by a spanning tour (Hamiltonian cycle) consisting of straight-line edges such that the angle between any two consecutive edges is at most $\pi/2$. Our proof is constructive and suggests a simple $O(n\log n)$-time algorithm for finding such a tour. The previous best-known upper bound on the angle is $2\pi/3$, and it is due to Dumitrescu, Pach and T\'oth (2009).
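
The angle condition in this result is easy to verify for a given tour. The sketch below is only a checker, not the paper's construction: it takes the points in tour order and tests that the angle at every vertex between its two incident edges is at most $\pi/2$, which holds exactly when the dot product of the two edge vectors leaving that vertex is non-negative. The function name is hypothetical.

```python
def is_acute_tour(points):
    """points: list of (x, y) in tour order; the tour closes back to the start.
    Returns True when every angle between consecutive edges is at most pi/2."""
    n = len(points)
    for k in range(n):
        (ux, uy), (vx, vy), (wx, wy) = points[k - 1], points[k], points[(k + 1) % n]
        # Angle at v between edges v->u and v->w is <= pi/2 iff their dot product >= 0.
        if (ux - vx) * (wx - vx) + (uy - vy) * (wy - vy) < 0:
            return False
    return True

square = [(0, 0), (1, 0), (1, 1), (0, 1)]                       # all angles are pi/2
hexagon = [(0, 0), (2, 0), (3, 1), (2, 2), (0, 2), (-1, 1)]     # has obtuse angles
```

With integer coordinates the test is exact, so it can serve as a sanity check on the output of any tour-construction routine.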

[14]  arXiv:2112.00065 [pdf, other]
Title: Boosting EfficientNets Ensemble Performance via Pseudo-Labels and Synthetic Images by pix2pixHD for Infection and Ischaemia Classification in Diabetic Foot Ulcers
Comments: Accepted for Workshop Proceedings of the Diabetic Foot Ulcers Challenge (DFUC) as part of the 2021 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Diabetic foot ulcers are a common manifestation of lesions on the diabetic foot, a syndrome acquired as a long-term complication of diabetes mellitus. Accompanying neuropathy and vascular damage promote acquisition of pressure injuries and tissue death due to ischaemia. Affected areas are prone to infections, hindering the healing progress. The research at hand investigates an approach to the classification of infection and ischaemia, conducted as part of the Diabetic Foot Ulcer Challenge (DFUC) 2021. Different models of the EfficientNet family are utilized in ensembles. An extension strategy for the training data is applied, involving pseudo-labeling for unlabeled images, and extensive generation of synthetic images via pix2pixHD to cope with severe class imbalances. The resulting extended training dataset is $8.68$ times the size of the baseline and has a real-to-synthetic image ratio of $1:3$. Performances of models and ensembles trained on the baseline and extended training dataset are compared. Synthetic images featured a broad qualitative variety. Results show that models trained on the extended training dataset as well as their ensemble benefit from the large extension. F1-Scores for rare classes receive outstanding boosts, while those for common classes are either not harmed or boosted moderately. A critical discussion concretizes benefits and identifies limitations, suggesting improvements. The work concludes that classification performance of individual models as well as that of ensembles can be boosted utilizing synthetic images. Performance for rare classes benefits most notably.

[15]  arXiv:2112.00068 [pdf, other]
Title: Scaling Shared-Memory Data Structures as Distributed Global-View Data Structures in the Partitioned Global Address Space model
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

The Partitioned Global Address Space (PGAS), a memory model in which the global address space is explicitly partitioned across compute nodes in a cluster, strives to bridge the gap between shared-memory and distributed-memory programming. To further bridge this gap, there has been an adoption of global-view distributed data structures, such as 'global arrays' or 'distributed arrays'. This work demonstrates how shared-memory data structures can be modified to scale in distributed memory. Presented in this work is the Distributed Interlocked Hash Table (DIHT), a global-view distributed map data structure inspired by the Interlocked Hash Table (IHT). At 64 nodes with 44 cores per node, DIHT provides up to 110x the performance of the Chapel standard-library HashedDist.

[16]  arXiv:2112.00071 [pdf, other]
Title: What to Learn, and How: Toward Effective Learning from Rationales
Comments: 13 pages, 8 figures
Subjects: Machine Learning (cs.LG)

Learning from rationales seeks to augment model training with human-provided rationales (i.e., a subset of input tokens) that justify those labels. While intuitive, this idea has proven elusive in practice. We make two observations about human rationales via empirical analyses: 1) maximizing predicted rationale accuracy is not necessarily the optimal objective for improving model performance; 2) human rationales vary in whether they provide sufficient information for the model to exploit for prediction, and we can use this variance to assess a dataset's potential improvement from learning from rationales. Building on these insights, we propose loss functions and learning strategies, and evaluate their effectiveness on three datasets with human rationales. Our results demonstrate consistent improvements over baselines in both label performance and rationale performance, including a 3% accuracy improvement on MultiRC. Our work highlights the importance of understanding properties of human explanations and exploiting them accordingly in model training.

[17]  arXiv:2112.00075 [pdf, other]
Title: A Multi-purposed Unsupervised Framework for Comparing Embeddings of Undirected and Directed Graphs
Comments: 32 pages, 15 figures
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)

Graph embedding is a transformation of nodes of a network into a set of vectors. A good embedding should capture the underlying graph topology and structure, node-to-node relationship, and other relevant information about the graph, its subgraphs, and nodes themselves. If these objectives are achieved, an embedding is a meaningful, understandable, and often compressed representation of a network. Unfortunately, selecting the best embedding is a challenging task and very often requires domain experts. In this paper, we extend the framework for evaluating graph embeddings that was recently introduced by the authors. Now, the framework assigns two scores, local and global, to each embedding that measure the quality of an evaluated embedding for tasks that require good representation of local and, respectively, global properties of the network. The best embedding, if needed, can be selected in an unsupervised way, or the framework can identify a few embeddings that are worth further investigation. The framework is flexible, scalable, and can deal with undirected/directed, weighted/unweighted graphs.

[18]  arXiv:2112.00076 [pdf, ps, other]
Title: Using Conversational Artificial Intelligence to Support Children's Search in the Classroom
Comments: Presented at CUI@CSCW 2021 -- this https URL
Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

We present pathways of investigation regarding conversational user interfaces (CUIs) for children in the classroom. We highlight anticipated challenges to be addressed in order to advance knowledge on CUIs for children. Further, we discuss preliminary ideas on strategies for evaluation.

[19]  arXiv:2112.00086 [pdf, other]
Title: Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking
Comments: Code and data will be made available at project page: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model behavior. In this work we focus on story understanding, a core competency for NLU systems. However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation. We develop Dyna-bAbI, a dynamic framework providing fine-grained control over task generation in bAbI. We demonstrate our ideas by constructing three new tasks requiring compositional generalization, an important evaluation setting absent from the original benchmark. We tested both special-purpose models developed for bAbI as well as state-of-the-art pre-trained methods, and found that while both approaches solve the original tasks (>99% accuracy), neither approach succeeded in the compositional generalization setting, indicating the limitations of the original training data. We explored ways to augment the original data, and found that though diversifying training data was far more useful than simply increasing dataset size, it was still insufficient for driving robust compositional generalization (with <70% accuracy for complex compositions). Our results underscore the importance of highly controllable task generators for creating robust NLU systems through a virtuous cycle of model and data development.

[20]  arXiv:2112.00087 [pdf, other]
Title: Coupling and Simulation of Fluid-Structure Interaction Problems for Automotive Sun-roof on Graphics Processing Unit
Subjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper, the authors propose an analysis of the frequency response function in a car compartment, subject to some fluctuating pressure distribution along the open cavity of the sun-roof at the top of a car. The coupling of a computational fluid dynamics code and a computational acoustics code is considered to simulate the acoustic fluid-structure interaction problem. Iterative Krylov methods and domain decomposition methods, tuned for the Graphics Processing Unit (GPU), are considered to solve the acoustic problem in double-precision complex arithmetic. Numerical simulations illustrate the efficiency, robustness and accuracy of the proposed approaches.

[21]  arXiv:2112.00089 [pdf, other]
Title: Minimal order $H(\operatorname{div})$-conforming velocity-vorticity approximations for incompressible fluids
Subjects: Numerical Analysis (math.NA)

We introduce a novel minimal order hybrid Discontinuous Galerkin (HDG) and a novel mass conserving mixed stress (MCS) method for the approximation of incompressible flows. For this we employ the $H(\operatorname{div})$-conforming linear Brezzi-Douglas-Marini space and the lowest order Raviart-Thomas space for the approximation of the velocity and the vorticity, respectively. Our methods are based on the physically correct diffusive flux $-\nu \varepsilon(u)$ and provide exactly divergence-free discrete velocity solutions, optimal (pressure robust) error estimates and a minimal number of coupling degrees of freedom. For the stability analysis we introduce a new Korn-like inequality for vector-valued element-wise $H^1$ and normal continuous functions. Numerical examples conclude the work where the theoretical findings are validated and the novel methods are compared in terms of condition numbers with respect to discrete stability parameters.

[22]  arXiv:2112.00093 [pdf, other]
Title: "Vironment": An Art of Wearable Social Distancing
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

"Vironment" is a series of art pieces, social commentary, technology, etc., based on wearable health technologies of social-distancing, culminating in a social-distancing device that takes the familiar world of security and surveillance technologies that surround us and re-situates it on the body of the wearer (technologies that become part of us). This piece also introduces a conceptual framework for (1) the sensing of the self together with (2) sensing of others and (3) sensing of the environment around us.

[23]  arXiv:2112.00094 [pdf, other]
Title: Leveraging Intrinsic Gradient Information for Machine Learning Model Training
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Designing models that produce accurate predictions is the fundamental objective of machine learning. This work presents methods demonstrating that when the derivatives of target variables with respect to inputs can be extracted from processes of interest, they can be leveraged to improve the accuracy of differentiable machine learning models. Four key ideas are explored: (1) Improving the predictive accuracy of linear regression models and feed-forward neural networks (NNs); (2) Using the difference between the performance of feedforward NNs trained with and without gradient information to tune NN complexity (in the form of hidden node number); (3) Using gradient information to regularise linear regression; and (4) Using gradient information to improve generative image models. Across this variety of applications, gradient information is shown to enhance each predictive model, demonstrating its value for a variety of applications.

[24]  arXiv:2112.00098 [pdf, other]
Title: Connected Components for Infinite Graph Streams: Theory and Practice
Comments: 23 pages, 12 figures
Subjects: Data Structures and Algorithms (cs.DS)

Motivated by the properties of unending real-world cybersecurity streams, we present a new graph streaming model: XStream. We maintain a streaming graph and its connected components at single-edge granularity. In cybersecurity graph applications, input streams typically consist of edge insertions; individual deletions are not explicit. Analysts maintain as much history as possible and will trigger customized bulk deletions when necessary. Despite a variety of dynamic graph processing systems and some canonical literature on theoretical sliding-window graph streaming, XStream is the first model explicitly designed to accommodate this usage model. Users can provide Boolean predicates to define bulk deletions. Edge arrivals are expected to occur continuously and must always be handled. XStream is implemented via a ring of finite-memory processors. We give algorithms to maintain connected components on the input stream, to answer queries about connectivity, and to perform bulk deletion. The system requires bandwidth for internal messages that is some constant factor greater than the stream arrival rate. We prove a relationship among four quantities: the proportion of query downtime allowed, the proportion of edges that survive an aging event, the proportion of duplicated edges, and the bandwidth expansion factor. In addition to presenting the theory behind XStream, we present computational results for a single-threaded prototype implementation. Stream ingestion rates are bounded by computer architecture. We determine this bound for XStream inter-process message-passing rates in Intel TBB applications on Intel Sky Lake processors: between one and five million graph edges per second. Our single-threaded prototype runs our full protocols through multiple aging events at between one half and one million edges per second, and we give ideas for speeding this up by orders of magnitude.
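
The usage model described here (edge insertions, connectivity queries, predicate-driven bulk deletion) can be sketched on a single machine with a union-find structure. This is not the XStream ring-of-processors algorithm: the class below is a hypothetical, in-memory illustration that stores every edge with a timestamp and simply rebuilds the components after a bulk deletion.

```python
class StreamingComponents:
    """Connected components under edge insertions and predicate-based bulk deletion."""

    def __init__(self):
        self.parent = {}
        self.edges = []          # (u, v, timestamp) for every surviving edge

    def _find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:            # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def insert(self, u, v, timestamp):
        self.edges.append((u, v, timestamp))
        self.parent[self._find(u)] = self._find(v)

    def connected(self, u, v):
        return self._find(u) == self._find(v)

    def bulk_delete(self, predicate):
        """Drop every edge for which predicate(edge) is True, then rebuild."""
        self.edges = [e for e in self.edges if not predicate(e)]
        self.parent = {}
        for u, v, _ in self.edges:
            self.parent[self._find(u)] = self._find(v)

stream = StreamingComponents()
for edge in [(1, 2, 0), (2, 3, 1), (4, 5, 2)]:
    stream.insert(*edge)
before = (stream.connected(1, 3), stream.connected(1, 4))
stream.bulk_delete(lambda e: e[2] < 1)           # age out edges older than t = 1
after = (stream.connected(1, 3), stream.connected(2, 3))
```

The full rebuild makes deletion cost proportional to the number of surviving edges, which is acceptable for a sketch but is exactly the kind of downtime the abstract's bandwidth and query-downtime analysis is concerned with.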

[25]  arXiv:2112.00100 [pdf, other]
Title: A Mathematical Framework for Evaluation of SOAR Tools with Limited Survey Data
Subjects: Human-Computer Interaction (cs.HC)

Security operation centers (SOCs) all over the world are tasked with reacting to cybersecurity alerts ranging in severity. Security Orchestration, Automation, and Response (SOAR) tools streamline cybersecurity alert responses by SOC operators. SOAR tool adoption is expensive both in effort and finances. Hence, it is crucial to limit adoption to those most worthwhile; yet no research evaluating or comparing SOAR tools exists. The goal of this work is to evaluate several SOAR tools using specific criteria pertaining to their usability. SOC operators were asked to first complete a survey about what SOAR tool aspects are most important. Operators were then assigned a set of SOAR tools for which they viewed demonstration and overview videos, and then operators completed a second survey wherein they were tasked with evaluating each of the tools on the aspects from the first survey. In addition, operators provided an overall rating to each of their assigned tools, and provided a ranking of their tools in order of preference. Due to time constraints on SOC operators for thorough testing, we provide a systematic method of downselecting a large pool of SOAR tools to a select few that merit next-step hands-on evaluation by SOC operators. Furthermore, the analyses conducted in this survey help to inform future development of SOAR tools to ensure that the appropriate functions are available for use in a SOC.

[26]  arXiv:2112.00101 [pdf, other]
Title: Fast Topological Clustering with Wasserstein Distance
Subjects: Machine Learning (cs.LG)

The topological patterns exhibited by many real-world networks motivate the development of topology-based methods for assessing the similarity of networks. However, extracting topological structure is difficult, especially for large and dense networks whose node degrees range over multiple orders of magnitude. In this paper, we propose a novel and computationally practical topological clustering method that clusters complex networks with intricate topology using principled theory from persistent homology and optimal transport. Such networks are aggregated into clusters through a centroid-based clustering strategy based on both their topological and geometric structure, preserving correspondence between nodes in different networks. The notions of topological proximity and centroid are characterized using a novel and efficient approach to computation of the Wasserstein distance and barycenter for persistence barcodes associated with connected components and cycles. The proposed method is demonstrated to be effective using both simulated networks and measured functional brain networks.

[27]  arXiv:2112.00107 [pdf]
Title: LGBTQ Privacy Concerns on Social Media
Comments: Workshop at 2018 CHI conference on human factors in computing systems: Exploring Individual Differences in Privacy
Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Computers and Society (cs.CY)

We conducted semi-structured interviews with members of the LGBTQ community about their privacy practices and concerns on social networking sites. Participants used different social media sites for different needs and adapted to not being completely out on each site. We would value the opportunity to discuss the unique privacy and security needs of this population with workshop participants and learn more about the privacy needs of other marginalized user groups from researchers who have worked in those communities.

[28]  arXiv:2112.00113 [pdf, other]
Title: Beyond Flatland: Pre-training with a Strong 3D Inductive Bias
Comments: NeurIPS 2021 pre-registration workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Pre-training on large-scale databases consisting of natural images and then fine-tuning them to fit the application at hand, or transfer-learning, is a popular strategy in computer vision. However, Kataoka et al. (2020) introduced a technique to eliminate the need for natural images in supervised deep learning by proposing a novel synthetic, formula-based method to generate 2D fractals as a training corpus. Using one synthetically generated fractal for each class, they achieved transfer learning results comparable to models pre-trained on natural images. In this project, we take inspiration from their work and build on this idea -- using 3D procedural object renders. Since the image formation process in the natural world is based on its 3D structure, we expect pre-training with 3D mesh renders to provide an implicit bias leading to better generalization capabilities in a transfer learning setting, and we expect invariances to 3D rotation and illumination to be easier to learn from 3D data. Similar to the previous work, our training corpus will be fully synthetic and derived from simple procedural strategies; we will go beyond classic data augmentation and also vary illumination and pose, which are controllable in our setting, and study their effect on transfer learning capabilities in the context of prior work. In addition, we will compare the 2D fractal and 3D procedural object networks to human and non-human primate brain data to learn more about the 2D vs. 3D nature of biological vision.

[29]  arXiv:2112.00114 [pdf, other]
Title: Show Your Work: Scratchpads for Intermediate Computation with Language Models
Subjects: Machine Learning (cs.LG)

Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations -- even in the few-shot regime -- when asked to perform the operation "step by step", showing the results of intermediate computations. In particular, we train transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.
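
As an illustration of what a "scratchpad" for long addition could look like, the sketch below generates a training-style text example in which each digit-wise step and carry is written out before the final answer. The function name `addition_scratchpad` and the exact line format are assumptions for illustration, not the format used in the paper, and the sketch handles non-negative integers only.

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Render a + b with every intermediate digit and carry spelled out."""
    xs, ys = str(a)[::-1], str(b)[::-1]   # least-significant digit first
    lines = [f"Input: {a} + {b}", "Scratchpad:"]
    carry, digits = 0, []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        digits.append(str(total % 10))
        lines.append(
            f"  {x} + {y} + carry {carry} = {total}"
            f" -> digit {total % 10}, carry {total // 10}"
        )
        carry = total // 10
    if carry:
        digits.append(str(carry))          # leftover carry becomes the top digit
    lines.append("Target: " + "".join(reversed(digits)))
    return "\n".join(lines)
```

The point of such a format is that a model trained to emit the indented lines never has to compute more than one single-digit step at a time before producing the target.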

[30]  arXiv:2112.00115 [pdf, other]
Title: Risk-based implementation of COLREGs for autonomous surface vehicles using deep reinforcement learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Autonomous systems are becoming ubiquitous and gaining momentum within the marine sector. Since the electrification of transport is happening simultaneously, autonomous marine vessels can reduce environmental impact, lower costs, and increase efficiency. Although close monitoring is still required to ensure safety, the ultimate goal is full autonomy. One major milestone is to develop a control system that is versatile enough to handle any weather and encounter that is also robust and reliable. Additionally, the control system must adhere to the International Regulations for Preventing Collisions at Sea (COLREGs) for successful interaction with human sailors. Since the COLREGs were written for the human mind to interpret, they are written in ambiguous prose and therefore not machine-readable or verifiable. Due to these challenges and the wide variety of situations to be tackled, classical model-based approaches prove complicated to implement and computationally heavy. Within machine learning (ML), deep reinforcement learning (DRL) has shown great potential for a wide range of applications. The model-free and self-learning properties of DRL make it a promising candidate for autonomous vessels. In this work, a subset of the COLREGs is incorporated into a DRL-based path following and obstacle avoidance system using collision risk theory. The resulting autonomous agent dynamically interpolates between path following and COLREG-compliant collision avoidance in the training scenario, isolated encounter situations, and AIS-based simulations of real-world scenarios.

[31]  arXiv:2112.00116 [pdf, ps, other]
Title: A Review on Parallel Virtual Screening Softwares for High Performance Computers
Comments: Submitted to Pharmaceuticals, MDPI journal
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Drug discovery, which aims at the identification and optimization of lead compounds from large chemical libraries, is the most expensive, time-consuming and challenging project in biopharmaceutical companies. The lead compounds should have high binding affinity and specificity for a target associated with a disease and, in addition, they should have favorable pharmacodynamic and pharmacokinetic properties (grouped as ADMET properties). Overall, drug discovery is a multivariable optimization problem and can be carried out on supercomputers using a reliable scoring function, which is a measure of the binding affinity or inhibition potential of the drug-like compound. The major problem is that the number of compounds in the chemical spaces is huge, making computational drug discovery very demanding. However, it is cheaper and less time-consuming than experimental high-throughput screening. As the problem is to find the most stable (global) minima for numerous protein-ligand complexes (on the order of 10$^6$ to 10$^{12}$), the parallel implementation of in-silico virtual screening can be exploited to carry out drug discovery in an affordable time. In this review, we discuss such implementations of parallelization algorithms in virtual screening programs. The nature of different scoring functions and search algorithms is discussed, together with a performance analysis of several docking software packages ported to high-performance computing architectures.

[32]  arXiv:2112.00117 [pdf, other]
Title: CIDAN: Computing in DRAM with Artificial Neurons
Subjects: Hardware Architecture (cs.AR)

Numerous applications such as graph processing, cryptography, databases, bioinformatics, etc., involve the repeated evaluation of Boolean functions on large bit vectors. In-memory architectures, which perform processing in memory (PIM), are tailored for such applications. This paper describes a different architecture for in-memory computation called CIDAN, which achieves a 3X improvement in performance and a 2X improvement in energy for a representative set of algorithms over the state-of-the-art in-memory architectures. CIDAN uses a new basic processing element called a TLPE, which comprises a threshold logic gate (TLG) (a.k.a. artificial neuron or perceptron). The implementation of a TLG within a TLPE is equivalent to a multi-input, edge-triggered flip-flop that computes a subset of threshold functions of its inputs. The specific threshold function is selected on each cycle by enabling or disabling a subset of the weights associated with the threshold function, using logic signals. In addition to the TLG, a TLPE realizes some non-threshold functions by a sequence of TLG evaluations. An equivalent CMOS implementation of a TLPE requires substantially higher area and power. CIDAN has an array of TLPEs that is integrated with a DRAM, to allow fast evaluation of any one of its set of functions on large bit vectors. Results of running several common in-memory applications in graph processing and cryptography are presented.
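
The idea of a threshold logic gate can be sketched in software as follows. This is a functional toy model, not the CIDAN circuit: the helper names (`tlg`, `and2`, `or2`, `xor2`, `bitwise`), the weights, and the thresholds are all assumptions chosen for illustration. It shows a gate that fires when the weighted sum of its enabled inputs reaches a threshold, and XOR (which is not a threshold function) realized as a sequence of two gate evaluations.

```python
from typing import Callable, List, Optional, Sequence


def tlg(inputs: Sequence[int], weights: Sequence[int], threshold: int,
        enabled: Optional[Sequence[int]] = None) -> int:
    """Threshold gate: 1 if the weighted sum of enabled inputs >= threshold."""
    if enabled is None:
        enabled = [1] * len(weights)
    total = sum(w * x for w, x, e in zip(weights, inputs, enabled) if e)
    return int(total >= threshold)


def and2(a: int, b: int) -> int:
    return tlg([a, b], [1, 1], 2)      # fires only when both inputs are 1


def or2(a: int, b: int) -> int:
    return tlg([a, b], [1, 1], 1)      # fires when at least one input is 1


def xor2(a: int, b: int) -> int:
    # Two evaluations: first AND, then a gate where the AND result
    # carries weight -2 and cancels the case a = b = 1.
    h = and2(a, b)
    return tlg([a, b, h], [1, 1, -2], 1)


def bitwise(op: Callable[[int, int], int],
            xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Apply a two-input gate element-wise across two bit vectors."""
    return [op(x, y) for x, y in zip(xs, ys)]
```

Disabling a weight changes the function without changing the gate: with `enabled=[0, 1]`, the two-input gate above ignores its first input entirely.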

[33]  arXiv:2112.00124 [pdf]
Title: CryoCiM: Cryogenic Compute-in-Memory based on the Quantum Anomalous Hall Effect
Comments: 10 pages, 4 figures
Subjects: Emerging Technologies (cs.ET)

The scaling of mature CMOS technology is steadily approaching its physical limit, motivating the quest for a suitable alternative. Cryogenic operation offers a promising pathway towards continued improvement in computing speed and energy efficiency without aggressive scaling. However, the memory wall bottleneck of the traditional von Neumann architecture persists even at low temperatures. That is where compute-in-memory (CiM) architectures, which embed computing within the memory unit, come into play. Computations within the memory unit help reduce the expensive data transfer between the memory and the computing units. Therefore, CiM provides extreme energy efficiency that can enable lower cooling costs at cryogenic temperatures. In this work, we demonstrate CryoCiM (a cryogenic compute-in-memory framework that can perform memory read and vector bitwise operations for logical NAND, NOR, and XOR operations) with a non-volatile cryogenic memory system based on the quantum anomalous Hall effect (QAHE). The utilization of a QAHE-based memory system leads to better energy efficiency, robustness against process variations, and more scalable CiM architectures. The proposed QAHE-based CiM should enable large-scale cryogenic systems that demonstrate excellent energy efficiency.

[34]  arXiv:2112.00126 [pdf, other]
Title: Martingale product estimators for sensitivity analysis in computational statistical physics
Comments: 34 pages, 4 figures
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Probability (math.PR)

We introduce a new class of estimators for the linear response of steady states of stochastic dynamics. We generalize the likelihood ratio approach and formulate the linear response as a product of two martingales, hence the name "martingale product estimators". We present a systematic derivation of the martingale product estimator, and show how to construct such an estimator so that its bias is consistent with the weak order of the numerical scheme that approximates the underlying stochastic differential equation. Motivated by the estimation of transport properties in molecular systems, we present a rigorous numerical analysis of the bias and variance for these new estimators in the case of Langevin dynamics. We prove that the variance is uniformly bounded in time and derive a specific form of the estimator for second-order splitting schemes for Langevin dynamics. For comparison, we also study the bias and variance of a Green-Kubo estimator, motivated, in part, by its variance growing linearly in time. The presented analysis shows that the new martingale product estimators, having uniformly bounded variance in time, offer a competitive alternative to the traditional Green-Kubo estimator. We compare the new estimators with results obtained by the Green-Kubo method on illustrative numerical tests.

[35]  arXiv:2112.00131 [pdf, other]
Title: CovidAlert -- A Wristwatch-based System to Alert Users from Face Touching
Comments: 17 pages, 9 figures, PervasiveHealth2021 conference
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

Worldwide, 219 million people have been infected and 4.5 million have lost their lives in the ongoing Covid-19 pandemic. Until vaccines became widely available, precautions and safety measures like wearing masks, physical distancing, and avoiding face touching were some of the primary means to curb the spread of the virus. Face touching is a compulsive human behavior that cannot be prevented without a continuous conscious effort, and even then it is inevitable. To address this problem, we have designed a smartwatch-based solution, CovidAlert, that leverages a Random Forest algorithm trained on accelerometer and gyroscope data from the smartwatch to detect hand transitions to the face and send a quick haptic alert to the user. CovidAlert is highly energy efficient, as it employs the STA/LTA algorithm as a gatekeeper to curtail the usage of the Random Forest model on the watch when the user is inactive. The overall accuracy of our system is 88.4% with low false negatives and false positives. We also demonstrated the system's viability by implementing it on a commercial Fossil Gen 5 smartwatch.
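
The gatekeeper role of STA/LTA (short-term average over long-term average) can be sketched as below. This is a generic trailing-window formulation under assumed parameters, not the CovidAlert implementation: the function names, window lengths, and trigger value are hypothetical, and the input is taken to be a one-dimensional list of motion magnitudes. The more expensive classifier would only be invoked on samples where the gate is open.

```python
from typing import List, Sequence


def sta_lta_ratio(signal: Sequence[float], sta_len: int, lta_len: int) -> List[float]:
    """Ratio of short-window to long-window mean absolute amplitude per sample.

    Both windows end at the current sample. Samples without a full long
    window yet get a ratio of 0.0. Assumes sta_len <= lta_len.
    """
    ratios = []
    for i in range(len(signal)):
        if i + 1 < lta_len:
            ratios.append(0.0)
            continue
        short = [abs(x) for x in signal[i + 1 - sta_len:i + 1]]
        long_ = [abs(x) for x in signal[i + 1 - lta_len:i + 1]]
        sta = sum(short) / sta_len
        lta = sum(long_) / lta_len
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


def gate_open(signal: Sequence[float], sta_len: int = 3, lta_len: int = 10,
              trigger: float = 2.0) -> List[bool]:
    """True where recent activity stands out enough to wake the classifier."""
    return [r >= trigger for r in sta_lta_ratio(signal, sta_len, lta_len)]
```

For a steady signal the ratio stays near 1 and the gate remains closed; a sudden burst of motion raises the short-term average faster than the long-term one and opens it.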

[36]  arXiv:2112.00132 [pdf, other]
Title: Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations
Comments: 12 pages, 4 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We present Atos, a task-parallel GPU dynamic scheduling framework that is especially suited to dynamic irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies, achieving higher GPU utilization, which is particularly significant for problems with concurrency bottlenecks. Atos also offers implicit task-parallel load balancing in addition to data-parallel load balancing, providing users the flexibility to balance between them to achieve optimal performance. Finally, Atos allows users to adapt to different use cases by controlling the kernel strategy and task-parallel granularity. We demonstrate that each of these controls is important in practice. We evaluate and analyze the performance of Atos vs. BSP on three applications: breadth-first search, PageRank, and graph coloring. Atos implementations achieve geomean speedups of 3.44x, 2.1x, and 2.77x and peak speedups of 12.8x, 3.2x, and 9.08x across three case studies, compared to a state-of-the-art BSP GPU implementation. Beyond simply quantifying the speedup, we extensively analyze the reasons behind each speedup. This deeper understanding allows us to derive general guidelines for how to select the optimal Atos configuration for different applications. Finally, our analysis provides insights for future dynamic scheduling framework designs.

[37]  arXiv:2112.00133 [pdf, other]
Title: PokeBNN: A Binary Pursuit of Lightweight Accuracy
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Top-1 ImageNet optimization promotes enormous networks that may be impractical in inference settings. Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual paths, and tuning the activation function. We apply it to ResNet-50 and optimize ResNet's initial convolutional layer which is hard to binarize. We name the resulting network family PokeBNN. These techniques are chosen to yield favorable improvements in both top-1 accuracy and the network's cost. In order to enable joint optimization of the cost together with accuracy, we define arithmetic computation effort (ACE), a hardware- and energy-inspired cost metric for quantized and binarized networks. We also identify a need to optimize an under-explored hyper-parameter controlling the binarization gradient approximation.
We establish a new, strong state-of-the-art (SOTA) on top-1 accuracy together with commonly-used CPU64 cost, ACE cost and network size metrics. ReActNet-Adam, the previous SOTA in BNNs, achieved a 70.5% top-1 accuracy with 7.9 ACE. A small variant of PokeBNN achieves 70.5% top-1 with 2.6 ACE, more than 3x reduction in cost; a larger PokeBNN achieves 75.6% top-1 with 7.8 ACE, more than 5% improvement in accuracy without increasing the cost. PokeBNN implementation in JAX/Flax and reproduction instructions are open sourced.

[38]  arXiv:2112.00141 [pdf, other]
Title: Solving reward-collecting problems with UAVs: a comparison of online optimization and Q-learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Uncrewed autonomous vehicles (UAVs) have made significant contributions to reconnaissance and surveillance missions in past US military campaigns. As the prevalence of UAVs increases, there have also been improvements in counter-UAV technology that make it difficult for them to successfully obtain valuable intelligence within an area of interest. Hence, it has become important that modern UAVs can accomplish their missions while maximizing their chances of survival. In this work, we specifically study the problem of identifying a short path from a designated start to a goal, while collecting all rewards and avoiding adversaries that move randomly on the grid. We also provide a possible application of the framework in a military setting, that of autonomous casualty evacuation. We present a comparison of three methods to solve this problem: namely, we implement a Deep Q-Learning model, an $\varepsilon$-greedy tabular Q-Learning model, and an online optimization framework. Our computational experiments, designed using simple grid-world environments with random adversaries, showcase how these approaches work and compare them in terms of performance, accuracy, and computational time.
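
For readers unfamiliar with the second method, the following is a minimal sketch of $\varepsilon$-greedy tabular Q-learning on a small grid. It is a simplification made for illustration and does not reproduce the paper's environment: there are no adversaries or intermediate rewards, and the grid size, step penalty, learning rate, discount, and function names are all assumptions.

```python
import random

# Hypothetical action set: right, left, down, up as (row, col) offsets.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def choose_action(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_update(q, s, a, reward, s_next, alpha, gamma, done):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target."""
    target = reward if done else reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def train(size=4, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table for walking from (0, 0) to the opposite grid corner."""
    rng = random.Random(seed)
    goal = (size - 1, size - 1)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(size) for c in range(size)}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(200):                       # cap on episode length
            a = choose_action(q[s], epsilon, rng)
            dr, dc = ACTIONS[a]
            # Moves that would leave the grid keep the agent at the border.
            nxt = (min(max(s[0] + dr, 0), size - 1),
                   min(max(s[1] + dc, 0), size - 1))
            done = nxt == goal
            q_update(q, s, a, 1.0 if done else -0.01, nxt, alpha, gamma, done)
            s = nxt
            if done:
                break
    return q
```

A Deep Q-Learning variant replaces the table `q` with a neural network that maps a state to action values; the update target has the same form.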

[39]  arXiv:2112.00142 [pdf, other]
Title: ZCSD: a Computational Storage Device over Zoned Namespaces (ZNS) SSDs
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

The Big Data trend is putting strain on modern storage systems, which have to support high-performance I/O accesses for large quantities of data. With the prevalent Von Neumann computing architecture, this data is constantly moved back and forth between the computing (i.e., CPU) and storage entities (DRAM, Non-Volatile Memory (NVM) storage). Hence, as the data volume grows, this constant data movement between the CPU and storage devices has emerged as a key performance bottleneck. To improve the situation, researchers have advocated leveraging computational storage devices (CSDs), which offer a programmable interface to run user-defined data processing operations close to the storage without excessive data movement, thus offering performance improvements. However, despite their potential, building CSD-aware applications remains a challenging task due to the lack of exploration and experimentation with the right API and abstraction. This is due to the limited accessibility of the latest CSD/NVM devices, emerging device interfaces, and closed-source software internals of the devices. To remedy the situation, in this work we present an open-source CSD prototype over emerging NVMe Zoned Namespaces (ZNS) SSDs and an interface that can be used to explore application designs for CSD/NVM storage devices. In this paper we summarize the current state of the practice with CSD devices, make a case for designing a CSD prototype with the ZNS interface and eBPF (ZCSD), and present our initial findings. The prototype is available at https://github.com/Dantali0n/qemu-csd.

[40]  arXiv:2112.00143 [pdf, other]
Title: A Comprehensive Survey on the Convergence of Vehicular Social Networks and Fog Computing
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

In recent years, the number of IoT devices has been growing fast, which makes managing, storing, analyzing, and making decisions about raw data from different IoT devices a challenging task, especially for delay-sensitive applications. In a vehicular network (VANET) environment, the dynamic nature of vehicles makes the current open research issues even more challenging due to the frequent topology changes that can lead to disconnections between vehicles. To this end, a number of research works have been proposed in the context of cloud and fog computing over the 5G infrastructure. On the other hand, there are a variety of research proposals that aim to extend the connection time between vehicles. Vehicular Social Networks (VSNs) have been defined to decrease the burden of connection time between the vehicles. This survey paper first provides the necessary background information and definitions about fog, cloud and related paradigms such as 5G and SDN. Then, it introduces the reader to Vehicular Social Networks, the different metrics and the main differences between VSNs and Online Social Networks. Finally, this survey investigates the related works in the context of VANETs that have demonstrated different architectures to address the different issues in fog computing. Moreover, it provides a categorization of the different approaches, discusses the required metrics in the context of fog and cloud, and compares them to Vehicular Social Networks. A comparison of the relevant related works is discussed along with new research challenges and trends in the domain of VSNs and fog computing.

[41]  arXiv:2112.00147 [pdf, other]
Title: Slicing Scheduling for Supporting Critical Traffic in Beyond 5G
Comments: The paper has been accepted at CCNC 2022
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

Among the most challenging services that fifth-generation (5G) mobile networks are designed to support are critical services that need very low latency and/or high reliability. It is now clear that such critical services will also be at the core of beyond 5G (B5G) networks. While the 5G radio design accommodates such services by introducing more flexibility in timing, how efficiently they can be scheduled over a network shared with other broadband services remains a challenge. In this paper, we use network slicing as an enabler for network sharing and propose an optimization framework that schedules resources to critical services via a puncturing technique with minimal impact on the regular broadband services. We then thoroughly examine the performance of the framework in terms of throughput and reliability through simulation.

[42]  arXiv:2112.00155 [pdf, other]
Title: Space-time hp finite elements for heat evolution in laser-based additive manufacturing
Subjects: Numerical Analysis (math.NA)

The direct numerical simulation of metal additive manufacturing processes such as laser powder bed fusion is challenging due to the vast differences in spatial and temporal scales. Classical approaches based on locally refined finite elements combined with time-stepping schemes can only address the spatial multi-scale nature and provide only limited scaling potential for massively parallel computations. We address these shortcomings in a space-time Galerkin framework where the finite element interpolation also includes the temporal direction. In this setting, we construct four-dimensional meshes that are locally refined towards the laser spot and allow for varying temporal accuracy depending on the position in space. By splitting the mesh into conforming time slabs, we recover a stepwise solution to solve the space-time problem locally in time at this slab; additionally, we can choose time-slab sizes significantly larger than classical time-stepping schemes. As a result, we believe this setting to be well suited for large-scale parallelization. In our work, we use a continuous Galerkin-Petrov formulation of the nonlinear heat equation with an apparent heat capacity model to account for the phase change. We validate our approach by computing the AMB2018-02 benchmark, where we obtain an excellent agreement with the measured melt pool shape. Using the same setup, we demonstrate the performance potential of our approach by hatching a square area with a laser path length of about one meter.

[43]  arXiv:2112.00160 [pdf, other]
Title: Towards Full-Fledged Argument Search: A Framework for Extracting and Clustering Arguments from Unstructured Text
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Argument search aims at identifying arguments in natural language texts. In the past, this task has been addressed by a combination of keyword search and argument identification on the sentence- or document-level. However, existing frameworks often address only specific components of argument search and do not address the following aspects: (1) argument-query matching: identifying arguments that frame the topic slightly differently than the actual search query; (2) argument identification: identifying arguments that consist of multiple sentences; (3) argument clustering: selecting retrieved arguments by topical aspects. In this paper, we propose a framework for addressing these shortcomings. We suggest (1) to combine the keyword search with precomputed topic clusters for argument-query matching, (2) to apply a novel approach based on sentence-level sequence-labeling for argument identification, and (3) to present aggregated arguments to users based on topic-aware argument clustering. Our experiments on several real-world debate data sets demonstrate that density-based clustering algorithms, such as HDBSCAN, are particularly suitable for argument-query matching. With our sentence-level, BiLSTM-based sequence-labeling approach we achieve a macro F1 score of 0.71. Finally, evaluating our argument clustering method indicates that a fine-grained clustering of arguments by subtopics remains challenging but is worth exploring.

[44]  arXiv:2112.00165 [pdf, other]
Title: Coordinated Multi-Robot Trajectory Tracking over Sampled Communication
Comments: 23 pages (main article: 14 pages; proofs: 9 pages); 22 figures (main article: 12 figures). Submitted to Automatica
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

In this paper, we propose an inverse-kinematics controller for a class of multi-robot systems in the scenario of sampled communication. The goal is to make a group of robots perform trajectory tracking in a coordinated way when the sampling time of communications is non-negligible, disrupting the theoretical convergence guarantees of standard control designs. Given a feasible desired trajectory in the configuration space, the proposed controller receives measurements from the system at sampled time instants and computes velocity references for the robots, which are tracked by a low-level controller. We propose a jointly designed feedback plus feedforward controller with provable stability and error convergence guarantees, and further show that the obtained controller is amenable to decentralized implementation. We test the proposed control strategy via numerical simulations in the scenario of cooperative aerial manipulation of a cable-suspended load using a realistic simulator (Fly-Crane). Finally, we compare our proposed decentralized controller with centralized approaches that adapt the feedback gain online through smart heuristics, and show that it achieves comparable performance.

[45]  arXiv:2112.00166 [pdf, other]
Title: TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detectors based on deep neural networks have shown great success in a variety of domains like autonomous vehicles, biomedical imaging, etc. It is known that their success depends on a large amount of data from the domain of interest. While deep models often perform well in terms of overall accuracy, they often struggle in performance on rare yet critical data slices. For example, data slices like "motorcycle at night" or "bicycle at night" are often rare but very critical slices for self-driving applications, and false negatives on such rare slices could result in ill-fated failures and accidents. Active learning (AL) is a well-known paradigm to incrementally and adaptively build training datasets with a human in the loop. However, current AL-based acquisition functions are not well-equipped to tackle real-world datasets with rare slices, since they are based on uncertainty scores or global descriptors of the image. We propose TALISMAN, a novel framework for Targeted Active Learning for object detectIon with rare slices using Submodular MutuAl iNformation. Our method uses the submodular mutual information functions instantiated using features of the region of interest (RoI) to efficiently target and acquire data points with rare slices. We evaluate our framework on the standard PASCAL VOC07+12 and BDD100K, a real-world self-driving dataset. We observe that TALISMAN outperforms other methods in terms of average precision on rare slices, and in terms of mAP.

[46]  arXiv:2112.00167 [pdf, other]
Title: MEFNet: Multi-scale Event Fusion Network for Motion Deblurring
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times. As a kind of bio-inspired camera, the event camera records the intensity changes in an asynchronous way with high temporal resolution, providing valid image degradation information within the exposure time. In this paper, we rethink the event-based image deblurring problem and unfold it into an end-to-end two-stage image restoration network. To effectively utilize event information, we design (i) a novel symmetric cumulative event representation specifically for image deblurring, and (ii) an affine event-image fusion module applied at multiple levels of our network. We also propose an event mask gated connection between the two stages of the network so as to avoid information loss. At the dataset level, to foster event-based motion deblurring and to facilitate evaluation on challenging real-world images, we introduce the High-Quality Blur (HQBlur) dataset, captured with an event camera in an illumination-controlled optical laboratory. Our Multi-Scale Event Fusion Network (MEFNet) sets the new state of the art for motion deblurring, surpassing both the prior best-performing image-based method and all event-based methods with public implementations on the GoPro (by up to 2.38dB) and HQBlur datasets, even in extreme blurry conditions. Source code and dataset will be made publicly available.

[47]  arXiv:2112.00169 [pdf, other]
Title: 3D Photo Stylization: Learning to Generate Stylized Novel Views from a Single Image
Comments: Project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Visual content creation has spurred a soaring interest given its applications in mobile photography and AR / VR. Style transfer and single-image 3D photography as two representative tasks have so far evolved independently. In this paper, we make a connection between the two, and address the challenging task of 3D photo stylization - generating stylized novel views from a single image given an arbitrary style. Our key intuition is that style transfer and view synthesis have to be jointly modeled for this task. To this end, we propose a deep model that learns geometry-aware content features for stylization from a point cloud representation of the scene, resulting in high-quality stylized images that are consistent across views. Further, we introduce a novel training protocol to enable the learning using only 2D images. We demonstrate the superiority of our method via extensive qualitative and quantitative studies, and showcase key applications of our method in light of the growing demand for 3D content creation from 2D image assets.

[48]  arXiv:2112.00170 [pdf, other]
Title: SAMO: Optimised Mapping of Convolutional Neural Networks to Streaming Architectures
Subjects: Hardware Architecture (cs.AR)

Toolflows that map Convolutional Neural Network (CNN) models to Field Programmable Gate Arrays (FPGAs) have been an important tool in accelerating a range of applications across different deployment settings. However, the significance of the problem of finding an optimal mapping is often overlooked, with the expectation that the end user will tune their generated hardware to their desired platform. This is particularly prominent within Streaming Architectures toolflows, where there is a large design space to explore. Many Streaming Architectures have been proposed; however, apart from fpgaConvNet, there is limited support for optimisation methods that explore both performance objectives and platform constraints. In this work, we establish a framework, SAMO: a Streaming Architecture Mapping Optimiser, which generalises the optimisation problem of mapping Streaming Architectures to FPGA platforms. We also implement both Brute Force and Simulated Annealing optimisation methods in order to generate valid, high performance designs for a range of target platforms and CNN models. We observe a 4x increase in performance compared to example designs for the popular Streaming Architecture framework FINN.

[49]  arXiv:2112.00171 [pdf, other]
Title: Improving Differentiable Architecture Search with a Generative Model
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In differentiable neural architecture search (NAS) algorithms like DARTS, the training set used to update model weights and the validation set used to update model architectures are sampled from the same data distribution. Thus, the uncommon features in the dataset fail to receive enough attention during training. In this paper, instead of introducing more complex NAS algorithms, we explore the idea that adding quality synthesized datasets into training can help the classification model identify its weakness and improve recognition accuracy. We introduce a training strategy called "Differentiable Architecture Search with a Generative Model" (DASGM). In DASGM, the training set is used to update the classification model weights, while a synthesized dataset is used to train its architecture. The generated images have different distributions from the training set, which can help the classification model learn better features to identify its weakness. We formulate DASGM into a multi-level optimization framework and develop an effective algorithm to solve it. Experiments on CIFAR-10, CIFAR-100, and ImageNet have demonstrated the effectiveness of DASGM. Code will be made available.

[50]  arXiv:2112.00174 [pdf, other]
Title: Adaptive Optimization with Examplewise Gradients
Comments: 9 pages, 1 figure, 3 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We propose a new, more general approach to the design of stochastic gradient-based optimization methods for machine learning. In this new framework, optimizers assume access to a batch of gradient estimates per iteration, rather than a single estimate. This better reflects the information that is actually available in typical machine learning setups. To demonstrate the usefulness of this generalized approach, we develop Eve, an adaptation of the Adam optimizer which uses examplewise gradients to obtain more accurate second-moment estimates. We provide preliminary experiments, without hyperparameter tuning, which show that the new optimizer slightly outperforms Adam on a small scale benchmark and performs the same or worse on larger scale benchmarks. Further work is needed to refine the algorithm and tune hyperparameters.

[51]  arXiv:2112.00179 [html]
Title: A collection of the accepted abstracts for the Machine Learning for Health (ML4H) symposium 2021
Subjects: Machine Learning (cs.LG)

A collection of the accepted abstracts for the Machine Learning for Health (ML4H) symposium 2021. This index is not complete, as some accepted abstracts chose to opt out of inclusion.

[52]  arXiv:2112.00180 [pdf, other]
Title: SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Recently, large pretrained models (e.g., BERT, StyleGAN, CLIP) have shown great knowledge transfer and generalization capability on various downstream tasks within their domains. Inspired by these efforts, in this paper we propose a unified model for open-domain image editing focusing on color and tone adjustment of open-domain images while keeping their original content and structure. Our model learns a unified editing space that is more semantic, intuitive, and easy to manipulate than the operation space (e.g., contrast, brightness, color curve) used in much existing photo editing software. Our model belongs to the image-to-image translation framework which consists of an image encoder and decoder, and is trained on pairs of before- and after-images to produce multimodal outputs. We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks such as language-guided image editing, personalized editing, editing-style clustering, retrieval, etc. We extensively study the unique properties of the editing space in experiments and demonstrate superior performance on the aforementioned tasks.

[53]  arXiv:2112.00182 [pdf, other]
Title: Maliva: Using Machine Learning to Rewrite Visualization Queries Under Time Constraints
Subjects: Databases (cs.DB)

We consider data-visualization systems where data is stored in a database, and a middleware layer translates a frontend request to a SQL query to the database to compute visual results. We focus on the problem of handling visualization requests with predetermined time constraints. We study how to rewrite the original query by adding hints and/or conducting approximations so that the total time is within the time constraint. We develop a novel middleware solution called Maliva, which adopts machine learning (ML) techniques to solve the problem. It applies the Markov Decision Process (MDP) model to decide how to rewrite queries and uses training instances to learn an agent that can make a sequence of decisions judiciously for an online request. Our experiments on both real and synthetic datasets show that compared to the baseline approach that relies on the original SQL query, Maliva performs significantly better in terms of both the chance of serving requests interactively and query execution time.

[54]  arXiv:2112.00185 [pdf, other]
Title: Light Field Implicit Representation for Flexible Resolution Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Inspired by the recent advances in implicitly representing signals with trained neural networks, we aim to learn a continuous representation for narrow-baseline 4D light fields. We propose an implicit representation model for 4D light fields which is conditioned on a sparse set of input views. Our model is trained to output the light field values for a continuous range of query spatio-angular coordinates. Given a sparse set of input views, our scheme can super-resolve the input in both spatial and angular domains by flexible factors. Our model consists of a feature extractor and a decoder which are trained on a dataset of light field patches. The feature extractor captures per-pixel features from the input views. These features can be resized to a desired spatial resolution and fed to the decoder along with the query coordinates. This formulation enables us to reconstruct light field views at any desired spatial and angular resolution. Additionally, our network can handle scenarios in which input views are either of low resolution or have missing pixels. Experiments show that our method achieves state-of-the-art performance for the task of view synthesis while being computationally fast.

[55]  arXiv:2112.00189 [pdf, other]
Title: InfoPrint: Embedding Information into 3D Printed Objects
Comments: 24 pages, 18 figures
Subjects: Human-Computer Interaction (cs.HC)

We present a technique to embed information invisible to the eye inside 3D printed objects. The information is integrated in the object model, and then fabricated using off-the-shelf dual-head FDM (Fused Deposition Modeling) 3D printers. Our process does not require human intervention during or after printing with the integrated model. The information can be arbitrary symbols, such as icons, text, binary, or handwriting. To retrieve the information, we evaluate two different infrared-based imaging devices that are readily available: thermal cameras and near-infrared scanners. Based on our results, we propose design guidelines for a range of use cases to embed and extract hidden information. We demonstrate how our method can be used for different applications, such as interactive thermal displays, hidden board game tokens, tagging functional printed objects, and autographing non-fungible fabrication work.

[56]  arXiv:2112.00190 [pdf]
Title: Is the use of Deep Learning and Artificial Intelligence an appropriate means to locate debris in the ocean without harming aquatic wildlife?
Comments: reference list is added/updated; sorry for causing any inconveniences. 3681 words, 14 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

With the global issue of plastic debris ever expanding, it is about time that the technology industry stepped in. This study aims to assess whether deep learning can successfully distinguish between marine life and man-made debris underwater. The aim is to find out whether we can safely clean up our oceans with Artificial Intelligence without disrupting the delicate balance of the aquatic ecosystems. The research explores the use of Convolutional Neural Networks from the perspective of protecting the ecosystem, rather than primarily collecting rubbish. We did this by building a custom deep learning model with an original database of 1,644 underwater images, and used binary classification to separate synthesised material from aquatic life. We concluded that although it is possible to safely distinguish between debris and life, further exploration with a larger database and stronger CNN structure has the potential for much more promising results.

[57]  arXiv:2112.00193 [pdf, other]
Title: Public Data-Assisted Mirror Descent for Private Model Training
Comments: 20 pages, 9 figures, 3 tables
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

We revisit the problem of using public data to improve the privacy/utility trade-offs for differentially private (DP) model training. Here, public data refers to auxiliary data sets that have no privacy concerns. We consider public data that is from the same distribution as the private training data.
For convex losses, we show that a variant of Mirror Descent provides population risk guarantees which are independent of the dimension of the model ($p$). Specifically, we apply Mirror Descent with the loss generated by the public data as the mirror map, and using DP gradients of the loss generated by the private (sensitive) data. To obtain dimension independence, we require $G_Q^2 \leq p$ public data samples, where $G_Q$ is a measure of the isotropy of the loss function. We further show that our algorithm has a natural ``noise stability'' property: If around the current iterate the public loss satisfies $\alpha_v$-strong convexity in a direction $v$, then using noisy gradients instead of the exact gradients shifts our next iterate in the direction $v$ by an amount proportional to $1/\alpha_v$ (in contrast with DP-SGD, where the shift is isotropic). Analogous results in prior works had to explicitly learn the geometry using the public data in the form of preconditioner matrices. Our method is also applicable to non-convex losses, as it does not rely on convexity assumptions to ensure DP guarantees.
We demonstrate the empirical efficacy of our algorithm by showing privacy/utility trade-offs on linear regression, deep learning benchmark datasets (WikiText-2, CIFAR-10, and EMNIST), and in federated learning (StackOverflow). We show that our algorithm not only significantly improves over traditional DP-SGD and DP-FedAvg, which do not have access to public data, but also improves over DP-SGD and DP-FedAvg on models that have been pre-trained with the public data to begin with.

[58]  arXiv:2112.00195 [pdf, other]
Title: Efficient Online Bayesian Inference for Neural Bandits
Subjects: Machine Learning (cs.LG)

In this paper we present a new algorithm for online (sequential) inference in Bayesian neural networks, and show its suitability for tackling contextual bandit problems. The key idea is to combine the extended Kalman filter (which locally linearizes the likelihood function at each time step) with a (learned or random) low-dimensional affine subspace for the parameters; the use of a subspace enables us to scale our algorithm to models with $\sim 1M$ parameters. While most other neural bandit methods need to store the entire past dataset in order to avoid the problem of "catastrophic forgetting", our approach uses constant memory. This is possible because we represent uncertainty about all the parameters in the model, not just the final linear layer. We show good results on the "Deep Bayesian Bandit Showdown" benchmark, as well as MNIST and a recommender system.

[59]  arXiv:2112.00200 [pdf]
Title: Efficient Big Text Data Clustering Algorithms using Hadoop and Spark
Comments: 9 pages
Journal-ref: International Journal of Computer Applications (IJCA), Vol. 174, No. 15, pp. 13-21, 2021
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Document clustering is a traditional, efficient, and effective text mining technique for gaining better insight into the documents of a collection that could be grouped together. The K-Means algorithm and the Hierarchical Agglomerative Clustering (HAC) algorithm are two of the best-known and most commonly used clustering algorithms; the former due to its low time cost and the latter due to its accuracy. However, even the use of K-Means in text clustering over large-scale collections can lead to unacceptable time costs. In this paper we first review some of the most valuable approaches for document clustering over such 'big data' (large-scale) collections. We then present two very promising alternatives: (a) a variation of an existing K-Means-based fast clustering technique (known as BigKClustering - BKC) so that it can be applied in document clustering, and (b) a hybrid clustering approach based on a customized version of the Buckshot algorithm, which first applies a hierarchical clustering procedure on a sample of the input dataset and then uses the results as the initial centers for a K-Means based assignment of the rest of the documents, with very few iterations. We also give highly efficient adaptations of the proposed techniques in the MapReduce model, which are then experimentally tested using Apache Hadoop and Spark over a real cluster environment. The experiments show that both lead to acceptable clustering quality as well as to significant time improvements (compared to K-Means - especially the Buckshot-based algorithm), thus constituting very promising alternatives for big document collections.
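
The hybrid approach in (b) can be illustrated with a small single-machine sketch of the generic Buckshot idea: cluster a random sample hierarchically, then use the resulting centroids to seed a few K-Means passes over all points. This is not the paper's customized or MapReduce version. The function names, the centroid-linkage merge rule, the default sample size of sqrt(k*n), and the use of plain numeric vectors instead of document term vectors are all assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def hac_centroids(sample: List[Point], k: int) -> List[Point]:
    """Agglomerative step: repeatedly merge the two clusters whose
    centroids are closest until k clusters remain (assumed linkage)."""
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def buckshot(points: List[Point], k: int, iterations: int = 2,
             sample_size: Optional[int] = None,
             seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Seed K-Means with centers obtained by HAC on a random sample."""
    rng = random.Random(seed)
    if sample_size is None:
        # Assumed default: a sample of about sqrt(k * n) points.
        sample_size = max(k, int(math.sqrt(k * len(points))))
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = hac_centroids(sample, k)
    for _ in range(iterations):  # "very few iterations"
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = centroid(members)
    return assign(points, centers), centers
```

The quadratic-cost hierarchical step only ever sees the sample, which is why the combination stays cheap on large collections; the remaining points are handled by the linear-cost assignment passes.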

[60]  arXiv:2112.00202 [pdf, other]
Title: 3DVNet: Multi-View Depth Prediction and Volumetric Refinement
Comments: 10 pages, 6 figures, 3 tables. Accepted to 3DV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.

[61]  arXiv:2112.00206 [pdf, other]
Title: Querying Labelled Data with Scenario Programs for Sim-to-Real Validation
Comments: pre-print
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Robotics (cs.RO)

Simulation-based testing of autonomous vehicles (AVs) has become an essential complement to road testing to ensure safety. Consequently, substantial research has focused on searching for failure scenarios in simulation. However, a fundamental question remains: are AV failure scenarios identified in simulation meaningful in reality, i.e., are they reproducible on the real system? Due to the sim-to-real gap arising from discrepancies between simulated and real sensor data, a failure scenario identified in simulation can be either a spurious artifact of the synthetic sensor data or an actual failure that persists with real sensor data. An approach to validate simulated failure scenarios is to identify instances of the scenario in a corpus of real data, and check if the failure persists on the real data. To this end, we propose a formal definition of what it means for a labelled data item to match an abstract scenario, encoded as a scenario program using the SCENIC probabilistic programming language. Using this definition, we develop a querying algorithm which, given a scenario program and a labelled dataset, finds the subset of data matching the scenario. Experiments demonstrate that our algorithm is accurate and efficient on a variety of realistic traffic scenarios, and scales to a reasonable number of agents.

[62]  arXiv:2112.00207 [pdf]
Title: Improved sparse PCA method for face and image recognition
Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:1904.08496
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Face recognition is a very significant field in the pattern recognition area. It has multiple applications in military and finance, to name a few. In this paper, the combination of the sparse PCA with the nearest-neighbor method (and with the kernel ridge regression method) is proposed and applied to solve the face recognition problem. Experimental results illustrate that the accuracy of the combination of the sparse PCA method (using the proximal gradient method and the FISTA method) and one specific classification system may be lower than the accuracy of the combination of the PCA method and one specific classification system, but sometimes the combination of the sparse PCA method (using the proximal gradient method or the FISTA method) and one specific classification system leads to better accuracy. Moreover, we observe that computing the sparse PCA algorithm using the FISTA method is always faster than computing it using the proximal gradient method.

[63]  arXiv:2112.00209 [pdf, ps, other]
Title: Environmental Sound Extraction Using Onomatopoeia
Comments: Submitted to ICASSP2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Onomatopoeia, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre. We propose an environmental-sound-extraction method using onomatopoeia to specify the target sound to be extracted. With this method, we estimate a time-frequency mask from an input mixture spectrogram and onomatopoeia using a U-Net architecture, then extract the corresponding target sound by masking the spectrogram. Experimental results indicate that the proposed method can extract only the target sound corresponding to onomatopoeia and performs better than conventional methods that use sound-event classes to specify the target sound.
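
The final masking step can be sketched as an element-wise product between the mixture spectrogram and the estimated mask. The sketch below covers only that step: the U-Net that would predict the mask from the mixture and the onomatopoeia is omitted. The name `apply_mask`, the nested-list spectrogram layout, and the clamping of mask values to [0, 1] are assumptions for illustration.

```python
from typing import List

Spectrogram = List[List[float]]  # rows: frequency bins, columns: time frames


def apply_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Scale every time-frequency bin of the mixture by its mask weight.

    Mask weights are clamped to [0, 1] (an assumption), so a bin is
    either attenuated or passed through, never amplified.
    """
    if len(mixture) != len(mask) or any(
            len(row) != len(m_row) for row, m_row in zip(mixture, mask)):
        raise ValueError("mixture and mask must have the same shape")
    return [[min(1.0, max(0.0, w)) * x for x, w in zip(row, m_row)]
            for row, m_row in zip(mixture, mask)]
```

A mask value near 1 keeps a bin as part of the target sound and a value near 0 suppresses it, so the quality of the extraction rests entirely on how well the mask is estimated.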

[64]  arXiv:2112.00216 [pdf, other]
Title: PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Reconstructing the 3D pose of a person in metric scale from a single view image is a geometrically ill-posed problem. For example, we cannot measure the exact distance of a person to the camera from a single view image without additional scene assumptions (e.g., known height). Existing learning based approaches circumvent this issue by reconstructing the 3D pose up to scale. However, there are many applications such as virtual telepresence, robotics, and augmented reality that require metric scale reconstruction. In this paper, we show that audio signals recorded along with an image provide complementary information to reconstruct the metric 3D pose of the person.
The key insight is that as the audio signals traverse across the 3D space, their interactions with the body provide metric information about the body's pose. Based on this insight, we introduce a time-invariant transfer function called pose kernel -- the impulse response of audio signals induced by the body pose. The main properties of the pose kernel are that (1) its envelope highly correlates with 3D pose, (2) the time response corresponds to arrival time, indicating the metric distance to the microphone, and (3) it is invariant to changes in the scene geometry configurations. Therefore, it is readily generalizable to unseen scenes. We design a multi-stage 3D CNN that fuses audio and visual signals and learns to reconstruct 3D pose in a metric scale. We show that our multi-modal method produces accurate metric reconstruction in real world scenes, which is not possible with state-of-the-art lifting approaches including parametric mesh regression and depth regression.

[65]  arXiv:2112.00219 [pdf, other]
Title: Scalable Primitives for Generalized Sensor Fusion in Autonomous Vehicles
Comments: Presented in Machine Learning for Autonomous Driving Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia. 11 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

In autonomous driving, there has been an explosion in the use of deep neural networks for perception, prediction and planning tasks. As autonomous vehicles (AVs) move closer to production, multi-modal sensor inputs and heterogeneous vehicle fleets with different sets of sensor platforms are becoming increasingly common in the industry. However, neural network architectures typically target specific sensor platforms and are not robust to changes in input, making the problem of scaling and model deployment particularly difficult. Furthermore, most players still treat the problem of optimizing software and hardware as entirely independent problems. We propose a new end to end architecture, Generalized Sensor Fusion (GSF), which is designed in such a way that both sensor inputs and target tasks are modular and modifiable. This enables AV system designers to easily experiment with different sensor configurations and methods and opens up the ability to deploy on heterogeneous fleets using the same models that are shared across a large engineering organization. Using this system, we report experimental results where we demonstrate near-parity of an expensive high-density (HD) LiDAR sensor with a cheap low-density (LD) LiDAR plus camera setup in the 3D object detection task. This paves the way for the industry to jointly design hardware and software architectures as well as large fleets with heterogeneous configurations.

[66]  arXiv:2112.00220 [pdf, other]
Title: A generic physics-informed neural network-based framework for reliability assessment of multi-state systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we leverage the recent advances in physics-informed neural networks (PINN) and develop a generic PINN-based framework to assess the reliability of multi-state systems (MSSs). The proposed methodology consists of two major steps. In the first step, we recast the reliability assessment of MSS as a machine learning problem using the framework of PINN. A feedforward neural network with two individual loss groups is constructed to encode the initial condition and state transitions governed by ordinary differential equations (ODEs) in MSS. Next, we tackle the problem of high imbalance in the magnitude of the back-propagated gradients in PINN from a multi-task learning perspective. In particular, we treat each element in the loss function as an individual task, and adopt a gradient surgery approach named projecting conflicting gradients (PCGrad), where a task's gradient is projected onto the normal plane of any other task that has a conflicting gradient. The gradient projection operation significantly mitigates the detrimental effects caused by gradient interference when training PINN, thus accelerating the convergence of PINN to high-precision solutions to MSS reliability assessment. With the proposed PINN-based framework, we investigate its applications for MSS reliability assessment in several different contexts in terms of time-independent or time-dependent state transitions and system scales varying from small to medium. The results demonstrate that the proposed PINN-based framework shows generic and remarkable performance in MSS reliability assessment, and the incorporation of PCGrad in PINN leads to substantial improvement in solution quality and convergence speed.
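
The projection step of PCGrad can be sketched in a few lines: when two task gradients have a negative inner product, the component of one along the other is removed. This is a minimal illustration on plain Python lists, not the paper's PINN training code. The function names are hypothetical, and the fixed iteration order over the other tasks is a simplification (the original PCGrad formulation visits them in random order).

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_conflicting(g_i: Vector, g_j: Vector) -> Vector:
    """Return g_i unchanged if it does not conflict with g_j; otherwise
    return g_i with its component along g_j removed, which is its
    projection onto the plane orthogonal to g_j."""
    d = dot(g_i, g_j)
    if d >= 0.0:  # no conflict: the gradients do not point against each other
        return list(g_i)
    scale = d / dot(g_j, g_j)  # g_j is non-zero here because d < 0
    return [x - scale * y for x, y in zip(g_i, g_j)]


def pcgrad(grads: List[Vector]) -> Vector:
    """Project each task gradient against every other task's original
    gradient, then sum the projected gradients into one update direction."""
    projected = []
    for i, g in enumerate(grads):
        g_proj = list(g)
        for j, other in enumerate(grads):
            if j != i:
                g_proj = project_conflicting(g_proj, other)
        projected.append(g_proj)
    return [sum(components) for components in zip(*projected)]
```

For example, with task gradients [1, 0] and [-1, 1], the first becomes [0.5, 0.5] and the second becomes [0, 1], so the combined update is [0.5, 1.5] and neither task's step directly opposes the other.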

[67]  arXiv:2112.00227 [pdf, other]
Title: A Machine Learning Analysis of COVID-19 Mental Health Data
Comments: 29 pages
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

In late December 2019, the novel coronavirus (SARS-CoV-2) and the resulting disease COVID-19 were first identified in Wuhan, China. The disease slipped through containment measures, with the first known case in the United States being identified on January 20th, 2020. In this paper, we utilize survey data from the Inter-university Consortium for Political and Social Research and apply several statistical and machine learning models and techniques such as Decision Trees, Multinomial Logistic Regression, Naive Bayes, k-Nearest Neighbors, Support Vector Machines, Neural Networks, Random Forests, Gradient Tree Boosting, XGBoost, CatBoost, LightGBM, Synthetic Minority Oversampling, and Chi-Squared Test to analyze the impacts the COVID-19 pandemic has had on the mental health of frontline workers in the United States. Through the interpretation of the many models applied to the mental health survey data, we have concluded that the most important factor in predicting the mental health decline of a frontline worker is the healthcare role the individual is in (Nurse, Emergency Room Staff, Surgeon, etc.), followed by the amount of sleep the individual has had in the last week, the amount of COVID-19 related news an individual has consumed on average in a day, the age of the worker, and the usage of alcohol and cannabis.

[68]  arXiv:2112.00228 [pdf, other]
Title: Efficient loading of reduced data ensembles produced at ORNL SNS/HFIR neutron time-of-flight facilities
Comments: 7 pages, 6 figures, 4 tables, The Second International Workshop on Big Data Reduction held with 2021 IEEE International Conference on Big Data
Subjects: Databases (cs.DB); Performance (cs.PF)

We present algorithmic improvements to the loading operations of certain reduced data ensembles produced from neutron scattering experiments at Oak Ridge National Laboratory (ORNL) facilities. Ensembles from multiple measurements are required to cover a wide range of the phase space of a sample material of interest. They are stored using the standard NeXus schema on individual HDF5 files. This poses a scalability challenge as the number of experiments stored in a single ensemble file increases. The present work follows up on our previous efforts on data management algorithms, to address identified input/output (I/O) bottlenecks in Mantid, an open-source data analysis framework used across several neutron science facilities around the world. We reuse an in-memory binary-tree metadata index that resembles data access patterns, to provide a scalable search and extraction mechanism. In addition, several memory operations are refactored and optimized for the current common use cases, ranging most frequently from 10 to 180, and up to 360 separate measurement configurations. Results from this work show consistent speedups in wall-clock time on the Mantid LoadMD routine, ranging from 19% to 23% on average, on ORNL production computing systems. The latter depends on the complexity of the targeted instrument-specific data and the system I/O and compute variability for the shared computational resources available to users of ORNL's Spallation Neutron Source (SNS) and the High Flux Isotope Reactor (HFIR) instruments. Nevertheless, we continue to highlight the need for more research to address reduction challenges as experimental data volumes, user time and processing costs increase.

[69]  arXiv:2112.00229 [pdf, other]
Title: Frequency Fitness Assignment: Optimization without a Bias for Good Solutions can be Efficient
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Combinatorics (math.CO)

A fitness assignment process transforms the features (such as the objective value) of a candidate solution to a scalar fitness, which then is the basis for selection. Under Frequency Fitness Assignment (FFA), the fitness corresponding to an objective value is its encounter frequency and is subject to minimization. FFA creates algorithms that are not biased towards better solutions and are invariant under all bijections of the objective function value. We investigate the impact of FFA on the performance of two theory-inspired, state-of-the-art EAs, the Greedy (2+1) GA and the Self-Adjusting (1+(lambda,lambda)) GA. FFA improves their performance significantly on some problems that are hard for them. We empirically find that one FFA-based algorithm can solve all theory-based benchmark problems in this study, including traps, jumps, and plateaus, in polynomial time. We propose two hybrid approaches that use both direct and FFA-based optimization and find that they perform well. All FFA-based algorithms also perform better on satisfiability problems than all pure algorithm variants.
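
To make the idea of selecting by encounter frequency concrete, the following is a minimal Python sketch of FFA plugged into a simple (1+1)-style search on bit strings. It is not the Greedy (2+1) GA or the Self-Adjusting GA studied in the paper; the single-bit-flip mutation, the OneMax objective, and all function names are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict


def onemax(bits):
    """Toy objective (assumption): number of ones in the bit string."""
    return sum(bits)


def flip_one_bit(bits, rng):
    """Hypothetical mutation operator: flip exactly one random bit."""
    child = list(bits)
    i = rng.randrange(len(child))
    child[i] ^= 1
    return child


def one_plus_one_ffa(objective, n, max_evals, seed=0):
    """(1+1)-style search where selection uses encounter frequencies.

    The fitness of an objective value is how often that value has been
    encountered so far, and lower frequency is preferred.
    """
    rng = random.Random(seed)
    freq = defaultdict(int)  # objective value -> encounter count
    current = [rng.randint(0, 1) for _ in range(n)]
    f_current = objective(current)
    # Selection is not biased towards good solutions, so the best
    # solution seen so far must be remembered separately.
    best, f_best = list(current), f_current
    for _ in range(max_evals):
        child = flip_one_bit(current, rng)
        f_child = objective(child)
        freq[f_current] += 1
        freq[f_child] += 1
        # Accept the child if its objective value is not more frequent.
        if freq[f_child] <= freq[f_current]:
            current, f_current = child, f_child
        if f_child > f_best:
            best, f_best = list(child), f_child
    return best, f_best, dict(freq)


if __name__ == "__main__":
    best, f_best, freq = one_plus_one_ffa(onemax, n=8, max_evals=2000)
    print("best objective value seen:", f_best)
    print("encounter frequencies:", freq)
```

Because acceptance depends only on how often each objective value has been seen, replacing the objective by any one-to-one transformation of its values leaves the sequence of accepted solutions unchanged for a fixed seed, which is the invariance property the abstract refers to.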

[70]  arXiv:2112.00234 [pdf, other]
Title: Benchmarking Deep Deblurring Algorithms: A Large-Scale Multi-Cause Dataset and A New Baseline Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Blur artifacts can seriously degrade the visual quality of images, and numerous deblurring methods have been proposed for specific scenarios. However, in most real-world images, blur is caused by different factors, e.g., motion and defocus. In this paper, we address how different deblurring methods perform on general types of blur. For in-depth performance evaluation, we construct a new large-scale multi-cause image deblurring dataset called MC-Blur, which includes real-world and synthesized blurry images with mixed causes of blur. The images in the proposed MC-Blur dataset are collected using different techniques: convolving Ultra-High-Definition (UHD) sharp images with large kernels, averaging sharp images captured by a 1000 fps high-speed camera, adding defocus to images, and capturing real-world blurred images with various camera models. These results provide a comprehensive overview of the advantages and limitations of current deblurring methods. Further, we propose a new baseline model, level-attention deblurring network, to adapt to multiple causes of blur. By applying different attention weights to the different levels of features, the proposed network derives more powerful features with larger weights assigned to more important levels, thereby enhancing the feature representation. Extensive experimental results on the new dataset demonstrate the effectiveness of the proposed model for the multi-cause blur scenarios.

[71]  arXiv:2112.00236 [pdf, other]
Title: VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion
Comments: 3DV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recent volumetric 3D reconstruction methods can produce very accurate results, with plausible geometry even for unobserved surfaces. However, they face an undesirable trade-off when it comes to multi-view fusion. They can fuse all available view information by global averaging, thus losing fine detail, or they can heuristically cluster views for local fusion, thus restricting their ability to consider all views jointly. Our key insight is that greater detail can be retained without restricting view diversity by learning a view-fusion function conditioned on camera pose and image content. We propose to learn this multi-view fusion using a transformer. To this end, we introduce VoRTX, an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion. Our model is occlusion-aware, leveraging the transformer architecture to predict an initial, projective scene geometry estimate. This estimate is used to avoid backprojecting image features through surfaces into occluded regions. We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods. We also demonstrate generalization without any fine-tuning, outperforming the same state-of-the-art methods on two other datasets, TUM-RGBD and ICL-NUIM.

[72]  arXiv:2112.00238 [pdf, other]
Title: Imbalanced Graph Classification via Graph-of-Graph Neural Networks
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Graph Neural Networks (GNNs) have achieved unprecedented success in learning graph representations to identify categorical labels of graphs. However, most existing graph classification problems with GNNs follow a balanced data splitting protocol, which is misaligned with many real-world scenarios in which some classes have far fewer labels than others. Directly training GNNs under this imbalanced situation may lead to uninformative representations of graphs in minority classes, and compromise the overall performance of downstream classification, which signifies the importance of developing effective GNNs for handling imbalanced graph classification. Existing methods are either tailored for non-graph structured data or designed specifically for imbalanced node classification, while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks (G$^2$GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from graphs themselves. Globally, we construct a graph of graphs (GoG) based on kernel similarity and perform GoG propagation to aggregate neighboring graph representations, which are initially obtained by node-level propagation with pooling via a GNN encoder. Locally, we employ topological augmentation via masking nodes or dropping edges to improve the model generalizability in discerning topology of unseen testing graphs. Extensive graph classification experiments conducted on seven benchmark datasets demonstrate our proposed G$^2$GNN outperforms numerous baselines by roughly 5% in both F1-macro and F1-micro scores. The implementation of G$^2$GNN is available at https://github.com/YuWVandy/G2GNN.

[73]  arXiv:2112.00245 [pdf, other]
Title: True or False: Does the Deep Learning Model Learn to Detect Rumors?
Comments: 5 pages, 3 figures, 8 tables
Subjects: Computation and Language (cs.CL)

It is difficult for humans to distinguish true rumors from false ones, but current deep learning models can surpass humans and achieve excellent accuracy on many rumor datasets. In this paper, we investigate whether deep learning models that seem to perform well actually learn to detect rumors. We evaluate models on their generalization ability to out-of-domain examples by fine-tuning BERT-based models on five real-world datasets and evaluating against all test sets. The experimental results indicate that the generalization ability of the models on other unseen datasets is unsatisfactory; even common-sense rumors cannot be detected. Moreover, we found through experiments that models take shortcuts and learn absurd knowledge when the rumor datasets have serious data pitfalls. This means that simple modifications to the rumor text based on specific rules will lead to inconsistent model predictions. To more realistically evaluate rumor detection models, we propose a new evaluation method called paired test (PairT), which requires models to correctly predict a pair of test samples at the same time. Furthermore, we make recommendations on how to better create rumor datasets and evaluate rumor detection models at the end of this paper.
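
The pairing requirement can be illustrated with a small Python sketch. The abstract does not say how pairs are constructed, so the toy pairs, the keyword-based stand-in classifier, and the function names below are all assumptions made for illustration; only the rule that a pair counts as correct when both of its samples are predicted correctly comes from the abstract.

```python
def sample_accuracy(pairs, predict):
    """Ordinary accuracy over every individual sample in the pairs."""
    samples = [s for pair in pairs for s in pair]
    if not samples:
        return 0.0
    return sum(predict(text) == label for text, label in samples) / len(samples)


def paired_accuracy(pairs, predict):
    """A pair counts only if BOTH of its samples are predicted correctly."""
    if not pairs:
        return 0.0
    hits = sum(
        predict(text_a) == label_a and predict(text_b) == label_b
        for (text_a, label_a), (text_b, label_b) in pairs
    )
    return hits / len(pairs)


def shortcut_model(text):
    """Hypothetical classifier that relies on a surface cue (1 = rumor)."""
    return 1 if "breaking" in text.lower() else 0


# Toy data invented for this sketch, not taken from the paper.
toy_pairs = [
    (("Breaking: drinking bleach cures flu", 1), ("Drinking bleach cures flu", 1)),
    (("Breaking: city council approves budget", 0), ("City council approves budget", 0)),
    (("Breaking: aliens built the pyramids", 1), ("Officials confirm road closure", 0)),
]

if __name__ == "__main__":
    print("per-sample accuracy:", sample_accuracy(toy_pairs, shortcut_model))
    print("paired accuracy:", paired_accuracy(toy_pairs, shortcut_model))
```

On this toy data the shortcut classifier gets four of six individual samples right but only one of three pairs, which shows how a paired score can expose predictions that flip when a surface cue is added or removed.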

[74]  arXiv:2112.00246 [pdf, other]
Title: AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, pose particular challenges for future home-assistant robots performing daily tasks in human environments. Besides parsing the articulated parts and joint parameters, researchers recently advocate learning manipulation affordance over the input shape geometry which is more task-aware and geometrically fine-grained. However, taking only passive observations as inputs, these methods ignore many hidden but important kinematic constraints (e.g., joint location and limits) and dynamic factors (e.g., joint friction and restitution), therefore losing significant accuracy for test cases with such uncertainties. In this paper, we propose a novel framework, named AdaAfford, that learns to perform very few test-time interactions for quickly adapting the affordance priors to more accurate instance-specific posteriors. We conduct large-scale experiments using the PartNet-Mobility dataset and prove that our system performs better than baselines.

[75]  arXiv:2112.00247 [pdf, other]
Title: Adversarial Attacks Against Deep Generative Models on Data: A Survey
Comments: To be published in IEEE Transactions on Knowledge and Data Engineering
Subjects: Cryptography and Security (cs.CR)

Deep generative models have gained much attention given their ability to generate data for applications as varied as healthcare, financial technology, and surveillance, with the most popular models being generative adversarial networks and variational auto-encoders. Yet, as with all machine learning models, security breaches and privacy leaks are a constant concern, and deep generative models are no exception. These models have advanced so rapidly in recent years that work on their security is still in its infancy. In an attempt to audit the current and future threats against these models, and to provide a roadmap for defense preparations in the short term, we prepared this comprehensive and specialized survey on the security and privacy preservation of GANs and VAEs. Our focus is on the inner connection between attacks and model architectures and, more specifically, on five components of deep generative models: the training data, the latent code, the generators/decoders of GANs/VAEs, the discriminators/encoders of GANs/VAEs, and the generated data. For each model, component and attack, we review the current research progress and identify the key challenges. The paper concludes with a discussion of possible future attacks and research directions in the field.

[76]  arXiv:2112.00248 [pdf, other]
Title: Simulation platform for pattern recognition based on reservoir computing with memristor networks
Comments: 14 pages, 7 figures, 5 supplementary figures
Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)

Memristive systems and devices are potentially available for implementing reservoir computing (RC) systems applied to pattern recognition. However, the computational ability of memristive RC systems depends on intertwined factors such as system architectures and physical properties of memristive elements, which complicates identifying the key factor for system performance. Here we develop a simulation platform for RC with memristor device networks, which enables testing different system designs for performance improvement. Numerical simulations show that the memristor-network-based RC systems can yield high computational performance comparable to that of state-of-the-art methods in three time series classification tasks. We demonstrate that the excellent and robust computation under device-to-device variability can be achieved by appropriately setting network structures, nonlinearity of memristors, and pre/post-processing, which increases the potential for reliable computation with unreliable component devices. Our results contribute to an establishment of a design guide for memristive reservoirs toward a realization of energy-efficient machine learning hardware.

[77]  arXiv:2112.00250 [pdf]
Title: Shallow Network Based on Depthwise Over-Parameterized Convolution for Hyperspectral Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recently, convolutional neural network (CNN) techniques have gained popularity as a tool for hyperspectral image classification (HSIC). To improve the feature extraction efficiency of HSIC under the condition of limited samples, the current methods generally use deep models with plenty of layers. However, deep network models are prone to overfitting and gradient vanishing problems when samples are limited. In addition, the spatial resolution decreases severely with increasing depth, which is very detrimental to spatial edge feature extraction. Therefore, this letter proposes a shallow model for HSIC, which is called depthwise over-parameterized convolutional neural network (DOCNN). To ensure effective feature extraction by the shallow model, the depthwise over-parameterized convolution (DO-Conv) kernel is introduced to extract the discriminative features. The DO-Conv kernel is composed of a standard convolution kernel and a depthwise convolution kernel, which can extract the spatial features of the individual channels and fuse the spatial features across all channels simultaneously. Moreover, to further reduce the loss of spatial edge features due to the convolution operation, a dense residual connection (DRC) structure is proposed and applied to the feature extraction part of the whole network. Experimental results obtained from three benchmark data sets show that the proposed method outperforms other state-of-the-art methods in terms of classification accuracy and computational efficiency.

[78]  arXiv:2112.00260 [pdf, other]
Title: Ranking Distance Calibration for Cross-Domain Few-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recent progress in few-shot learning promotes a more realistic cross-domain setting, where the source and target datasets are from different domains. Due to the domain gap and disjoint label spaces between source and target datasets, their shared knowledge is extremely limited. This encourages us to explore more information in the target domain rather than to overly elaborate training strategies on the source domain as in many existing methods. Hence, we start from a generic representation pre-trained by a cross-entropy loss and a conventional distance-based classifier, along with an image retrieval view, to employ a re-ranking process for calibrating a target distance matrix by discovering the reciprocal k-nearest neighbours within the task. Assuming the pre-trained representation is biased towards the source, we construct a non-linear subspace to minimise task-irrelevant features within it while keeping more transferable discriminative information by a hyperbolic tangent transformation. The calibrated distance in this target-aware non-linear subspace is complementary to that in the pre-trained representation. To impose such distance calibration information onto the pre-trained representation, a Kullback-Leibler divergence loss is employed to gradually guide the model towards the calibrated distance-based distribution. Extensive evaluations on eight target domains show that this target ranking calibration process can improve conventional distance-based classifiers in few-shot learning.
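
The notion of reciprocal k-nearest neighbours mentioned in the abstract can be sketched in a few lines of Python. This shows only the neighbour-discovery step on a plain distance matrix; how the paper turns these neighbour sets into a calibrated distance is not described in the abstract and is not reproduced here. The function names and the toy one-dimensional points are assumptions for illustration.

```python
def k_nearest(dist, i, k):
    """Indices of the k nearest neighbours of item i (excluding i itself).

    Ties are broken by index so the result is deterministic.
    """
    others = [j for j in range(len(dist)) if j != i]
    others.sort(key=lambda j: (dist[i][j], j))
    return set(others[:k])


def reciprocal_knn(dist, k):
    """For each item i, keep neighbour j only if i is also among j's k nearest."""
    nn = [k_nearest(dist, i, k) for i in range(len(dist))]
    return [{j for j in nn[i] if i in nn[j]} for i in range(len(dist))]


if __name__ == "__main__":
    # Toy 1-D points (assumption); distances are absolute differences.
    points = [0.0, 1.0, 3.0, 10.0]
    dist = [[abs(a - b) for b in points] for a in points]
    print(reciprocal_knn(dist, k=1))
```

With k = 1, only the first two points select each other, so they are the only reciprocal pair; the third point picks the second as its nearest neighbour, but that choice is not returned, so it is discarded.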

[79]  arXiv:2112.00262 [pdf, other]
Title: A Blockchain-Enabled Incentivised Framework for Cyber Threat Intelligence Sharing in ICS
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

In recent years Industrial Control Systems (ICS) have been targeted increasingly by sophisticated cyberattacks. Improving ICS security has drawn significant attention in the literature that emphasises the importance of Cyber Threat Intelligence (CTI) sharing in accelerating detection, mitigation, and prevention of cyberattacks. However, organisations are reluctant to exchange CTI due to fear of exposure, reputational damage, and lack of incentives. Furthermore, there has been limited discussion about the factors influencing participation in sharing CTI about ICS. The existing CTI-sharing platforms rely on centralised trusted architectures that suffer from a single point of failure and risk companies' privacy as the central node maintains CTI details. In this paper, we address the needs of organisations involved in the management and protection of ICS and present a novel framework that facilitates secure, private, and incentivised exchange of CTI related to ICS using blockchain. We propose a new blockchain-enabled framework that facilitates the secure dissemination of CTI data among multiple stakeholders in ICS. We provide the framework design, technical development and evaluate the framework's feasibility in a real-world application environment using practical use-case scenarios. Our proposed design shows a more practical and efficient framework for a CTI sharing network for ICS, including the bestowal and acknowledgment of data privacy, trust barriers, and security issues ingrained in this domain.

[80]  arXiv:2112.00263 [pdf, other]
Title: GLocal: Global Graph Reasoning and Local Structure Transfer for Person Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we focus on person image generation, namely, generating person images under various conditions, e.g., corrupted texture or different pose. To address texture occlusion and large pose misalignment in this task, previous works simply use the corresponding region's style to infer the occluded area and rely on point-wise alignment to reorganize the context texture information, lacking the ability to globally correlate the region-wise style codes and preserve the local structure of the source. To tackle these problems, we present a GLocal framework to improve the occlusion-aware texture estimation by globally reasoning the style inter-correlations among different semantic regions, which can also be employed to recover the corrupted images in texture inpainting. For local structural information preservation, we further extract the local structure of the source image and regain it in the generated image via local structure transfer. We benchmark our method to fully characterize its performance on the DeepFashion dataset and present extensive ablation studies that highlight the novelty of our method.

[81]  arXiv:2112.00265 [pdf, other]
Title: Training BatchNorm Only in Neural Architecture Search and Beyond
Comments: 11 pages Technical report
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

This work investigates the usage of batch normalization in neural architecture search (NAS). Specifically, Frankle et al. find that training BatchNorm only can achieve nontrivial performance. Furthermore, Chen et al. claim that training BatchNorm only can speed up the training of the one-shot NAS supernet by more than ten times. Critically, there has been no effort to understand 1) why training BatchNorm only can find well-performing architectures with reduced supernet-training time, and 2) what the difference is between the train-BN-only supernet and the standard-train supernet. We begin by showing theoretically that train-BN-only networks converge to the neural tangent kernel regime and obtain the same training dynamics as networks in which all parameters are trained. Our proof supports the claim that training BatchNorm only on the supernet can be done with less training time. Then, we empirically show that the train-BN-only supernet gives convolutions an advantage over other operators, causing unfair competition between architectures. This is because only the convolution operator is paired with BatchNorm. Through experiments, we show that such unfairness makes the search algorithm prone to select models with convolutions. To solve this issue, we introduce fairness in the search space by placing a BatchNorm layer on every operator. However, we observe that the performance predictor in Chen et al. is inapplicable to the new search space. To this end, we propose a novel composite performance indicator to evaluate networks from three perspectives: expressivity, trainability, and uncertainty, derived from the theoretical property of BatchNorm. We demonstrate the effectiveness of our approach on multiple NAS benchmarks (NAS-Bench-101, NAS-Bench-201) and search spaces (DARTS search space and MobileNet search space).

[82]  arXiv:2112.00267 [pdf, other]
Title: CAMA: Energy and Memory Efficient Automata Processing in Content-Addressable Memories
Comments: This work has been accepted by IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022)
Subjects: Hardware Architecture (cs.AR); Formal Languages and Automata Theory (cs.FL)

Accelerating finite automata processing is critical for advancing real-time analytics in pattern matching, data mining, bioinformatics, intrusion detection, and machine learning. Recent in-memory automata accelerators leveraging SRAMs and DRAMs have shown exciting improvements over conventional digital designs. However, the bit-vector representation of state transitions used by all SOTA designs is only optimal in processing worst-case completely random patterns, while a significant amount of memory and energy is wasted in running most real-world benchmarks. We present CAMA, a Content-Addressable Memory (CAM) enabled Automata accelerator for processing homogeneous non-deterministic finite automata (NFA). A radically different state representation scheme, along with co-designed novel circuits and data encoding schemes, greatly reduces energy, memory, and chip area for most realistic NFAs. CAMA is holistically optimized with the following major contributions: (1) a 16x256 8-transistor (8T) CAM array for state matching, replacing the 256x256 6T SRAM array or two 16x256 6T SRAM banks in SOTA designs; (2) a novel encoding scheme that enables content searching within 8T SRAMs and adapts to different applications; (3) a reconfigurable and scalable architecture that improves efficiency on all tested benchmarks, without losing support for any NFA that is compatible with SOTA designs; (4) an optimization framework that automates the choice of encoding schemes and maps a given NFA to the proposed hardware. Two versions of CAMA, one optimized for energy (CAMA-E) and the other for throughput (CAMA-T), are comprehensively evaluated in a 28nm CMOS process, and across 21 real-world and synthetic benchmarks. CAMA-E achieves 2.1x, 2.8x, and 2.04x lower energy than CA, 2-stride Impala, and eAP. CAMA-T shows 2.68x, 3.87x and 2.62x higher average compute density than 2-stride Impala, CA, and eAP.

[83]  arXiv:2112.00269 [pdf, other]
Title: Unequal Opportunities in Multi-hop Referral Programs
Comments: preprint
Subjects: Social and Information Networks (cs.SI)

As modern social networks allow for faster and broader interactions with friends and acquaintances, online referral programs that promote sales through existing users are becoming increasingly popular. Because it is all too common that online networks reproduce historical structural bias, members of disadvantaged groups often benefit less from such referral opportunities. For instance, one-hop referral programs that distribute rewards only among pairs of friends or followers may offer fewer rewards and opportunities to minorities in networks where it has been shown that their degrees are statistically smaller. Here, we examine the fairness of general referral programs, increasingly popular forms of marketing in which an existing referrer is encouraged to initiate the recruitment of new referred users over multiple hops. While this clearly expands opportunities for rewards, it remains unclear whether it helps address fairness concerns or makes them worse. We show, from studying 4 real-world networks and performing theoretical analysis on networks created with minority-majority affiliations and homophily, that the change of bias in multi-hop referral programs highly depends on the network structures and the referral strategies. Specifically, under three different constrained referral strategies which limit the number of referrals each person can share to a fixed number, we show that even with no explicit intention to discriminate and without access to sensitive attributes such as gender and race, certain referral strategies can still amplify the structural biases further when higher hops are allowed. Moreover, when there is no constraint on the number of referrals each person can distribute and when the effect of referral strategies is removed, we prove a precise condition under which the bias in 1-hop referral programs is amplified in higher-hop referral programs.

[84]  arXiv:2112.00270 [pdf, ps, other]
Title: An Enhanced Decoding Algorithm for Coded Compressed Sensing with Applications to Unsourced Random Access
Comments: Submitted to MDPI Sensors
Subjects: Information Theory (cs.IT)

Unsourced random access (URA) has emerged as a pragmatic framework for next-generation distributed sensor networks. Within URA, concatenated coding structures are often employed to ensure that the central base station can accurately recover the set of sent codewords during a given transmission period. Many URA algorithms employ independent inner and outer decoders, which can help reduce computational complexity at the expense of a decay in performance. In this article, an enhanced decoding algorithm is presented for a concatenated coding structure consisting of a wide range of inner codes and an outer tree-based code. It is shown that this algorithmic enhancement has the potential to simultaneously improve error performance and decrease the computational complexity of the decoder. This enhanced decoding algorithm is applied to two existing URA algorithms and the performance benefits of the algorithm are characterized. Findings are supported by numerical simulations.

[85]  arXiv:2112.00273 [pdf, other]
Title: Concurrent Transmission for Multi-Robot Coordination
Comments: Accepted in Robocom 2022 in conjunction with IEEE CCNC 2022
Subjects: Robotics (cs.RO); Networking and Internet Architecture (cs.NI)

An efficient communication mechanism forms the backbone for any multi-robot system to achieve fruitful collaboration and coordination. The limitations of existing asynchronous-transmission-based strategies in fast dissemination and aggregation compel designers to pare down such requirements as much as possible. This also restricts the possible application areas of mobile multi-robot systems. In this work, we introduce a concurrent-transmission-based strategy as an alternative. Despite the commonly found difficulties in concurrent transmission such as microsecond-level time synchronization, hardware heterogeneity, etc., we demonstrate how it can be exploited for multi-robot systems. We propose a split architecture where the two major activities, communication and computation, are carried out independently and coordinate through periodic interactions. The proposed split architecture is applied to a custom-built, fully networked control system consisting of five two-wheel differential-drive mobile robots with heterogeneous architectures. We use the proposed design in a leader-follower setting for coordinated dynamic speed variation as well as the independent formation of various shapes. Experiments show centimeter-level spatial and millisecond-level temporal accuracy while maintaining a very low radio duty cycle over multi-hop communication across a wide testing area.

[86]  arXiv:2112.00275 [pdf, other]
Title: Learning from Mistakes based on Class Weighting with Application to Neural Architecture Search
Authors: Jay Gala, Pengtao Xie
Subjects: Machine Learning (cs.LG)

Learning from mistakes is an effective learning approach widely used in human learning, where a learner pays greater focus on mistakes to circumvent them in the future. It aids in improving the overall learning outcomes. In this work, we aim to investigate how effectively this exceptional learning ability can be used to improve machine learning models as well. We propose a simple and effective multi-level optimization framework called learning from mistakes (LFM), inspired by mistake-driven learning to train better machine learning models. Our LFM framework consists of a formulation involving three learning stages. The primary objective is to train a model to perform effectively on target tasks by using a re-weighting technique to prevent similar mistakes in the future. In this formulation, we learn the class weights by minimizing the validation loss of the model and re-train the model with the synthetic data from the image generator weighted by class-wise performance and real data. We apply our LFM framework for differential architecture search methods on image classification datasets such as CIFAR and ImageNet, where the results demonstrate the effectiveness of our proposed strategy.

[87]  arXiv:2112.00277 [pdf, other]
Title: MeSH Term Suggestion for Systematic Review Literature Search
Comments: To be published in Australasian Document Computing Symposium 2021, Melbourne, Australia
Subjects: Information Retrieval (cs.IR)

High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a key phase in constructing systematic reviews and often involves domain (medical researchers) and search (information specialists) experts in developing the search queries. Queries in this context are highly complex, based on Boolean logic, include free-text terms and index terms from standardised terminologies (e.g., MeSH), and are difficult and time-consuming to build. The use of MeSH terms, in particular, has been shown to improve the quality of the search results. However, identifying the correct MeSH terms to include in a query is difficult: information experts are often unfamiliar with the MeSH database and unsure about the appropriateness of MeSH terms for a query. Naturally, the full value of the MeSH terminology is often not fully exploited.
This paper investigates methods to suggest MeSH terms based on an initial Boolean query that includes only free-text terms. These methods promise to automatically identify highly effective MeSH terms for inclusion in a systematic review query. Our study contributes an empirical evaluation of several MeSH term suggestion methods. We perform an extensive analysis of the retrieval, ranking, and refinement of MeSH term suggestions for each method and how these suggestions impact the effectiveness of Boolean queries.

[88]  arXiv:2112.00279 [pdf, other]
Title: A Barrier Pair Method for Safe Human-Robot Shared Autonomy
Comments: Accepted in Proceedings of the 60th IEEE Conference on Decision and Control
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)

Shared autonomy provides a framework where a human and an automated system, such as a robot, jointly control the system's behavior, enabling an effective solution for various applications, including human-robot interaction. However, a challenging problem in shared autonomy is safety because the human input may be unknown and unpredictable, which affects the robot's safety constraints. If the human input is a force applied through physical contact with the robot, it also alters the robot's behavior to maintain safety. We address the safety issue of shared autonomy in real-time applications by proposing a two-layer control framework. In the first layer, we use the history of human input measurements to infer what the human wants the robot to do and define the robot's safety constraints according to that inference. In the second layer, we formulate a rapidly-exploring random tree of barrier pairs, with each barrier pair composed of a barrier function and a controller. Using the controllers in these barrier pairs, the robot is able to maintain its safe operation under the intervention from the human input. This proposed control framework allows the robot to assist the human while preventing them from encountering safety issues. We demonstrate the proposed control framework on a simulation of a two-linkage manipulator robot.

[89]  arXiv:2112.00281 [pdf, other]
Title: FDA-GAN: Flow-based Dual Attention GAN for Human Pose Transfer
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Human pose transfer aims at transferring the appearance of the source person to the target pose. Existing methods utilizing flow-based warping for non-rigid human image generation have achieved great success. However, they fail to preserve the appearance details in synthesized images since the spatial correlation between the source and target is not fully exploited. To this end, we propose the Flow-based Dual Attention GAN (FDA-GAN) to apply occlusion- and deformation-aware feature fusion for higher generation quality. Specifically, deformable local attention and flow similarity attention, constituting the dual attention mechanism, can derive the output features responsible for deformable- and occlusion-aware fusion, respectively. Besides, to maintain the pose and global position consistency in transferring, we design a pose normalization network for learning adaptive normalization from the target pose to the source person. Both qualitative and quantitative results show that our method outperforms state-of-the-art models on the public iPER and DeepFashion datasets.

[90]  arXiv:2112.00283 [pdf, other]
Title: Wiki to Automotive: Understanding the Distribution Shift and its impact on Named Entity Recognition
Comments: 6 pages, 1 figure
Subjects: Computation and Language (cs.CL)

While transfer learning has become a ubiquitous technique used across Natural Language Processing (NLP) tasks, it is often unable to replicate the performance of pre-trained models on text of niche domains like Automotive. In this paper we aim to understand the main characteristics of the distribution shift with automotive domain text (describing technical functionalities such as Cruise Control) and attempt to explain the potential reasons for the gap in performance. We focus on performing the Named Entity Recognition (NER) task as it requires strong lexical, syntactic and semantic understanding by the model. Our experiments with 2 different encoders, namely BERT-Base-Uncased and SciBERT-Base-Scivocab-Uncased, have led to interesting findings that showed: 1) The performance of SciBERT is better than BERT when used for the automotive domain, 2) Fine-tuning the language models with automotive domain text did not make significant improvements to the NER performance, 3) The distribution shift is challenging as it is characterized by a lack of repeating contexts, sparseness of entities, a large number of Out-Of-Vocabulary (OOV) words and class overlap due to domain-specific nuances.

[91]  arXiv:2112.00284 [pdf, other]
Title: Interactive Model with Structural Loss for Language-based Abductive Reasoning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The abductive natural language inference task ($\alpha$NLI) is proposed to infer the most plausible explanation between the cause and the event. In the $\alpha$NLI task, two observations are given, and the most plausible hypothesis must be picked out from the candidates. Existing methods model the relation between each candidate hypothesis separately and penalize the inference network uniformly. In this paper, we argue that it is unnecessary to distinguish the reasoning abilities among correct hypotheses; and similarly, all wrong hypotheses contribute the same when explaining the reasons of the observations. Therefore, we propose to group instead of ranking the hypotheses and design a structural loss called ``joint softmax focal loss'' in this paper. Based on the observation that the hypotheses are generally semantically related, we have designed a novel interactive language model aiming at exploiting the rich interaction among competing hypotheses. We name this new model for $\alpha$NLI: Interactive Model with Structural Loss (IMSL). The experimental results show that our IMSL has achieved the highest performance on the RoBERTa-large pretrained model, with ACC and AUC results increased by about 1\% and 5\% respectively.

[92]  arXiv:2112.00286 [pdf, other]
Title: Conflict-free Collaborative Set Sharing for Distributed Systems
Authors: Masato Takeichi
Comments: 11 pages, 3 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Programming Languages (cs.PL)

Collaborative Data Sharing is widely recognized as essential for distributed systems. Among several proposed strategies, conflict-free techniques are considered useful for serverless concurrent systems. They aim at making shared data consistent between peers in such a way that their local data do not become equal at once, but arrive at the same data eventually when no updates occur in any peer. Although the Conflict-free Replicated Data Type (CRDT) approach could be used in data sharing as well, it puts restrictions on available operations so that concurrent updates never cause conflicts. Even for sets, for example, popular operations such as insertion and deletion cannot be used freely. We propose a novel scheme for Conflict-free Collaborative Set Sharing that allows both insertion and deletion operations. It provides a new synchronization method for data sharing and gives a fresh insight into designing conflict-free replicated data types. We might consider that this becomes a substitute for CRDTs.
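To illustrate the kind of restriction the abstract refers to, the following is a minimal sketch of a two-phase set (2P-Set), a standard state-based CRDT. It is not the scheme proposed in the paper; the class and method names are hypothetical. Because both internal sets only grow and merging is a union, replicas converge, but an element that has been deleted can never be inserted again.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    """Standard 2P-Set CRDT (illustrative only, not the paper's scheme)."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)  # tombstones, never shrink

    def insert(self, x):
        self.added.add(x)

    def delete(self, x):
        # Only an element this replica has seen as added can be tombstoned.
        if x in self.added:
            self.removed.add(x)

    def merge(self, other):
        # Set union is commutative, associative and idempotent,
        # so replicas converge regardless of merge order.
        self.added |= other.added
        self.removed |= other.removed

    def value(self):
        return self.added - self.removed


a, b = TwoPhaseSet(), TwoPhaseSet()
a.insert("x")
b.merge(a)      # replica b learns about "x"
b.delete("x")   # b deletes it
a.insert("x")   # a concurrently re-inserts "x"
a.merge(b)
b.merge(a)

converged = a.value() == b.value()
reinsertion_lost = "x" not in a.value()
```

Both replicas end in the same state, but the re-insertion on replica `a` is silently overridden by the tombstone. That is the sense in which insertion and deletion are not freely usable in this CRDT.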

[93]  arXiv:2112.00288 [pdf, other]
Title: Operation-based Collaborative Data Sharing for Distributed Systems
Authors: Masato Takeichi
Comments: 11 pages, 4 figures
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)

Collaborative Data Sharing raises a fundamental issue in distributed systems. Several strategies have been proposed for making shared data consistent between peers in such a way that the shared part of their local data becomes equal. Most of the proposals rely on state-based semantics. But this suffers from a lack of descriptiveness in conflict-free features of synchronization required for flexible network connections. Recent applications tend to use non-permanent connections with mobile devices or allow temporary breakaways from the system, for example. To achieve conflict-free data sharing, we propose a novel scheme "Operation-based Collaborative Data Sharing" that enables conflict-free strategies for synchronization based on operational semantics.

[94]  arXiv:2112.00289 [pdf, other]
Title: Point Cloud Segmentation Using Sparse Temporal Local Attention
Comments: 8 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Point clouds are a key modality used for perception in autonomous vehicles, providing the means for a robust geometric understanding of the surrounding environment. However, despite the sensor outputs from autonomous vehicles being naturally temporal in nature, there is still limited exploration of exploiting point cloud sequences for 3D semantic segmentation. In this paper we propose a novel Sparse Temporal Local Attention (STELA) module which aggregates intermediate features from a local neighbourhood in previous point cloud frames to provide a rich temporal context to the decoder. Using the sparse local neighbourhood enables our approach to gather features more flexibly than those which directly match point features, and more efficiently than those which perform expensive global attention over the whole point cloud frame. We achieve a competitive mIoU of 64.3% on the SemanticKitti dataset, and demonstrate significant improvement over the single-frame baseline in our ablation studies.

[95]  arXiv:2112.00290 [pdf, other]
Title: Unsupervised Statistical Learning for Die Analysis in Ancient Numismatics
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Die analysis is an essential numismatic method, and an important tool of ancient economic history. Yet, manual die studies are too labor-intensive to comprehensively study large coinages such as those of the Roman Empire. We address this problem by proposing a model for unsupervised computational die analysis, which can reduce the time investment necessary for large-scale die studies by several orders of magnitude, in many cases from years to weeks. From a computer vision viewpoint, die studies present a challenging unsupervised clustering problem, because they involve an unknown and large number of highly similar semantic classes of imbalanced sizes. We address these issues through determining dissimilarities between coin faces derived from specifically devised Gaussian process-based keypoint features in a Bayesian distance clustering framework. The efficacy of our method is demonstrated through an analysis of 1135 Roman silver coins struck between 64-66 C.E.

[96]  arXiv:2112.00291 [pdf, other]
Title: Bumblebee: A Path Towards Fully Autonomous Robotic Vine Pruning
Comments: 35 pages, 23 images
Subjects: Robotics (cs.RO)

Dormant season grapevine pruning requires skilled seasonal workers during the winter season, who are becoming less available. As workers hasten to prune more vines in less time amid the short-term seasonal hiring culture and low wages, vines are often pruned inconsistently, leading to imbalanced grapevines. In addition to this, currently existing mechanical methods cannot selectively prune grapevines, and manual follow-up operations are often required that further increase production cost. In this paper, we present the design and field evaluation of a rugged and fully autonomous robot for end-to-end pruning of dormant season grapevines. The proposed design incorporates novel camera systems, a kinematically redundant manipulator, a ground robot, and novel algorithms in the perception system. The presented research prototype robot system was able to spur prune a row of vines from both sides completely in 213 sec/vine with a total pruning accuracy of 87%. Initial field tests of the autonomous system in a commercial vineyard have shown significant variability reduction in dormant season pruning when compared to mechanical pre-pruning trials. The design approach, system components, lessons learned, future enhancements as well as a brief economic analysis are described in the manuscript.

[97]  arXiv:2112.00294 [pdf]
Title: Mixed displacement-pressure-phase field framework for finite strain fracture of nearly incompressible hyperelastic materials
Subjects: Numerical Analysis (math.NA); Soft Condensed Matter (cond-mat.soft)

The favored phase field method (PFM) has encountered challenges in the finite strain fracture modeling of nearly or truly incompressible hyperelastic materials. We identified that the underlying cause lies in the innate contradiction between incompressibility and smeared crack opening. Drawing on the stiffness-degradation idea in PFM, we resolved this contradiction through loosening the incompressibility constraint of the damaged phase without affecting the incompressibility of intact material. By modifying the perturbed Lagrangian approach, we derived a novel mixed formulation. In numerical aspects, the finite element discretization uses the classical Q1/P0 and high-order P2/P1 schemes, respectively. To ease the mesh distortion at large strains, an adaptive mesh deletion technology is also developed. The validity and robustness of the proposed mixed framework are corroborated by four representative numerical examples. By comparing the performance of Q1/P0 and P2/P1, we conclude that the Q1/P0 formulation is a better choice for finite strain fracture in nearly incompressible cases. Moreover, the numerical examples also show that the combination of the proposed framework and methodology has vast potential in simulating complex peeling and tearing problems.

[98]  arXiv:2112.00295 [pdf, other]
Title: Multiple Fusion Adaptation: A Strong Framework for Unsupervised Semantic Segmentation Adaptation
Comments: 13 pages, 2 figures, submitted to BMVC2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper tackles the cross-domain semantic segmentation task, aiming to improve the segmentation accuracy on the unlabeled target domain without incurring additional annotation. Using the pseudo-label-based unsupervised domain adaptation (UDA) pipeline, we propose a novel and effective Multiple Fusion Adaptation (MFA) method. MFA basically considers three parallel information fusion strategies, i.e., the cross-model fusion, temporal fusion and a novel online-offline pseudo label fusion. Specifically, the online-offline pseudo label fusion encourages the adaptive training to pay additional attention to difficult regions that are easily ignored by offline pseudo labels, therefore retaining more informative details. While the other two fusion strategies may look standard, MFA makes significant efforts to raise the efficiency and effectiveness for integration, and succeeds in injecting all the three strategies into a unified framework. Experiments on two widely used benchmarks, i.e., GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes, show that our method significantly improves the semantic segmentation adaptation, and sets up new state of the art (58.2% and 62.5% mIoU, respectively). The code will be available at https://github.com/KaiiZhang/MFA.

[99]  arXiv:2112.00298 [pdf, other]
Title: Exploring Social Posterior Collapse in Variational Autoencoder for Interaction Modeling
Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Multi-agent behavior modeling and trajectory forecasting are crucial for the safe navigation of autonomous agents in interactive scenarios. Variational Autoencoder (VAE) has been widely applied in multi-agent interaction modeling to generate diverse behavior and learn a low-dimensional representation for interacting systems. However, existing literature did not formally discuss if a VAE-based model can properly encode interaction into its latent space. In this work, we argue that one of the typical formulations of VAEs in multi-agent modeling suffers from an issue we refer to as social posterior collapse, i.e., the model is prone to ignoring historical social context when predicting the future trajectory of an agent. It could cause significant prediction errors and poor generalization performance. We analyze the reason behind this under-explored phenomenon and propose several measures to tackle it. Afterward, we implement the proposed framework and experiment on real-world datasets for multi-agent trajectory prediction. In particular, we propose a novel sparse graph attention message-passing (sparse-GAMP) layer, which helps us detect social posterior collapse in our experiments. In the experiments, we verify that social posterior collapse indeed occurs. Also, the proposed measures are effective in alleviating the issue. As a result, the model attains better generalization performance when historical social context is informative for prediction.

[100]  arXiv:2112.00299 [pdf, ps, other]
Title: STAR-RISs: A Correlated T&R Phase-Shift Model and Practical Phase-Shift Configuration Strategies
Comments: 31 pages, 9 figures, submitted to IEEE journals for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Applied Physics (physics.app-ph)

A correlated transmission and reflection (T&R) phase-shift model is proposed for passive lossless simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs). A STAR-RIS-aided two-user downlink communication system is investigated for both orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA). To evaluate the impact of the correlated T&R phase-shift model on the communication performance, three phase-shift configuration strategies are developed, namely the primary-secondary phase-shift configuration (PS-PSC), the diversity preserving phase-shift configuration (DP-PSC), and the T/R-group phase-shift configuration (TR-PSC) strategies. Furthermore, we derive the outage probabilities for the three proposed phase-shift configuration strategies as well as for those of the random phase-shift configuration and the independent phase-shift model, which constitute performance lower and upper bounds, respectively. Then, the diversity order of each strategy is investigated based on the obtained analytical results. It is shown that the proposed DP-PSC strategy achieves full diversity order simultaneously for users located on both sides of the STAR-RIS. Moreover, power scaling laws are derived for the three proposed strategies and for the random phase-shift configuration. Numerical simulations reveal a performance gain if the users on both sides of the STAR-RIS are served by NOMA instead of OMA. Moreover, it is shown that the proposed DP-PSC strategy yields the same diversity order as achieved by STAR-RISs under the independent phase-shift model and a comparable power scaling law with only 4 dB reduction in received power.

[101]  arXiv:2112.00301 [pdf, other]
Title: Uncertainty in Criminal Justice Algorithms: simulation studies of the Pennsylvania Additive Classification Tool
Comments: 21 pages, 11 figures, 6 tables
Subjects: Computers and Society (cs.CY)

Much attention has been paid to algorithms related to sentencing, the setting of bail, parole decisions and recidivism while less attention has been paid to carceral algorithms, those algorithms used to determine an incarcerated individual's lived experience. In this paper we study one such algorithm, the Pennsylvania Additive Classification Tool (PACT) that assigns custody levels to incarcerated individuals. We analyze the PACT in ways that criminal justice algorithms are often analyzed: namely, we train an accurate machine learning model for the PACT; we study its fairness across sex, age and race; and we determine which features are most important. In addition to these conventional computations, we propose and carry out some new ways to study such algorithms. Instead of focusing on the outcomes themselves, we propose shifting our attention to the variability in the outcomes, especially because many carceral algorithms are used repeatedly and there can be a propagation of uncertainty. By carrying out several simulations of assigning custody levels, we shine light on problematic aspects of tools like the PACT.

[102]  arXiv:2112.00302 [pdf, other]
Title: Graph Convolutional Module for Temporal Action Localization in Videos
Comments: Accepted by T-PAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Temporal action localization has long been researched in computer vision. Existing state-of-the-art action localization methods divide each video into multiple action units (i.e., proposals in two-stage methods and segments in one-stage methods) and then perform action recognition/regression on each of them individually, without explicitly exploiting their relations during learning. In this paper, we claim that the relations between action units play an important role in action localization, and a more powerful action detector should not only capture the local content of each action unit but also allow a wider field of view on the context related to it. To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including two-stage and one-stage paradigms. To be specific, we first construct a graph, where each action unit is represented as a node and the relation between two action units as an edge. Here, we use two types of relations, one for capturing the temporal connections between different action units, and the other one for characterizing their semantic relationship. Particularly for the temporal connections in two-stage methods, we further explore two different kinds of edges, one connecting the overlapping action units and the other one connecting surrounding but disjoint units. Upon the graph we built, we then apply graph convolutional networks (GCNs) to model the relations among different action units, which is able to learn more informative representations to enhance action localization. Experimental results show that our GCM consistently improves the performance of existing action localization methods, including two-stage methods (e.g., CBR and R-C3D) and one-stage methods (e.g., D-SSAD), verifying the generality and effectiveness of our GCM.
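The two kinds of temporal edges described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: action units are taken to be `(start, end)` intervals, an overlap edge is drawn whenever two units intersect, and a "surrounding" edge is drawn between disjoint units whose gap is at most `max_gap`. The function names and the `max_gap` threshold are hypothetical, and semantic edges are omitted.

```python
def temporal_iou(a, b):
    """Temporal intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def build_temporal_edges(units, max_gap=5.0):
    """Return (overlap_edges, surrounding_edges) as lists of index pairs.

    max_gap is a hypothetical threshold chosen for illustration only.
    """
    overlap_edges, surrounding_edges = [], []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            a, b = units[i], units[j]
            if temporal_iou(a, b) > 0:
                overlap_edges.append((i, j))
            else:
                # Gap between two disjoint intervals.
                gap = max(a[0], b[0]) - min(a[1], b[1])
                if gap <= max_gap:
                    surrounding_edges.append((i, j))
    return overlap_edges, surrounding_edges


units = [(0, 4), (3, 6), (7, 9), (20, 25)]
overlap_edges, surrounding_edges = build_temporal_edges(units)
```

For these four units, only the first two intersect; the third is close enough to both of them to count as surrounding context, and the fourth is too far from everything to be connected.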

[103]  arXiv:2112.00304 [pdf, other]
Title: Software Variants for Hardware Trojan Detection and Resilience in COTS Processors
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

The commercial off-the-shelf (COTS) component based ecosystem provides an attractive system design paradigm due to the drastic reduction in development time and cost compared to custom solutions. However, it brings in a growing concern of trustworthiness arising from the possibility of embedded malicious logic, or hardware Trojans in COTS components. Existing trust-verification approaches are typically not applicable to COTS hardware due to the absence of golden models and the lack of observability of internal signals. In this work, we propose a novel approach for runtime Trojan detection and resilience in untrusted COTS processors through judicious modifications in software. The proposed approach does not rely on any hardware redundancy or architectural modification and hence seamlessly integrates with the COTS-based system design process. Trojan resilience is achieved through the execution of multiple functionally equivalent software variants. We have developed and implemented a solution for compiler-based automatic generation of program variants, metric-guided selection of variants, and their integration in a single executable. To evaluate the proposed approach, we first analyzed the effectiveness of program variants in avoiding the activation of a random pool of Trojans. By implementing several Trojans in an OpenRISC 1000 processor, we analyzed the detectability and resilience during Trojan activation in both single and multiple variants. We also present delay and code size overhead for the automatically generated variants for several programs and discuss future research directions to reduce the overhead.

[104]  arXiv:2112.00305 [pdf, other]
Title: Forward Operator Estimation in Generative Models with Kernel Transfer Operators
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Generative models which use explicit density modeling (e.g., variational autoencoders, flow-based generative models) involve finding a mapping from a known distribution, e.g. Gaussian, to the unknown input distribution. This often requires searching over a class of non-linear functions (e.g., representable by a deep neural network). While effective in practice, the associated runtime/memory costs can increase rapidly, usually as a function of the performance desired in an application. We propose a much cheaper (and simpler) strategy to estimate this mapping based on adapting known results in kernel transfer operators. We show that our formulation enables highly efficient distribution approximation and sampling, and offers surprisingly good empirical performance that compares favorably with powerful baselines, but with significant runtime savings. We show that the algorithm also performs well in small sample size settings (in brain imaging).

[105]  arXiv:2112.00312 [pdf, other]
Title: Experimental Validation of Multi-lane Formation Control for Connected and Automated Vehicles in Multiple Scenarios
Subjects: Systems and Control (eess.SY)

Formation control methods of connected and automated vehicles have been proposed to smoothly switch the structure of vehicular formations in different scenarios. In previous research, simulations are often conducted to verify the performance of formation control methods. This paper presents the experimental results of multi-lane formation control for connected and automated vehicles. The coordinated formation control framework and specific methods utilized for different scenarios are introduced. The details of the experimental platform and vehicle control strategy are provided. Simulations and experiments are conducted in different scenarios, and the results indicate that the formation control method is applicable to multiple traffic scenarios and able to improve formation-structure-switching efficiency compared with benchmark methods.

[106]  arXiv:2112.00317 [pdf, other]
Title: Unleashing the Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification
Comments: Technical report, code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Existing person re-identification (ReID) methods typically directly load the pre-trained ImageNet weights for initialization. However, as a fine-grained classification task, ReID is more challenging, and there is a large domain gap between it and ImageNet classification. Inspired by the great success of self-supervised representation learning with contrastive objectives, in this paper, we design an Unsupervised Pre-training framework for ReID based on the contrastive learning (CL) pipeline, dubbed UP-ReID. During the pre-training, we attempt to address two critical issues for learning fine-grained ReID features: (1) the augmentations in the CL pipeline may distort the discriminative clues in person images; (2) the fine-grained local features of person images are not fully explored. Therefore, we introduce an intra-identity (I$^2$-)regularization in the UP-ReID, which is instantiated as two constraints coming from the global image aspect and the local patch aspect: a global consistency is enforced between augmented and original person images to increase robustness to augmentation, while an intrinsic contrastive constraint among local patches of each image is employed to fully explore the local discriminative clues. Extensive experiments on multiple popular Re-ID datasets, including PersonX, Market1501, CUHK03, and MSMT17, demonstrate that our UP-ReID pre-trained model can significantly benefit the downstream ReID fine-tuning and achieve state-of-the-art performance. Codes and models will be released to https://github.com/Frost-Yang-99/UP-ReID.

[107]  arXiv:2112.00319 [pdf, other]
Title: Object-Aware Cropping for Self-Supervised Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the objects of interest, which the learned representation will capture. This assumption is mostly satisfied in datasets such as ImageNet where there is a large, centered object, which is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets. We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the model to learn both object and scene level semantic representations. Using this approach, which we call object-aware cropping, results in significant improvements over scene cropping on classification and object detection benchmarks. For example, on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene-level cropping using MoCo-v2 based pre-training. We also show significant improvements on COCO and PASCAL-VOC object detection and segmentation tasks over the state-of-the-art self-supervised learning approaches. Our approach is efficient, simple and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.

[108]  arXiv:2112.00320 [pdf, ps, other]
Title: Multistage Online Maxmin Allocation of Indivisible Entities
Authors: Siu-Wing Cheng
Subjects: Data Structures and Algorithms (cs.DS)

We consider an online allocation problem that involves a set $P$ of $n$ players and a set $E$ of $m$ indivisible entities over discrete time steps $1,2,\ldots,\tau$. At each time step $t \in [1,\tau]$, for every entity $e \in E$, there is a restriction list $L_t(e)$ that prescribes the subset of players to whom $e$ can be assigned and a non-negative value $v_t(e,p)$ of $e$ to every player $p \in P$. The sets $P$ and $E$ are fixed beforehand. The sets $L_t(\cdot)$ and values $v_t(\cdot,\cdot)$ are given in an online fashion. An allocation is a distribution of $E$ among $P$, and we are interested in the minimum total value of the entities received by a player according to the allocation. In the static case, it is NP-hard to find an optimal allocation that maximizes this minimum value. On the other hand, $\rho$-approximation algorithms have been developed for certain values of $\rho \in (0,1]$. We propose a $w$-lookahead algorithm for the multistage online maxmin allocation problem for any fixed $w \geq 1$ in which the restriction lists and values of entities may change between time steps, and there is a fixed stability reward for an entity to be assigned to the same player from one time step to the next. The objective is to maximize the sum of the minimum values and stability rewards over the time steps $1, 2, \ldots, \tau$. Our algorithm achieves a competitive ratio of $(1-c)\rho$, where $c$ is the positive root of the equation $wc^2 = \rho (w+1)(1-c)$. When $w = 1$, it is greater than $\frac{\rho}{4\rho+2} + \frac{\rho}{10}$, which improves upon the previous ratio of $\frac{\rho}{4\rho+2 - 2^{1-\tau}(2\rho+1)}$ obtained for the case of 1-lookahead.
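As a quick numerical check of the stated ratio, the sketch below solves $wc^2 = \rho(w+1)(1-c)$ for its positive root with the quadratic formula and evaluates $(1-c)\rho$. The function names are hypothetical; only the equation and the two expressions come from the abstract.

```python
import math


def competitive_ratio(w, rho):
    """(1 - c) * rho, where c is the positive root of w*c^2 = rho*(w+1)*(1-c)."""
    # Rearranged: w*c^2 + rho*(w+1)*c - rho*(w+1) = 0
    a = w
    b = rho * (w + 1)
    k = -rho * (w + 1)
    c = (-b + math.sqrt(b * b - 4 * a * k)) / (2 * a)
    return (1 - c) * rho


def stated_lower_bound(rho):
    """The bound rho/(4*rho + 2) + rho/10 quoted for w = 1."""
    return rho / (4 * rho + 2) + rho / 10


ratio_w1_rho1 = competitive_ratio(1, 1.0)
bound_rho1 = stated_lower_bound(1.0)
```

For $w = 1$ and $\rho = 1$ the equation becomes $c^2 + 2c - 2 = 0$, so $c = \sqrt{3} - 1$ and the ratio is $2 - \sqrt{3} \approx 0.268$, slightly above the quoted bound $\frac{1}{6} + \frac{1}{10} \approx 0.267$.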

[109]  arXiv:2112.00322 [pdf, other]
Title: FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, promising applications in robotics and augmented reality have attracted considerable attention to 3D object detection from point clouds. In this paper, we present FCAF3D - a first-in-class fully convolutional anchor-free indoor 3D object detection method. It is a simple yet effective method that uses a voxel representation of a point cloud and processes voxels with sparse convolutions. FCAF3D can handle large-scale scenes with minimal runtime through a single fully convolutional feed-forward pass. Existing 3D object detection methods make prior assumptions on the geometry of objects, and we argue that it limits their generalization ability. To get rid of any prior assumptions, we propose a novel parametrization of oriented bounding boxes that allows obtaining better results in a purely data-driven way. The proposed method achieves state-of-the-art 3D object detection results in terms of mAP@0.5 on ScanNet V2 (+4.5), SUN RGB-D (+3.5), and S3DIS (+20.5) datasets. The code and models are available at https://github.com/samsunglabs/fcaf3d.

[110]  arXiv:2112.00323 [pdf, other]
Title: Push Stricter to Decide Better: A Class-Conditional Feature Adaptive Framework for Improving Adversarial Robustness
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In response to the threat of adversarial examples, adversarial training provides an attractive option for enhancing the model robustness by training models on online-augmented adversarial examples. However, most of the existing adversarial training methods focus on improving the robust accuracy by strengthening the adversarial examples but neglect the increasing shift between natural data and adversarial examples, leading to a dramatic decrease in natural accuracy. To maintain the trade-off between natural and robust accuracy, we alleviate the shift from the perspective of feature adaption and propose Feature Adaptive Adversarial Training (FAAT), which optimizes the class-conditional feature adaption across natural data and adversarial examples. Specifically, we propose to incorporate a class-conditional discriminator to encourage the features to become (1) class-discriminative and (2) invariant to the change of adversarial attacks. The novel FAAT framework enables the trade-off between natural and robust accuracy by generating features with similar distribution across natural and adversarial data, and achieves higher overall robustness, benefiting from the class-discriminative feature characteristics. Experiments on various datasets demonstrate that FAAT produces more discriminative features and performs favorably against state-of-the-art methods. Codes are available at https://github.com/VisionFlow/FAAT.

[111]  arXiv:2112.00324 [pdf, ps, other]
Title: Optimizing for In-memory Deep Learning with Emerging Memory Technology
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET)

In-memory deep learning computes neural network models where they are stored, thus avoiding long distance communication between memory and computation units, resulting in considerable savings in energy and time. In-memory deep learning has already demonstrated orders of magnitude higher performance density and energy efficiency. The use of emerging memory technology promises to increase the gains in density, energy, and performance even further. However, emerging memory technology is intrinsically unstable, resulting in random fluctuations of data reads. This can translate to non-negligible accuracy loss, potentially nullifying the gains. In this paper, we propose three optimization techniques that can mathematically overcome the instability problem of emerging memory technology. They can improve the accuracy of the in-memory deep learning model while maximizing its energy efficiency. Experiments show that our solution can fully recover most models' state-of-the-art accuracy, and achieves at least an order of magnitude higher energy efficiency than the state-of-the-art.

[112]  arXiv:2112.00328 [pdf, other]
Title: A Daily Tourism Demand Prediction Framework Based on Multi-head Attention CNN: The Case of The Foreign Entrant in South Korea
Comments: Accepted to IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2021)
Subjects: Machine Learning (cs.LG)

Developing an accurate tourism forecasting model is essential for making desirable policy decisions for tourism management. Early studies on tourism management focus on discovering external factors related to tourism demand. Recent studies utilize deep learning in demand forecasting along with these external factors. They mainly use recursive neural network models such as LSTM and RNN for their frameworks. However, these models are not suitable for use in forecasting tourism demand. This is because tourism demand is strongly affected by changes in various external factors, and recursive neural network models have limitations in handling these multivariate inputs. We propose a multi-head attention CNN model (MHAC) for addressing these limitations. The MHAC uses a 1D convolutional neural network to analyze temporal patterns and an attention mechanism to reflect correlations between input variables. This model makes it possible to extract spatiotemporal characteristics from time-series data of various variables. We apply our forecasting framework to predict inbound tourist changes in South Korea by considering external factors such as politics, disease, season, and the attraction of Korean culture. The results of extensive experiments show that our method outperforms other deep-learning-based prediction frameworks in South Korea tourism forecasting.

[113]  arXiv:2112.00330 [pdf, other]
Title: Soft-Output Joint Channel Estimation and Data Detection using Deep Unfolding
Comments: Presented at the 2021 IEEE Information Theory Workshop (ITW)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We propose a novel soft-output joint channel estimation and data detection (JED) algorithm for multiuser (MU) multiple-input multiple-output (MIMO) wireless communication systems. Our algorithm approximately solves a maximum a-posteriori JED optimization problem using deep unfolding and generates soft-output information for the transmitted bits in every iteration. The parameters of the unfolded algorithm are computed by a hyper-network that is trained with a binary cross entropy (BCE) loss. We evaluate the performance of our algorithm in a coded MU-MIMO system with 8 basestation antennas and 4 user equipments and compare it to state-of-the-art algorithms that separate channel estimation from soft-output data detection. Our results demonstrate that our JED algorithm outperforms such data detectors with as few as 10 iterations.

[114]  arXiv:2112.00331 [pdf, other]
Title: Multimodal AI Companion for Interactive Fairytale Co-creation
Subjects: Multimedia (cs.MM)

AI fairy tale companions play an important role in early childhood education as an augmentation for parents' efforts to close the participation gap and boost kids' mental and language development. Existing systems are generally designed to provide vivid materials as unidirectional entertaining reading environments, e.g., visualizing input text. However, due to the limited vocabulary of kids, these systems fail to afford effective interaction to motivate kids to write their own fairy tales. In this work, we propose AI.R Taletorium, an illustrative, immersive, and inclusive multimodal AI companion for interactive fairy tale co-creation that actively involves kids in creating fairy tales with both the AI agent and their normal peers. AI.R Taletorium consists of a neural story generator and a doodler-based fairy tale visualizer. We design a character-centric bidirectional connection mechanism between the story generator and visualizer equipped with Contrastive Language Image Pretraining (CLIP), thus enabling kids to participate in the story generation process by simple sketching. Extensive experiments and user studies show that our system was able to generate and visualize meaningful and vivid fairy tales with limited training data and complete the full interaction cycle under various inputs (text, doodler) through the bidirectional connection.

[115]  arXiv:2112.00333 [pdf, ps, other]
Title: Joint Cluster Head Selection and Trajectory Planning in UAV-Aided IoT Networks by Reinforcement Learning with Sequential Model
Comments: This paper has been accepted in IEEE IoT-J
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Employing unmanned aerial vehicles (UAVs) has attracted growing interest and emerged as the state-of-the-art technology for data collection in Internet-of-Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of the UAV-IoT system, we formulate the problem of jointly designing the UAV's trajectory and selecting cluster heads in the IoT network as a constrained combinatorial optimization problem, which is classified as NP-hard and challenging to solve. We propose a novel deep reinforcement learning (DRL) with a sequential model strategy that can effectively learn the policy represented by a sequence-to-sequence neural network for the UAV's trajectory design in an unsupervised manner. Extensive simulations show that the proposed DRL method can find a UAV trajectory that requires much less energy than those of other baseline algorithms and achieves close-to-optimal performance. In addition, simulation results show that the model trained by our proposed DRL algorithm generalizes well to larger problem sizes without the need to retrain the model.

[116]  arXiv:2112.00334 [pdf, other]
Title: VisRuler: Visual Analytics for Extracting Decision Rules from Bagged and Boosted Decision Trees
Comments: This manuscript is currently under review
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

Bagging and boosting are two popular ensemble methods in machine learning (ML) that produce many individual decision trees. Due to the inherent ensemble characteristic of these methods, they typically outperform single decision trees or other ML models in predictive performance. However, numerous decision paths are generated for each decision tree, increasing the overall complexity of the model and hindering its use in domains that require trustworthy and explainable decisions, such as finance, social care, and health care. Thus, the interpretability of bagging and boosting algorithms, such as random forests and adaptive boosting, reduces as the number of decisions rises. In this paper, we propose a visual analytics tool that aims to assist users in extracting decisions from such ML models via a thorough visual inspection workflow that includes selecting a set of robust and diverse models (originating from different ensemble learning algorithms), choosing important features according to their global contribution, and deciding which decisions are essential for global explanation (or locally, for specific cases). The outcome is a final decision based on the class agreement of several models and the explored manual decisions exported by users. Finally, we evaluate the applicability and effectiveness of VisRuler via a use case, a usage scenario, and a user study.

[117]  arXiv:2112.00336 [pdf, other]
Title: Multi-View Stereo with Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a network, referred to as MVSTR, for Multi-View Stereo (MVS). It is built upon Transformer and is capable of extracting dense features with global context and 3D consistency, which are crucial to achieving reliable matching for MVS. Specifically, to tackle the problem of the limited receptive field of existing CNN-based MVS methods, a global-context Transformer module is first proposed to explore intra-view global context. In addition, to further enable dense features to be 3D-consistent, a 3D-geometry Transformer module is built with a well-designed cross-view attention mechanism to facilitate inter-view information interaction. Experimental results show that the proposed MVSTR achieves the best overall performance on the DTU dataset and strong generalization on the Tanks & Temples benchmark dataset.

[118]  arXiv:2112.00337 [pdf, other]
Title: A Unified Benchmark for the Unknown Detection Capability of Deep Neural Networks
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep neural networks have achieved outstanding performance over various tasks, but they have a critical issue: over-confident predictions even for completely unknown samples. Many studies have been proposed to successfully filter out these unknown samples, but they only considered narrow and specific tasks, referred to as misclassification detection, open-set recognition, or out-of-distribution detection. In this work, we argue that these tasks should be treated as fundamentally an identical problem because an ideal model should possess detection capability for all those tasks. Therefore, we introduce the unknown detection task, an integration of previous individual tasks, for a rigorous examination of the detection capability of deep neural networks on a wide spectrum of unknown samples. To this end, unified benchmark datasets on different scales were constructed and the unknown detection capabilities of existing popular methods were subject to comparison. We found that Deep Ensemble consistently outperforms the other approaches in detecting unknowns; however, all methods are only successful for a specific type of unknown. The reproducible code and benchmark datasets are available at https://github.com/daintlab/unknown-detection-benchmarks .

[119]  arXiv:2112.00342 [pdf, other]
Title: Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
Comments: Submitted to CVPR2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For a long time, most object detection methods have obtained objects by using non-maximum suppression (NMS) and its improved versions, such as Soft-NMS, to remove redundant bounding boxes. We challenge those NMS-based methods from three aspects: 1) The bounding box with the highest confidence value may not be the true positive having the biggest overlap with the ground-truth box. 2) Redundant boxes need suppression, but true positives also need confidence enhancement. 3) Sorting candidate boxes by confidence values is not necessary, so full parallelism is achievable.
In this paper, inspired by belief propagation (BP), we propose the Confidence Propagation Cluster (CP-Cluster) to replace NMS-based methods, which is fully parallelizable as well as better in accuracy. In CP-Cluster, we borrow the message passing mechanism from BP to penalize redundant boxes and enhance true positives simultaneously in an iterative way until convergence. We verified the effectiveness of CP-Cluster by applying it to various mainstream detectors such as FasterRCNN, SSD, FCOS, YOLOv3, YOLOv5, and Centernet. Experiments on MS COCO show that our plug-and-play method, without retraining detectors, steadily improves the average mAP of all those state-of-the-art models by a clear margin of 0.2 to 1.9 when compared with NMS-based methods. Source code is available at https://github.com/shenyi0220/CP-Cluster
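
For context, the NMS baseline that this abstract contrasts against can be sketched as follows. This is the standard greedy NMS procedure, not CP-Cluster; the function names `iou` and `greedy_nms`, the (x1, y1, x2, y2) box layout, and the 0.5 threshold are illustrative assumptions. Note the sort by confidence, which is the sequential step the abstract argues is unnecessary.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest confidence first."""
    # Visit candidates in descending order of confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this toy input the second box overlaps the first with an IoU of 81/119 and is suppressed outright; its confidence plays no further role, which relates to the second point raised in the abstract.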

[120]  arXiv:2112.00343 [pdf, other]
Title: Camera Motion Agnostic 3D Human Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although the performance of 3D human pose and shape estimation methods has improved significantly in recent years, existing approaches typically generate 3D poses defined in a camera- or human-centered coordinate system. This makes it difficult to estimate a person's pure pose and motion in the world coordinate system for a video captured using a moving camera. To address this issue, this paper presents a camera motion agnostic approach for predicting 3D human pose and mesh defined in the world coordinate system. The core idea of the proposed approach is to estimate the difference between two adjacent global poses (i.e., global motion), which is invariant to the choice of coordinate system, instead of the global pose coupled to the camera motion. To this end, we propose a network based on bidirectional gated recurrent units (GRUs), called the global motion regressor (GMR), that predicts the global motion sequence from the local pose sequence consisting of relative rotations of joints. We use 3DPW and synthetic datasets, which are constructed in a moving-camera environment, for evaluation. We conduct extensive experiments and demonstrate the effectiveness of the proposed method empirically. Code and datasets are available at https://github.com/seonghyunkim1212/GMR

[121]  arXiv:2112.00346 [pdf, other]
Title: Trusted And Confidential Program Analysis
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

We develop the concept of Trusted and Confidential Program Analysis (TCPA), which enables program certification to be used where previously there was insufficient trust. Imagine a scenario where a producer may not be trusted to certify its own software (perhaps by a foreign regulator), and the producer is unwilling to release its sources and detailed design to any external body. We present a protocol that can, using trusted computing based on encrypted sources, create certification via which all can trust the delivered object code without revealing the unencrypted sources to any party. Furthermore, we describe a realization of TCPA with trusted execution environments (TEE) that enables general and efficient computation. We have implemented the TCPA protocol in a system called TCWasm for WebAssembly architectures. In our evaluation with 33 benchmark cases, TCWasm managed to finish the analysis with relatively slight overheads.

[122]  arXiv:2112.00347 [pdf, other]
Title: An Open Source Software Stack for Tuning the Dynamical Behavior of Complex Power Systems
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

BlockSystems.jl and NetworkDynamics.jl are two novel software packages which facilitate highly efficient transient stability simulations of power networks. Users may specify inputs and power system design in a convenient modular and equation-based manner without compromising on speed or model detail. Because the packages are written in Julia, a high-level, high-performance programming language, a rich open-source package ecosystem is available, which provides state-of-the-art solvers and machine learning algorithms. Motivated by the recent interest in the Nordic inertia challenge, we have implemented the Nordic5 test case and tuned its control parameters by making use of the machine learning and automatic differentiation capabilities of our software stack.

[123]  arXiv:2112.00348 [pdf, other]
Title: Automatic travel pattern extraction from visa page stamps using CNN models
Comments: 13 pages, 13 figures, 4 tables, submitted for peer review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose an automated document analysis system that processes scanned visa pages and automatically extracts the travel pattern from detected stamps. The system processes the page via the following pipeline: stamp detection in the visa page; general stamp country and entry/exit recognition; Schengen area stamp country and entry/exit recognition; Schengen area stamp date extraction. For each stage of the proposed pipeline we construct neural network models. We integrated the Schengen area stamp detection and the date, country, and entry/exit recognition models, together with a graphical user interface, into an automatic travel pattern extraction tool, which is precise enough for practical applications.

[124]  arXiv:2112.00350 [pdf, other]
Title: Investigation of Training Label Error Impact on RNN-T
Comments: 6 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper, we propose an approach to quantitatively analyze the impact of different training label errors on RNN-T based ASR models. The results show that deletion errors are more harmful than substitution and insertion label errors in RNN-T training data. We also examined label error impact mitigation approaches on RNN-T and found that, though all the methods mitigate the label-error-caused degradation to some extent, they could not remove the performance gap between the models trained with and without the presence of label errors. Based on the analysis results, we suggest designing data pipelines for RNN-T with higher priority on reducing deletion label errors. We also find that ensuring high-quality training labels remains important, despite the existence of the label error mitigation approaches.

[125]  arXiv:2112.00351 [pdf, other]
Title: Energy Management of a Multi-Battery System for Renewable-Based High Power EV Charging
Comments: Submitted to the 22nd Power Systems Computation Conference (PSCC 2022)
Subjects: Systems and Control (eess.SY)

Stationary battery systems facilitate renewable-based electric vehicle fast charging with power levels above the connection capacity of distribution grids. This paper proposes heuristic energy management strategies for a novel multi-battery design that directly connects its strings to other DC components without the need for interfacing power converters. Hence, the energy management system has two degrees of control: (i) allocating strings to other DC microgrid components, in this case a photovoltaic system, two electric vehicle fast chargers, and a grid-tie inverter, and (ii) managing the energy exchange with the local distribution grid. For the grid exchange, a basic droop control is compared to an enhanced control that includes forecasts in the decision making. To this end, this paper evaluates results from multiple Monte Carlo simulations capturing the uncertainty of electric vehicle charging instances under varying charging frequencies. Using actual photovoltaic measurements from different months, the numerical analyses show that the enhanced control increases self-sufficiency by reducing grid exchange, and decreases the number of battery cycles. However, the enhanced control operates the battery closer to its state of charge limits, which accelerates calendar ageing.

[126]  arXiv:2112.00355 [pdf, other]
Title: Score Transformer: Generating Musical Score from Note-level Representation
Authors: Masahiro Suzuki
Comments: Accepted at ACM Multimedia Asia 2021 (MMAsia '21); Project page: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

In this paper, we explore the tokenized representation of musical scores using the Transformer model to automatically generate musical scores. Thus far, sequence models have yielded fruitful results with note-level (MIDI-equivalent) symbolic representations of music. Although the note-level representations can comprise sufficient information to reproduce music aurally, they cannot contain adequate information to represent music visually in terms of notation. Musical scores contain various musical symbols (e.g., clef, key signature, and notes) and attributes (e.g., stem direction, beam, and tie) that enable us to visually comprehend musical content. However, automated estimation of these elements has yet to be comprehensively addressed. In this paper, we first design score token representation corresponding to the various musical elements. We then train the Transformer model to transcribe note-level representation into appropriate music notation. Evaluations of popular piano scores show that the proposed method significantly outperforms existing methods on all 12 musical aspects that were investigated. We also explore an effective notation-level token representation to work with the model and determine that our proposed representation produces the steadiest results.

[127]  arXiv:2112.00359 [pdf, other]
Title: Tool as Embodiment for Recursive Manipulation
Subjects: Robotics (cs.RO)

Humans and many animals exhibit a robust capability to manipulate diverse objects, often directly with their bodies and sometimes indirectly with tools. Such flexibility is likely enabled by the fundamental consistency in underlying physics of object manipulation such as contacts and force closures. Inspired by viewing tools as extensions of our bodies, we present Tool-As-Embodiment (TAE), a parameterization for tool-based manipulation policies that treat hand-object and tool-object interactions in the same representation space. The result is a single policy that can be applied recursively on robots to use end effectors to manipulate objects, and use objects as tools, i.e. new end-effectors, to manipulate other objects. By sharing experiences across different embodiments for grasping or pushing, our policy exhibits higher performance than if separate policies were trained. Our framework could utilize all experiences from different resolutions of tool-enabled embodiments to a single generic policy for each manipulation skill. Videos at https://sites.google.com/view/recursivemanipulation

[128]  arXiv:2112.00362 [pdf, other]
Title: Dimensionality Reduction for Categorical Data
Comments: Accepted in IEEE Transactions on Knowledge and Data Engineering. Copyright IEEE, 1969
Subjects: Machine Learning (cs.LG)

Categorical attributes are those that can take a discrete set of values, e.g., colours. This work is about compressing vectors over categorical attributes to low-dimension discrete vectors. The current hash-based methods compressing vectors over categorical attributes to low-dimension discrete vectors do not provide any guarantee on the Hamming distances between the compressed representations. Here we present FSketch to create sketches for sparse categorical data and an estimator to estimate the pairwise Hamming distances among the uncompressed data only from their sketches. We claim that these sketches can be used in the usual data mining tasks in place of the original data without compromising the quality of the task. For that, we ensure that the sketches are also categorical and sparse, and that the Hamming distance estimates are reasonably precise. Both the sketch construction and the Hamming distance estimation algorithms require just a single pass; furthermore, changes to a data point can be incorporated into its sketch in an efficient manner. The compressibility depends upon how sparse the data is and is independent of the original dimension -- making our algorithm attractive for many real-life scenarios. Our claims are backed by rigorous theoretical analysis of the properties of FSketch and supplemented by extensive comparative evaluations with related algorithms on some real-world datasets. We show that FSketch is significantly faster, and the accuracy obtained by using its sketches is among the top for the standard unsupervised tasks of RMSE, clustering and similarity search.
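
As a reminder of the quantity being estimated, the Hamming distance between two categorical vectors is the number of positions at which they differ. The sketch below computes it exactly on uncompressed data; it is not FSketch, and the function name `hamming_distance` and the example attribute values are illustrative assumptions.

```python
def hamming_distance(u, v):
    """Count the positions at which two equal-length categorical vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(u, v) if a != b)


# Two records over three categorical attributes (colour, size, shape).
p = ["red", "small", "round"]
q = ["red", "large", "round"]
d = hamming_distance(p, q)  # the records differ only in the size attribute
```

Computing this exactly takes time proportional to the original dimension for every pair, which is the cost that estimating distances from low-dimensional sketches is meant to avoid.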

[129]  arXiv:2112.00364 [pdf, other]
Title: Universal Probabilistic Programming Language Compilation with Parallel Efficient Sequential Monte Carlo Inference
Subjects: Programming Languages (cs.PL)

Probabilistic programming languages (PPLs) allow for natural encoding of arbitrary inference problems, and PPL implementations can provide automatic general-purpose inference for these problems. However, constructing inference implementations that are efficient enough is challenging for many real-world problems. Often, this is due to PPLs not fully exploiting available parallelization and optimization opportunities. For example, handling of probabilistic checkpoints in PPLs through the use of continuation-passing style transformations or non-preemptive multitasking -- as is done in many popular PPLs -- often disallows compilation to low-level languages required for high-performance platforms such as graphics processing units (GPUs). As a solution to this checkpoint problem, we introduce the concept of PPL control-flow graphs (PCFGs), providing a simple and efficient approach that can be used for handling checkpoints in such languages. We use this approach to implement RootPPL: a low-level PPL built on CUDA and C++ with OpenMP, providing highly efficient and massively parallel SMC inference. We also introduce a general method of compiling universal high-level PPLs to PCFGs, and illustrate its application when compiling Miking CorePPL -- a high-level universal PPL -- to RootPPL. This is the first time a universal PPL has been compiled to GPUs with SMC inference. Both RootPPL and the CorePPL compiler are evaluated through a set of real-world experiments in the domains of phylogenetics and epidemiology, demonstrating up to 6x speedups over state-of-the-art PPLs implementing SMC inference.

[130]  arXiv:2112.00374 [pdf, other]
Title: CLIPstyler: Image Style Transfer with a Single Text Condition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Image and Video Processing (eess.IV)

Existing neural style transfer methods require reference style images to transfer texture information of style images to content images. However, in many practical situations, users may not have reference style images but still be interested in transferring styles by just imagining them. In order to deal with such applications, we propose a new framework that enables a style transfer 'without' a style image, but only with a text description of the desired style. Using the pre-trained text-image embedding model of CLIP, we demonstrate the modulation of the style of content images only with a single text condition. Specifically, we propose a patch-wise text-image matching loss with multiview augmentations for realistic texture transfer. Extensive experimental results confirmed the successful image style transfer with realistic textures that reflect semantic query texts.

[131]  arXiv:2112.00378 [pdf, other]
Title: $\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches in training robust models against such attacks. However, it is much slower than vanilla training of neural networks since it needs to construct adversarial examples for the entire training data at every iteration, which has hampered its effectiveness. Recently, Fast Adversarial Training was proposed that can obtain robust models efficiently. However, the reasons behind its success are not fully understood, and more importantly, it can only train robust models for $\ell_\infty$-bounded attacks as it uses FGSM during training. In this paper, by leveraging the theory of coreset selection we show how selecting a small subset of training data provides a more principled approach towards reducing the time complexity of robust training. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training. Our experimental results indicate that our approach speeds up adversarial training by 2-3 times, while experiencing a small reduction in the clean and robust accuracy.
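
To make the FGSM step mentioned above concrete, here is a minimal sketch for a linear logistic model, where the input gradient has a closed form. FGSM perturbs the input by eps times the sign of the loss gradient, which is why the result is bounded in the $\ell_\infty$ norm. The model, the function name `fgsm_linear`, and the example numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def sign(z):
    """Return -1, 0, or 1 according to the sign of z."""
    return (z > 0) - (z < 0)


def fgsm_linear(x, y, w, b, eps):
    """One FGSM step for the logistic loss log(1 + exp(-y * (w.x + b))), y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Gradient of the loss with respect to x is -y * sigmoid(-margin) * w.
    coeff = -y / (1.0 + math.exp(margin))
    grad = [coeff * wi for wi in w]
    # Move every coordinate by exactly eps in the direction that increases the loss.
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


x_adv = fgsm_linear(x=[1.0, -2.0], y=1, w=[0.5, -1.0], b=0.0, eps=0.1)
```

Each coordinate moves by at most eps, so the perturbation lies in an $\ell_\infty$ ball of radius eps around the input; attacks bounded in other norms need a different step, which matches the limitation of FGSM-based training described in the abstract.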

[132]  arXiv:2112.00380 [pdf, other]
Title: Deep Measurement Updates for Bayes Filters
Journal-ref: IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 414-421, Jan. 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

Measurement update rules for Bayes filters often contain hand-crafted heuristics to compute observation probabilities for high-dimensional sensor data, like images. In this work, we propose the novel approach Deep Measurement Update (DMU) as a general update rule for a wide range of systems. DMU has a conditional encoder-decoder neural network structure to process depth images as raw inputs. Even though the network is trained only on synthetic data, the model shows good performance at evaluation time on real-world data. With our proposed training scheme, primed data training, we demonstrate how the DMU models can be trained efficiently to be sensitive to condition variables without having to rely on a stochastic information bottleneck. We validate the proposed methods in multiple scenarios of increasing complexity, beginning with the pose estimation of a single object to the joint estimation of the pose and the internal state of an articulated system. Moreover, we provide a benchmark against Articulated Signed Distance Functions (A-SDF) on the RBO dataset as a baseline comparison for articulation state estimation.

[133]  arXiv:2112.00382 [pdf, other]
Title: Lagrange and $H(\operatorname{curl},{\cal B})$ based Finite Element formulations for the relaxed micromorphic model
Subjects: Numerical Analysis (math.NA)

Modeling the unusual mechanical properties of metamaterials is a challenging topic for the mechanics community and enriched continuum theories are promising computational tools for such materials. The so-called relaxed micromorphic model has shown many advantages in this field. In this contribution, we present the significant aspects related to the relaxed micromorphic model realization with the finite element method. The variational problem is derived and different FEM-formulations for the two-dimensional case are presented. These are a nodal standard formulation $H^1({\cal B}) \times H^1({\cal B})$ and a nodal-edge formulation $H^1({\cal B}) \times H(\operatorname{curl}, {\cal B})$, where the latter employs the Nédélec space. However, the implementation of higher-order Nédélec elements is not trivial and requires some technicalities which are demonstrated. We discuss the convergence behavior of Lagrange-type and tangential-conforming finite element discretizations. Moreover, we analyze the characteristic length effect on the different components of the model and reveal how the size-effect property is captured via this characteristic length.

[134]  arXiv:2112.00384 [pdf, other]
Title: Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that the current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, degrading performance in the downstream text-to-image generation and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space' and propose a simple but effective way to achieve translation equivariance by regularizing orthogonality in the codebook embedding vectors. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming the VQGAN.
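
The abstract above mentions regularizing orthogonality in the codebook embedding vectors. As a rough illustration only, the sketch below shows one common way to penalize non-orthogonality: the mean squared deviation of the Gram matrix of the unit-normalized codebook from the identity. The function name `orthogonality_penalty` and this specific loss form are assumptions made for illustration; the abstract does not state the exact regularizer the authors use.

```python
import math


def orthogonality_penalty(codebook):
    """Mean squared deviation of the normalized Gram matrix from identity.

    `codebook` is a list of K embedding vectors (lists of floats).
    Hypothetical helper: the loss form is an assumption, not taken
    from the paper.
    """
    # Normalize every embedding to unit length so only directions matter.
    unit = []
    for vec in codebook:
        norm = math.sqrt(sum(v * v for v in vec))
        unit.append([v / norm for v in vec])

    k = len(unit)
    penalty = 0.0
    for i in range(k):
        for j in range(k):
            dot = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0  # identity matrix entry
            penalty += (dot - target) ** 2
    return penalty / (k * k)
```

Under this assumed form, a codebook with mutually orthogonal vectors yields a penalty of zero, while duplicated directions increase it. Zero-length embeddings are not handled and would raise a division error.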

[135]  arXiv:2112.00386 [pdf, other]
Title: Spurious Valleys, Spurious Minima and NP-hardness of Sparse Matrix Factorization With Fixed Support
Authors: Quoc-Tung Le (DANTE), Elisa Riccietti (DANTE), Rémi Gribonval (DANTE)
Subjects: Computational Complexity (cs.CC)

The problem of approximating a dense matrix by a product of sparse factors is a fundamental problem for many signal processing and machine learning tasks. It can be decomposed into two subproblems: finding the position of the non-zero coefficients in the sparse factors, and determining their values. While the first step is usually seen as the most challenging one due to its combinatorial nature, this paper focuses on the second step, referred to as sparse matrix approximation with fixed support. First, we show its NP-hardness, while also presenting a nontrivial family of supports making the problem practically tractable with a dedicated algorithm. Then, we investigate the landscape of its natural optimization formulation, proving the absence of spurious local valleys and spurious local minima, whose presence could prevent local optimization methods from achieving global optimality. The advantages of the proposed algorithm over state-of-the-art first-order optimization methods are discussed.

[136]  arXiv:2112.00387 [pdf, other]
Title: How Parallel Circuit Execution Can Be Useful for NISQ Computing?
Comments: 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), Mar 2022, ANTWERP, Belgium
Subjects: Hardware Architecture (cs.AR); Quantum Physics (quant-ph)

Quantum computing is performed on Noisy Intermediate-Scale Quantum (NISQ) hardware in the short term. Only small circuits can be executed reliably on a quantum machine due to the unavoidable noisy quantum operations on NISQ devices, leading to the under-utilization of hardware resources. With the growing demand to access quantum hardware, how to utilize it more efficiently while maintaining output fidelity is becoming a timely issue. A parallel circuit execution technique has been proposed to address this problem by executing multiple programs on hardware simultaneously. It can improve the hardware throughput and reduce the overall runtime. However, accumulative noises such as crosstalk can decrease the output fidelity in parallel workload execution. In this paper, we first give an in-depth overview of state-of-the-art parallel circuit execution methods. Second, we propose a Quantum Crosstalk-aware Parallel workload execution method (QuCP) without the overhead of crosstalk characterization. Third, we investigate the trade-off between hardware throughput and fidelity loss to explore the hardware limitation with parallel circuit execution. Finally, we apply parallel circuit execution to VQE and the zero-noise extrapolation error mitigation method to showcase its various applications in advancing NISQ computing.

[137]  arXiv:2112.00390 [pdf, other]
Title: SegDiff: Image Segmentation with Diffusion Probabilistic Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Diffusion Probabilistic Methods are employed for state-of-the-art image generation. In this work, we present a method for extending such models for performing image segmentation. The method learns end-to-end, without relying on a pre-trained backbone. The information in the input image and in the current estimation of the segmentation map is merged by summing the output of two encoders. Additional encoding layers and a decoder are then used to iteratively refine the segmentation map using a diffusion model. Since the diffusion model is probabilistic, it is applied multiple times and the results are merged into a final segmentation map. The new method obtains state-of-the-art results on the Cityscapes validation set, the Vaihingen building segmentation benchmark, and the MoNuSeg dataset.

[138]  arXiv:2112.00394 [pdf, ps, other]
Title: Wiretap Secret Key Agreement Via Secure Omniscience
Comments: 23 pages, 7 figures. arXiv admin note: text overlap with arXiv:2102.01771
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

In this paper, we explore the connection between secret key agreement and secure omniscience within the setting of the multiterminal source model with a wiretapper who has side information. While the secret key agreement problem considers the generation of a maximum-rate secret key through public discussion, the secure omniscience problem is concerned with communication protocols for omniscience that minimize the rate of information leakage to the wiretapper. The starting point of our work is a lower bound on the minimum leakage rate for omniscience, $R_{\mathop{\mathrm{L}}}$, in terms of the wiretap secret key capacity, $C_{\mathop{\mathrm{W}}}$. Our interest is in identifying broad classes of sources for which this lower bound is met with equality, in which case we say that there is a duality between secure omniscience and secret key agreement. We show that this duality holds in the case of certain finite linear source (FLS) models, such as two-terminal FLS models and pairwise independent network models on trees with a linear wiretapper. Duality also holds for any FLS model in which $C_{\mathop{\mathrm{W}}}$ is achieved by a perfect linear secret key agreement scheme. We conjecture that the duality in fact holds unconditionally for any FLS model. On the negative side, we give an example of a (non-FLS) source model for which duality does not hold if we limit ourselves to communication-for-omniscience protocols with at most two (interactive) communications. Finally, we demonstrate the usefulness of our lower bound on $R_{\mathop{\mathrm{L}}}$ by using it to derive equivalent conditions for the positivity of $C_{\mathop{\mathrm{W}}}$ in the multiterminal model. This extends a recent result of Gohari, G\"{u}nl\"{u} and Kramer (2020) obtained for the two-user setting.

[139]  arXiv:2112.00395 [pdf]
Title: A Comprehensive Study on Various Statistical Techniques for Prediction of Movie Success
Comments: 14 pages, 12 figures Conference: 2nd International Conference on Machine Learning Techniques and Data Science (MLDS 2021)
Subjects: Machine Learning (cs.LG)

The film industry is one of the most popular entertainment industries and one of the biggest markets for business. Among the contributing factors to this would be the success of a movie in terms of its popularity as well as its box office performance. Hence, we create a comprehensive comparison between the various machine learning models to predict the rate of success of a movie. The effectiveness of these models along with their statistical significance is studied to conclude which of these models is the best predictor. Some insights regarding factors that affect the success of the movies are also found. The models studied include some Regression models, Machine Learning models, a Time Series model and a Neural Network, with the Neural Network being the best performing model with an accuracy of about 86%. Additionally, as part of the testing, data for the movies released in 2020 are analysed.

[140]  arXiv:2112.00396 [pdf, other]
Title: Dyadic Human Motion Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Prior work on human motion forecasting has mostly focused on predicting the future motion of single subjects in isolation from their past pose sequence. In the presence of closely interacting people, however, this strategy fails to account for the dependencies between the different subjects' motions. In this paper, we therefore introduce a motion prediction framework that explicitly reasons about the interactions of two observed subjects. Specifically, we achieve this by introducing a pairwise attention mechanism that models the mutual dependencies in the motion history of the two subjects. This allows us to preserve the long-term motion dynamics in a more realistic way and more robustly predict unusual and fast-paced movements, such as the ones occurring in a dance scenario. To evaluate this, and because no existing motion prediction datasets depict two closely-interacting subjects, we introduce the LindyHop600K dance dataset. Our results show that our approach outperforms the state-of-the-art single person motion prediction techniques.

[141]  arXiv:2112.00398 [pdf]
Title: Effective and efficient structure learning with pruning and model averaging strategies
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Learning the structure of a Bayesian Network (BN) with score-based solutions involves exploring the search space of possible graphs and moving towards the graph that maximises a given objective function. Some algorithms offer exact solutions that guarantee to return the graph with the highest objective score, while others offer approximate solutions in exchange for reduced computational complexity. This paper describes an approximate BN structure learning algorithm, which we call Model Averaging Hill-Climbing (MAHC), that combines two novel strategies with hill-climbing search. The algorithm starts by pruning the search space of graphs, where the pruning strategy can be viewed as an aggressive version of the pruning strategies that are typically applied to combinatorial optimisation structure learning problems. It then performs model averaging in the hill-climbing search process and moves to the neighbouring graph that maximises the objective function, on average, for that neighbouring graph and over all its valid neighbouring graphs. Comparisons with other algorithms spanning different classes of learning suggest that the combination of aggressive pruning with model averaging is both effective and efficient, particularly in the presence of data noise.

[142]  arXiv:2112.00399 [pdf]
Title: Quantum-Resistant Cryptography
Subjects: Cryptography and Security (cs.CR)

Quantum-resistant cryptography is cryptography that aims to deliver cryptographic functions and protocols that remain secure even if large-scale fault-tolerant quantum computers are built. NIST will soon announce the first selected public-key cryptography algorithms in its Post-Quantum Cryptography (PQC) standardization which is the most important current effort in the field of quantum-resistant cryptography. This report provides an overview to security experts who do not yet have a deep understanding of quantum-resistant cryptography. It surveys the computational model of quantum computers; the quantum algorithms that affect cryptography the most; the risk of Cryptographically Relevant Quantum Computers (CRQCs) being built; the security of symmetric and public-key cryptography in the presence of CRQCs; the NIST PQC standardization effort; the migration to quantum-resistant public-key cryptography; the relevance of Quantum Key Distribution as a complement to conventional cryptography; and the relevance of Quantum Random Number Generators as a complement to current hardware Random Number Generators.

[143]  arXiv:2112.00403 [pdf, other]
Title: Orientation of Fitch Graphs and Detection of Horizontal Gene Transfer in Gene Trees
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO); Populations and Evolution (q-bio.PE)

Horizontal gene transfer events partition a gene tree $T$ and thus, its leaf set into subsets of genes whose evolutionary history is described by speciation and duplication events alone. Indirect phylogenetic methods can be used to infer such partitions $\mathcal{P}$ from sequence similarity or evolutionary distances without any a priori knowledge about the underlying tree $T$. In this contribution, we assume that such a partition $\mathcal{P}$ of a set of genes $X$ is given and that, independently, an estimate $T$ of the original gene tree on $X$ has been derived. We then ask to what extent $T$ and the xenology information, i.e., $\mathcal{P}$, can be combined to determine the horizontal transfer edges in $T$. We show that for each pair of genes $x$ and $y$ with $x,y$ being in different parts of $\mathcal{P}$, it can be decided whether there always exists or never exists a horizontal gene transfer in $T$ along the path connecting $y$ and the most recent common ancestor of $x$ and $y$. This problem is equivalent to determining the presence or absence of the directed edge $(x,y)$ in so-called Fitch graphs; a more fine-grained version of graphs that represent the dependencies between the sets in $\mathcal{P}$. We then consider the generalization to insufficiently resolved gene trees and show that analogous results can be obtained. We show that the classification of $(x,y)$ can be computed in constant time after linear-time preprocessing. Using simulated gene family histories, we observe empirically that the vast majority of horizontal transfer edges in the gene tree $T$ can be recovered unambiguously.

[144]  arXiv:2112.00405 [pdf, other]
Title: NER-BERT: A Pre-trained Model for Low-Resource Entity Tagging
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Named entity recognition (NER) models generally perform poorly when large training datasets are unavailable for low-resource domains. Recently, pre-training a large-scale language model has become a promising direction for coping with the data scarcity issue. However, the underlying discrepancies between the language modeling and NER task could limit the models' performance, and pre-training for the NER task has rarely been studied since the collected NER datasets are generally small or large but with low quality. In this paper, we construct a massive NER corpus with a relatively high quality, and we pre-train a NER-BERT model based on the created dataset. Experimental results show that our pre-trained model can significantly outperform BERT as well as other strong baselines in low-resource scenarios across nine diverse domains. Moreover, a visualization of entity representations further indicates the effectiveness of NER-BERT for categorizing a variety of entities.

[145]  arXiv:2112.00407 [pdf, other]
Title: Compare Where It Matters: Using Layer-Wise Regularization To Improve Federated Learning on Heterogeneous Data
Comments: 8 pages, 5 figures, 4 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated Learning is a widely adopted method to train neural networks over distributed data. One main limitation is the performance degradation that occurs when data is heterogeneously distributed. While many works have attempted to address this problem, these methods under-perform because they are founded on a limited understanding of neural networks. In this work, we verify that only certain important layers in a neural network require regularization for effective training. We additionally verify that Centered Kernel Alignment (CKA) most accurately calculates similarity between layers of neural networks trained on different data. By applying CKA-based regularization to important layers during training, we significantly improve performance in heterogeneous settings. We present FedCKA: a simple framework that out-performs previous state-of-the-art methods on various deep learning tasks while also improving efficiency and scalability.
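
For readers unfamiliar with Centered Kernel Alignment, the sketch below computes the linear variant of CKA between two sets of layer activations using only the Python standard library. It illustrates the similarity measure named in the abstract above, not the FedCKA training procedure itself. The function names are hypothetical, and the choice of the linear (rather than kernel) variant is an assumption; the abstract does not say which variant the authors use.

```python
import math


def center_columns(m):
    """Subtract the per-feature mean from a matrix given as a list of rows."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for two matrices with equal row count."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            total += sum(x * y for x, y in zip(col_a, col_b)) ** 2
    return total


def linear_cka(x, y):
    """Linear CKA between activations x (n x d1) and y (n x d2).

    Rows are the same n examples passed through two layers or models.
    Assumes neither input has all-constant features (nonzero denominator).
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator
```

Linear CKA is invariant to isotropic scaling and to orthogonal transformations of the feature space, so comparing a set of activations with a scaled or rotated copy of itself gives a value of 1 up to floating-point error.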

[146]  arXiv:2112.00408 [pdf, other]
Title: Approximating Length-Restricted Means under Dynamic Time Warping
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

We study variants of the mean problem under the $p$-Dynamic Time Warping ($p$-DTW) distance, a popular and robust distance measure for sequential data. In our setting we are given a set of finite point sequences over an arbitrary metric space and we want to compute a mean point sequence of given length that minimizes the sum of $p$-DTW distances, each raised to the $q$th power, between the input sequences and the mean sequence. In general, the problem is $\mathrm{NP}$-hard and known not to be fixed-parameter tractable in the number of sequences. We show that it is even hard to approximate within any constant factor unless $\mathrm{P} = \mathrm{NP}$ and moreover if there exists a $\delta>0$ such that there is a $(\log n)^{\delta}$-approximation algorithm for DTW mean then $\mathrm{NP} \subseteq \mathrm{QP}$. On the positive side, we show that restricting the length of the mean sequence significantly reduces the hardness of the problem. We give an exact algorithm running in polynomial time for constant-length means. We explore various approximation algorithms that provide a trade-off between the approximation factor and the running time. Our approximation algorithms have a running time with only linear dependency on the number of input sequences. In addition, we use our mean algorithms to obtain clustering algorithms with theoretical guarantees.
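
To make the objective in the abstract above concrete, the sketch below evaluates a $p$-DTW distance with the textbook quadratic-time dynamic program and then sums the distances, each raised to the $q$th power, for a candidate mean sequence. It assumes that $p$-DTW is the $p$th root of the minimum, over all warping paths, of the sum of $p$th powers of matched point distances; that definition, the one-dimensional default metric, and the function names are assumptions for illustration. The code only evaluates the objective and is not one of the paper's exact or approximation algorithms.

```python
def p_dtw(a, b, p=1, dist=lambda u, v: abs(u - v)):
    """p-DTW distance between point sequences a and b (assumed definition).

    cost[i][j] holds the cheapest sum of dist**p over warping paths that
    align the first i points of a with the first j points of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1]) ** p
            # A warping path may advance in a, in b, or in both.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m] ** (1.0 / p)


def mean_objective(sequences, mean, p=1, q=1):
    """Sum of p-DTW distances, each raised to the q-th power, to `mean`."""
    return sum(p_dtw(seq, mean, p) ** q for seq in sequences)
```

A length-restricted mean in the sense of the abstract would be a sequence of the prescribed length that minimizes `mean_objective`; finding such a sequence is the hard part that the paper addresses.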

[147]  arXiv:2112.00410 [pdf, other]
Title: Rethink, Revisit, Revise: A Spiral Reinforced Self-Revised Network for Zero-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Current approaches to Zero-Shot Learning (ZSL) struggle to learn generalizable semantic knowledge capable of capturing complex correlations. Inspired by \emph{Spiral Curriculum}, which enhances learning processes by revisiting knowledge, we propose a form of spiral learning which revisits visual representations based on a sequence of attribute groups (e.g., a combined group of \emph{color} and \emph{shape}). Spiral learning aims to learn generalized local correlations, enabling models to gradually enhance global learning and thus understand complex correlations. Our implementation is based on a 2-stage \emph{Reinforced Self-Revised (RSR)} framework: \emph{preview} and \emph{review}. RSR first previews visual information to construct diverse attribute groups in a weakly-supervised manner. Then, it spirally learns refined localities based on attribute groups and uses localities to revise global semantic correlations. Our framework outperforms state-of-the-art algorithms on four benchmark datasets in both zero-shot and generalized zero-shot settings, which demonstrates the effectiveness of spiral learning in learning generalizable and complex correlations. We also conduct extensive analysis to show that attribute groups and reinforced decision processes can capture complementary semantic information to improve predictions and aid explainability.

[148]  arXiv:2112.00412 [pdf, other]
Title: The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The problem of class imbalanced data lies in that the generalization performance of the classifier is deteriorated due to the lack of data of the minority classes. In this paper, we propose a novel minority over-sampling method to augment diversified minority samples by leveraging the rich context of the majority classes as background images. To diversify the minority samples, our key idea is to paste a foreground patch from a minority class to a background image from a majority class having affluent contexts. Our method is simple and can be easily combined with the existing long-tailed recognition methods. We empirically prove the effectiveness of the proposed oversampling method through extensive experiments and ablation studies. Without any architectural changes or complex algorithms, our method achieves state-of-the-art performance on various long-tailed classification benchmarks. Our code will be publicly available at link.

[149]  arXiv:2112.00420 [pdf, other]
Title: An adaptive mixture-population Monte Carlo method for likelihood-free inference
Comments: 23 pages, 7 figures
Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)

This paper focuses on variational inference with intractable likelihood functions that can be unbiasedly estimated. A flexible variational approximation based on Gaussian mixtures is developed, by adopting the mixture population Monte Carlo (MPMC) algorithm in \cite{cappe2008adaptive}. MPMC iteratively updates the parameters of mixture distributions with importance sampling computations, instead of the complicated gradient estimation of the optimization objective in usual variational Bayes. Noticing that MPMC uses a fixed number of mixture components, which is difficult to predict for real applications, we further propose an automatic component-updating procedure to derive an appropriate number of components. The derived adaptive MPMC algorithm is capable of finding good approximations of the multi-modal posterior distributions even with a standard Gaussian as the initial distribution, as demonstrated in our numerical experiments.

[150]  arXiv:2112.00424 [pdf, other]
Title: Multi-Agent Transfer Learning in Reinforcement Learning-Based Ride-Sharing Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Reinforcement learning (RL) has been used in a range of simulated real-world tasks, e.g., sensor coordination, traffic light control, and on-demand mobility services. However, real-world deployments are rare, as RL struggles with the dynamic nature of real-world environments, requiring time for learning a task and adapting to changes in the environment. Transfer Learning (TL) can help lower these adaptation times. In particular, there is significant potential in applying TL in multi-agent RL systems, where multiple agents can share knowledge with each other, as well as with new agents that join the system. To obtain the most from inter-agent transfer, transfer roles (i.e., determining which agents act as sources and which as targets), as well as relevant transfer content parameters (e.g., transfer size), should be selected dynamically in each particular situation. As a first step towards fully dynamic transfers, in this paper we investigate the impact of TL transfer parameters with fixed source and target roles. Specifically, we label every agent-environment interaction with the agent's epistemic confidence, and we filter the shared examples using varying threshold levels and sample sizes. We investigate the impact of these parameters in two scenarios, a standard predator-prey RL benchmark and a simulation of a ride-sharing system with 200 vehicle agents and 10,000 ride-requests.

[151]  arXiv:2112.00425 [pdf, other]
Title: How to use Persistent Memory in your Database
Subjects: Databases (cs.DB)

Persistent or Non Volatile Memory (PMEM or NVM) has recently become commercially available under several configurations with different purposes and goals. Despite the attention to the topic, we are not aware of a comprehensive empirical analysis of existing relational database engines under different PMEM configurations. Such a study is important to understand the performance implications of the various hardware configurations and how different DB engines can benefit from them. To this end, we analyze three different engines (PostgreSQL, MySQL, and SQLServer) under common workloads (TPC-C and TPC-H) with all possible PMEM configurations supported by Intel's Optane NVM devices (PMEM as persistent memory in AppDirect mode and PMEM as volatile memory in Memory mode). Our results paint a complex picture and are not always intuitive due to the many factors involved. Based on our findings, we provide insights on how the different engines behave with PMEM and which configurations and queries perform best. Our results show that using PMEM as persistent storage usually speeds up query execution, but with some caveats as the I/O path is not fully optimized. Additionally, using PMEM in Memory mode does not offer any performance advantage despite the larger volatile memory capacity. Through the extensive coverage of engines and parameters, we provide an important starting point for exploiting PMEM in databases and tuning relational engines to take advantage of this new technology.

[152]  arXiv:2112.00427 [pdf]
Title: Research on Event Accumulator Settings for Event-Based SLAM
Comments: arXiv admin note: text overlap with arXiv:2008.05749 by other authors
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Event cameras are a new type of sensor that differs from traditional cameras. Each pixel is triggered asynchronously by events. The trigger event is the change of the brightness irradiated on the pixel. If the increment or decrement of brightness is higher than a certain threshold, an event is output. Compared with traditional cameras, event cameras have the advantages of high dynamic range and no motion blur. Accumulating events to frames and using a traditional SLAM algorithm is a direct and efficient way for event-based SLAM. Different event accumulator settings, such as slice method of event stream, processing method for no motion, using polarity or not, decay function and event contribution, can cause quite different accumulating results. We conducted the research on how to accumulate event frames to achieve a better event-based SLAM performance. For experiment verification, accumulated event frames are fed to the traditional SLAM system to construct an event-based SLAM system. Our strategy of setting event accumulator has been evaluated on the public dataset. The experiment results show that our method can achieve better performance in most sequences compared with the state-of-the-art event frame based SLAM algorithm. In addition, the proposed approach has been tested on a quadrotor UAV to show the potential of applications in real scenario. Code and results are open sourced to benefit the research community of event cameras.
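
The accumulator settings listed in the abstract above (polarity, decay function, event contribution) can be pictured with a small sketch. The code below accumulates a slice of events into a frame, optionally using signed polarity and an exponential time decay. Everything here is a hypothetical illustration: the `(t, x, y, polarity)` event layout, the function name `accumulate_events`, the exponential form of the decay, and the parameter names are assumptions, and the abstract does not state which settings the authors recommend.

```python
import math


def accumulate_events(events, width, height, t_ref,
                      use_polarity=True, decay_tau=None):
    """Accumulate events into a height x width frame at reference time t_ref.

    events: iterable of (t, x, y, polarity) tuples, polarity in {+1, -1}.
    use_polarity: if True, events contribute with their sign; otherwise
        every event contributes positively.
    decay_tau: if given, an event's contribution is weighted by
        exp(-(t_ref - t) / decay_tau), so older events fade out.
    """
    frame = [[0.0] * width for _ in range(height)]
    for t, x, y, polarity in events:
        if t > t_ref:
            continue  # event lies after the end of this slice
        weight = 1.0
        if decay_tau is not None:
            weight = math.exp(-(t_ref - t) / decay_tau)
        sign = 1.0
        if use_polarity and polarity < 0:
            sign = -1.0
        frame[y][x] += sign * weight
    return frame
```

In this sketch, two events of opposite polarity at the same pixel cancel when polarity is used and no decay is applied, which is one reason the polarity setting can change the accumulated frame substantially.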

[153]  arXiv:2112.00428 [pdf, other]
Title: Adv-4-Adv: Thwarting Changing Adversarial Perturbations via Adversarial Domain Adaptation
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Whereas adversarial training can be useful against specific adversarial perturbations, it has also proven ineffective in generalizing towards attacks deviating from those used for training. However, we observe that this ineffectiveness is intrinsically connected to domain adaptability, another crucial issue in deep learning for which adversarial domain adaptation appears to be a promising solution. Consequently, we propose Adv-4-Adv as a novel adversarial training method that aims to retain robustness against unseen adversarial perturbations. Essentially, Adv-4-Adv treats attacks incurring different perturbations as distinct domains, and by leveraging the power of adversarial domain adaptation, it aims to remove the domain/attack-specific features. This forces a trained model to learn a robust domain-invariant representation, which in turn enhances its generalization ability. Extensive evaluations on Fashion-MNIST, SVHN, CIFAR-10, and CIFAR-100 demonstrate that a model trained by Adv-4-Adv based on samples crafted by simple attacks (e.g., FGSM) can be generalized to more advanced attacks (e.g., PGD), and the performance exceeds state-of-the-art proposals on these datasets.

[154]  arXiv:2112.00429 [pdf, ps, other]
Title: Security issues of CFS-like digital signature algorithms
Subjects: Cryptography and Security (cs.CR); Discrete Mathematics (cs.DM)

We analyse the security of some variants of the CFS code-based digital signature scheme. We show how the adoption of some code-based hash-functions to improve the efficiency of CFS leads to the ability of an attacker to produce a forgery compatible with the rightful user's public key.

[155]  arXiv:2112.00431 [pdf, other]
Title: MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Comments: 12 Pages, 6 Figures, 7 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-the-art techniques commonly overfit to hidden dataset biases. In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets. MAD's collection strategy enables a novel and more challenging version of video-language grounding, where short temporal moments (typically seconds long) must be accurately grounded in diverse long-form videos that can last up to three hours.

[156]  arXiv:2112.00432 [pdf, other]
Title: A benchmark with decomposed distribution shifts for 360 monocular depth estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this work we contribute a distribution shift benchmark for a computer vision task: monocular depth estimation. Our differentiation is the decomposition of the wider distribution shift of uncontrolled testing on in-the-wild data into three distinct distribution shifts. Specifically, we generate data via synthesis and analyze them to produce covariate (color input), prior (depth output) and concept (their relationship) distribution shifts. We also synthesize combinations and show how each one is indeed a different challenge to address, as stacking them produces increased performance drops and cannot be addressed horizontally using standard approaches.

[157]  arXiv:2112.00434 [pdf, other]
Title: Training Experimentally Robust and Interpretable Binarized Regression Models Using Mixed-Integer Programming
Subjects: Machine Learning (cs.LG)

In this paper, we explore a model-based approach to training robust and interpretable binarized regression models for multiclass classification tasks using Mixed-Integer Programming (MIP). Our MIP model balances the optimization of prediction margin and model size by using a weighted objective that: minimizes the total margin of incorrectly classified training instances, maximizes the total margin of correctly classified training instances, and maximizes the overall model regularization. We conduct two sets of experiments to test the classification accuracy of our MIP model over standard and corrupted versions of multiple classification datasets, respectively. In the first set of experiments, we show that our MIP model outperforms an equivalent Pseudo-Boolean Optimization (PBO) model and achieves competitive results to Logistic Regression (LR) and Gradient Descent (GD) in terms of classification accuracy over the standard datasets. In the second set of experiments, we show that our MIP model outperforms the other models (i.e., GD and LR) in terms of classification accuracy over the majority of the corrupted datasets. Finally, we visually demonstrate the interpretability of our MIP model in terms of its learned parameters over the MNIST dataset. Overall, we show the effectiveness of training robust and interpretable binarized regression models using MIP.

[158]  arXiv:2112.00443 [pdf, other]
Title: TROLLMAGNIFIER: Detecting State-Sponsored Troll Accounts on Reddit
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIFIER, a detection system for troll accounts. Our key observation, based on analysis of known Russian-sponsored troll accounts identified by Reddit, is that they show loose coordination, often interacting with each other to further specific narratives. Therefore, troll accounts controlled by the same actor often show similarities that can be leveraged for detection. TROLLMAGNIFIER learns the typical behavior of known troll accounts and identifies more that behave similarly. We train TROLLMAGNIFIER on a set of 335 known troll accounts and run it on a large dataset of Reddit accounts. Our system identifies 1,248 potential troll accounts; we then provide a multi-faceted analysis to corroborate the correctness of our classification. In particular, 66% of the detected accounts show signs of being instrumented by malicious actors (e.g., they were created on the same exact day as a known troll, they have since been suspended by Reddit, etc.). They also discuss similar topics as the known troll accounts and exhibit temporal synchronization in their activity. Overall, we show that using TROLLMAGNIFIER, one can grow the initial knowledge of potential trolls provided by Reddit by over 300%.

[159]  arXiv:2112.00448 [pdf, other]
Title: On-Device Spatial Attention based Sequence Learning Approach for Scene Text Script Identification
Comments: Accepted for publication in CVIP 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatic identification of script is an essential component of a multilingual OCR engine. In this paper, we present an efficient, lightweight, real-time and on-device spatial attention based CNN-LSTM network for scene text script identification, feasible for deployment on resource-constrained mobile devices. Our network consists of a CNN, equipped with a spatial attention module which helps reduce the spatial distortions present in natural images. This allows the feature extractor to generate rich image representations while ignoring the deformities, thereby enhancing the performance of this fine-grained classification task. The network also employs residue convolutional blocks to build a deep network to focus on the discriminative features of a script. The CNN learns the text feature representation by identifying each character as belonging to a particular script and the long term spatial dependencies within the text are captured using the sequence learning capabilities of the LSTM layers. Combining the spatial attention mechanism with the residue convolutional blocks, we are able to enhance the performance of the baseline CNN to build an end-to-end trainable network for script identification. The experimental results on several standard benchmarks demonstrate the effectiveness of our method. The network achieves competitive accuracy with state-of-the-art methods and is superior in terms of network size, with a total of just 1.1 million parameters and inference time of 2.7 milliseconds.

[160]  arXiv:2112.00449 [pdf, ps, other]
Title: Frequent-Pattern Based Broadcast Scheduling for Conflict Avoidance in Multi-Channel Data Dissemination Systems
Comments: 10 figures, accepted by Wireless Communications and Mobile Computing, Special Issue on Innovative Artificial Intelligence-Based Internet of Things for Smart Cities and Smart Homes
Subjects: Networking and Internet Architecture (cs.NI)

With the popularity of mobile devices, using the traditional client-server model to handle a large number of requests is very challenging. Wireless data broadcasting can be used to provide services to many users at the same time, so reducing the average access time has become a popular research topic. For example, some location-based services (LBS) consider using multiple channels to disseminate information to reduce access time. However, data conflicts may occur when multiple channels are used, where multiple data items associated with the request are broadcast at about the same time. In this article, we consider the channel switching time and identify the data conflict issue in an on-demand multi-channel dissemination system. We model the considered problem as a Data Broadcast with Conflict Avoidance (DBCA) problem and prove it is NP-complete. We hence propose the frequent-pattern based broadcast scheduling (FPBS), which provides a new variant of the frequent pattern tree, FP*-tree, to schedule the requested data. Using FPBS, the system can avoid data conflicts when assigning data items to time slots in the channels. In the simulation, we discussed two modes of FPBS: online and offline. The results show that, compared with the existing heuristic methods, FPBS can shorten the average access time by 30%.

[161]  arXiv:2112.00451 [pdf, other]
Title: Unconditional well-posedness and IMEX improvement of a family of predictor-corrector methods in micromagnetics
Comments: 29 pages, 6 figures
Subjects: Numerical Analysis (math.NA)

Recently, Kim & Wilkening (Convergence of a mass-lumped finite element method for the Landau-Lifshitz equation, Quart. Appl. Math., 76, 383-405, 2018) proposed two novel predictor-corrector methods for the Landau-Lifshitz-Gilbert equation (LLG) in micromagnetics, which models the dynamics of the magnetization in ferromagnetic materials. Both integrators are based on the so-called Landau-Lifshitz form of LLG, use mass-lumped variational formulations discretized by first-order finite elements, and only require the solution of linear systems, despite the nonlinearity of LLG. The first(-order in time) method combines a linear update with an explicit projection of an intermediate approximation onto the unit sphere in order to fulfill the LLG-inherent unit-length constraint at the discrete level. In the second(-order in time) integrator, the projection step is replaced by a linear constraint-preserving variational formulation. In this paper, we extend the analysis of the integrators by proving unconditional well-posedness and by establishing a close connection of the methods with other approaches available in the literature. Moreover, the new analysis also provides a well-posed integrator for the Schrödinger map equation (which is the limit case of LLG for vanishing damping). Finally, we design an implicit-explicit strategy for the treatment of the lower-order field contributions, which significantly reduces the computational cost of the schemes, while preserving their theoretical properties.

[162]  arXiv:2112.00454 [pdf, other]
Title: Hyperbolae are the locus of constant angle difference
Comments: 3 pages, 2 figures
Subjects: Computational Geometry (cs.CG)

Given two points A,B in the plane, the locus of all points P for which the angles at A and B in the triangle A,B,P have a constant sum is a circular arc, by Thales' theorem. We show that the difference of these angles is kept a constant by points P on a hyperbola (albeit with foci different from A and B). Whereas hyperbolae are well-known to maintain a constant difference between the distances to their foci, the above angle property seems not to be widely known. The question was motivated by recent work by Alegría et al. and De Berg et al. on Voronoi diagrams of turning rays.
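
The constant-difference property can be checked numerically. The sketch below is not taken from the paper: it assumes the coordinates A = (-1, 0), B = (1, 0) and P = (x, y) with y > 0, for which a short computation of our own gives the conic sin(d)(1 - x^2 + y^2) + 2xy cos(d) = 0 for angle difference d. The discriminant of this conic equals 4 for every d, so for nonzero d the locus lies on a hyperbola. All function names are hypothetical.

```python
import math

A = (-1.0, 0.0)  # assumed coordinates, chosen for convenience
B = (1.0, 0.0)

def point_with_angles(alpha, beta):
    """Apex P of triangle A, B, P with interior angle alpha at A and beta at B (P above AB)."""
    # Law of sines: |AP| / sin(beta) = |AB| / sin(pi - alpha - beta), with |AB| = 2.
    ap = 2.0 * math.sin(beta) / math.sin(alpha + beta)
    return (A[0] + ap * math.cos(alpha), ap * math.sin(alpha))

def angle_difference(p):
    """Angle at A minus angle at B, recomputed from the coordinates of P."""
    x, y = p
    return math.atan2(y, x - A[0]) - math.atan2(y, B[0] - x)

def conic_residual(p, delta):
    """Left-hand side of sin(d)(1 - x^2 + y^2) + 2xy cos(d) = 0; zero on the locus."""
    x, y = p
    return math.sin(delta) * (1.0 - x * x + y * y) + 2.0 * x * y * math.cos(delta)

def conic_discriminant(delta):
    """B^2 - 4AC for -sin(d) x^2 + 2 cos(d) xy + sin(d) y^2 + sin(d) = 0."""
    a, b, c = -math.sin(delta), 2.0 * math.cos(delta), math.sin(delta)
    return b * b - 4.0 * a * c

delta = 0.3  # fixed angle difference in radians
# Sample triangles whose angles at A and B differ by exactly delta.
points = [point_with_angles(0.4 + 0.1 * k, 0.4 + 0.1 * k - delta) for k in range(12)]
max_residual = max(abs(conic_residual(p, delta)) for p in points)
max_angle_error = max(abs(angle_difference(p) - delta) for p in points)
```

In this parametrization the quadratic coefficients of x^2 and y^2 sum to zero, which suggests the hyperbola is rectangular and centered at the midpoint of AB; that observation is ours and should be checked against the paper.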

[163]  arXiv:2112.00457 [pdf, other]
Title: Broadband beam steering for misaligned multi-mode OAM communication systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Orbital angular momentum (OAM) at radio frequency (RF) has attracted increasing attention as a novel approach to multiplexing a set of orthogonal OAM modes on the same frequency channel to achieve high spectral efficiency (SE). However, the precondition for maintaining the orthogonality among different OAM modes is perfect alignment of the transmit and receive uniform circular arrays (UCAs), which is difficult to satisfy in practical wireless communication scenarios. Therefore, to achieve feasible multi-mode OAM broadband wireless communication, we first investigate the effect of oblique angles on the transmission performance of the multi-mode OAM broadband system in the non-parallel misalignment case. Then, we compare the UCA-based RF analog and baseband digital transceiver structures and the corresponding beam steering schemes. Mathematical analysis and numerical simulations validate that the SE of the misaligned multi-mode OAM broadband system is quite low, while analog and digital beam steering can both significantly improve the SE of the system. However, digital beam steering can obtain higher SE than analog beam steering, especially when the bandwidth and the number of array elements are large, which validates that a baseband digital transceiver with digital beam steering is more suitable for multi-mode OAM broadband wireless communication systems in practice.

[164]  arXiv:2112.00459 [pdf, other]
Title: Information Theoretic Representation Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite the empirical success of knowledge distillation, a theoretical foundation that can naturally lead to computationally inexpensive implementations is still lacking. To address this concern, we forge an alternative connection between information theory and knowledge distillation using a recently proposed entropy-like functional. In doing so, we introduce two distinct complementary losses which aim to maximise the correlation and mutual information between the student and teacher representations. Our method achieves competitive performance to state-of-the-art on the knowledge distillation and cross-model transfer tasks, while incurring significantly less training overhead than closely related and similarly performing approaches. We further demonstrate the effectiveness of our method on a binary distillation task, whereby we shed light on a new state-of-the-art for binary quantisation. The code, evaluation protocols, and trained models will be publicly available.

[165]  arXiv:2112.00463 [pdf, other]
Title: The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Domain adaptation is crucial to adapt a learned model to new scenarios, such as domain shifts or changing data distributions. Current approaches usually require a large amount of labeled or unlabeled data from the shifted domain. This can be a hurdle in fields which require continuous dynamic adaptation or suffer from scarcity of data, e.g. autonomous driving in challenging weather conditions. To address this problem of continuous adaptation to distribution shifts, we propose Dynamic Unsupervised Adaptation (DUA). We modify the feature representations of the model by continuously adapting the statistics of the batch normalization layers. We show that by accessing only a tiny fraction of unlabeled data from the shifted domain and adapting sequentially, a strong performance gain can be achieved. With even less than 1% of unlabeled data from the target domain, DUA already achieves competitive results to strong baselines. In addition, the computational overhead is minimal in contrast to previous approaches. Our approach is simple, yet effective and can be applied to any architecture which uses batch normalization as one of its components. We show the utility of DUA by evaluating it on a variety of domain adaptation datasets and tasks including object recognition, digit recognition and object detection.
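
As an illustration of adapting batch normalization statistics from a small stream of unlabeled target samples, here is a minimal sketch. It is not the DUA update rule, which the abstract does not spell out: it assumes a plain exponential moving average with a fixed `momentum`, a single scalar feature, and the biased batch variance. All names and values are hypothetical.

```python
import math

def update_running_stats(running_mean, running_var, batch, momentum):
    """One exponential-moving-average update of normalization statistics (assumed rule)."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n  # biased variance
    new_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
    new_var = (1.0 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var

def normalize(x, running_mean, running_var, eps=1e-5):
    """Apply the normalization step using the current running statistics."""
    return (x - running_mean) / math.sqrt(running_var + eps)

# Statistics learned on the source domain (assumed values).
mean, var = 0.0, 1.0

# Unlabeled samples from a shifted domain arrive sequentially in tiny batches.
target_stream = [[4.0, 6.0]] * 200
for batch in target_stream:
    mean, var = update_running_stats(mean, var, batch, momentum=0.1)

# After adaptation, a typical target value maps close to zero again.
adapted_output = normalize(5.0, mean, var)
```

No labels and no gradient steps are involved in this sketch; only the stored statistics change, which is consistent with the abstract's claim of minimal computational overhead.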

[166]  arXiv:2112.00467 [pdf, other]
Title: A unified framework to improve the interoperability between HPC and Big Data languages and programming models
Comments: 18 pages, 22 Figures, 5 Tables, submitted to Future Generation Computer Systems journal
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

One of the most important issues in the path to the convergence of HPC and Big Data is caused by the differences in their software stacks. Despite some research efforts, the interoperability between their programming models and languages is still limited. To deal with this problem we introduce a new computing framework called IgnisHPC, whose main objective is to unify the execution of Big Data and HPC workloads in the same framework. IgnisHPC has native support for multi-language applications using JVM and non-JVM-based languages. Since MPI was used as its backbone technology, IgnisHPC takes advantage of many communication models and network architectures. Moreover, MPI applications can be directly executed in an efficient way in the framework. The main consequence is that users could combine in the same multi-language code HPC tasks (using MPI) with Big Data tasks (using MapReduce operations). The experimental evaluation demonstrates the benefits of our proposal in terms of performance and productivity with respect to other frameworks such as Apache Spark. IgnisHPC is publicly available for the Big Data and HPC research community.

[167]  arXiv:2112.00468 [pdf, ps, other]
Title: Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

The Facebook network allows its users to record their reactions to text via a typology of emotions. This network, taken at scale, is therefore a prime data set of annotated sentiment data. This paper uses millions of such reactions, derived from a decade's worth of Facebook post data centred around a Sri Lankan context, to model an eye-of-the-beholder approach to sentiment detection for online Sinhala textual content. Three different sentiment analysis models are built, taking into account a limited subset of reactions, all reactions, and another that derives a positive/negative star rating value. The efficacy of these models in capturing the reactions of the observers is then computed and discussed. The analysis reveals that binary classification of reactions, for Sinhala content, is significantly more accurate than the other approaches. Furthermore, the inclusion of the like reaction hinders the capability of accurately predicting other reactions.

[168]  arXiv:2112.00471 [pdf, other]
Title: Triangle Counting Accelerations: From Algorithm to In-Memory Computing Architecture
Comments: arXiv admin note: substantial text overlap with arXiv:2007.10702
Journal-ref: IEEE Transactions on Computers, 2021
Subjects: Hardware Architecture (cs.AR)

Triangles are the basic substructure of networks and triangle counting (TC) has been a fundamental graph computing problem in numerous fields such as social network analysis. Nevertheless, like other graph computing problems, due to the high memory-computation ratio and random memory access pattern, TC involves a large amount of data transfers and thus suffers from the bandwidth bottleneck in the traditional Von-Neumann architecture. To overcome this challenge, in this paper, we propose to accelerate TC with the emerging processing-in-memory (PIM) architecture in an algorithm-architecture co-optimization manner. To enable efficient in-memory implementations, we reformulate TC with bitwise logic operations (such as AND), and develop customized graph compression and mapping techniques for efficient data flow management. With the emerging computational Spin-Transfer Torque Magnetic RAM (STT-MRAM) array, which is one of the most promising PIM enabling techniques, the device-to-architecture co-simulation results demonstrate that the proposed TC in-memory accelerator outperforms the state-of-the-art GPU and FPGA accelerations by 12.2x and 31.8x, respectively, and achieves a 34x energy efficiency improvement over the FPGA accelerator.
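
To make the idea of triangle counting with bitwise AND concrete, the following software sketch stores each adjacency-matrix row as an integer bitmask and counts common neighbours of every edge with an AND followed by a population count. It illustrates the general reformulation only, not the paper's compression, mapping or STT-MRAM design; the function names are hypothetical.

```python
def build_bitmask_rows(num_nodes, edges):
    """Store each adjacency row as one integer; bit j of rows[i] is set when i-j is an edge."""
    rows = [0] * num_nodes
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows

def count_triangles(num_nodes, edges):
    """Count triangles in an undirected graph using AND + popcount per edge."""
    rows = build_bitmask_rows(num_nodes, edges)
    total = 0
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if (rows[u] >> v) & 1:  # (u, v) is an edge
                common = rows[u] & rows[v]  # neighbours shared by u and v
                total += bin(common).count("1")
    # Every triangle is found once from each of its three edges.
    return total // 3

# Complete graph on 4 nodes: every triple of nodes forms a triangle.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
k4_triangles = count_triangles(4, k4_edges)
```

The inner operation is a single AND over two rows, which is the kind of bulk bitwise work that in-memory arrays can perform without moving the rows to a processor.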

[169]  arXiv:2112.00475 [pdf, other]
Title: Weakly-Supervised Video Object Grounding via Causal Intervention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

We target the task of weakly-supervised video object grounding (WSVOG), where only video-sentence annotations are available during model learning. It aims to localize objects described in the sentence to visual regions in the video, which is a fundamental capability needed in pattern analysis and machine learning. Despite the recent progress, existing methods all suffer from the severe problem of spurious association, which will harm the grounding performance. In this paper, we start from the definition of WSVOG and pinpoint the spurious association from two aspects: (1) the association itself is not object-relevant but extremely ambiguous due to weak supervision, and (2) the association is unavoidably confounded by the observational bias when taking the statistics-based matching strategy in existing methods. With this in mind, we design a unified causal framework to learn the deconfounded object-relevant association for more accurate and robust video object grounding. Specifically, we learn the object-relevant association by causal intervention from the perspective of the video data generation process. To overcome the problem of lacking fine-grained supervision in terms of intervention, we propose a novel spatial-temporal adversarial contrastive learning paradigm. To further remove the accompanying confounding effect within the object-relevant association, we pursue the true causality by conducting causal intervention via backdoor adjustment. Finally, the deconfounded object-relevant association is learned and optimized under a unified causal framework in an end-to-end manner. Extensive experiments on both IID and OOD testing sets of three benchmarks demonstrate its accurate and robust grounding performance against state-of-the-art methods.

[170]  arXiv:2112.00476 [pdf, other]
Title: Data Augmentation Based on Null Model for Graph Classification
Subjects: Social and Information Networks (cs.SI)

In network science, a null model is typically used to generate a series of graphs based on randomization under certain conditions, and is widely used as a term of comparison to verify whether the networks in question display some non-trivial features, such as community structure. Since such non-trivial features may play a significant role in graph classification, the null model could provide a new perspective for regularization, so as to enhance classification performance. In this paper, we propose a novel data augmentation framework based on null models for graph classification, which contains four parts: feature ranking, graph data augmentation, data filtration, and model retraining. Moreover, in this framework, three heuristic null model generation methods are proposed for different features. Experiments are conducted on five well-known benchmark datasets, and the results show that our framework has promising performance, providing a new direction of data augmentation for graph classification.

[171]  arXiv:2112.00478 [pdf, other]
Title: On the Practical Consistency of Meta-Reinforcement Learning Algorithms
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Consistency is the theoretical property of a meta learning algorithm that ensures that, under certain assumptions, it can adapt to any task at test time. An open question is whether and how theoretical consistency translates into practice, in comparison to inconsistent algorithms. In this paper, we empirically investigate this question on a set of representative meta-RL algorithms. We find that theoretically consistent algorithms can indeed usually adapt to out-of-distribution (OOD) tasks, while inconsistent ones cannot, although consistent algorithms can still fail in practice for reasons like poor exploration. We further find that theoretically inconsistent algorithms can be made consistent by continuing to update all agent components on the OOD tasks, and then adapt as well as or better than originally consistent ones. We conclude that theoretical consistency is indeed a desirable property, and inconsistent meta-RL algorithms can easily be made consistent to enjoy the same benefits.

[172]  arXiv:2112.00484 [pdf, other]
Title: Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although considerable progress has been made in semantic scene understanding under clear weather, it is still a tough problem under adverse weather conditions, such as dense fog, due to the uncertainty caused by imperfect observations. Besides, difficulties in collecting and labeling foggy images hinder the progress of this field. Considering the success in semantic scene understanding under clear weather, we think it is reasonable to transfer knowledge learned from clear images to the foggy domain. As such, the problem becomes one of bridging the domain gap between clear images and foggy images. Unlike previous methods that mainly focus on closing the domain gap caused by fog (defogging the foggy images or fogging the clear images), we propose to alleviate the domain gap by considering fog influence and style variation simultaneously. The motivation is based on our finding that the style-related gap and the fog-related gap can be separated and closed individually by adding an intermediate domain. Thus, we propose a new pipeline to cumulatively adapt style, fog and the dual-factor (style and fog). Specifically, we devise a unified framework to disentangle the style factor and the fog factor separately, and then the dual-factor from images in different domains. Furthermore, we combine the disentanglement of the three factors with a novel cumulative loss to thoroughly disentangle them. Our method achieves state-of-the-art performance on three benchmarks and shows generalization ability in rainy and snowy scenes.

[173]  arXiv:2112.00485 [pdf, other]
Title: Learning Transformer Features for Image Quality Assessment
Authors: Chao Zeng, Sam Kwong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Objective image quality evaluation is a challenging task, which aims to measure the quality of a given image automatically. According to the availability of the reference images, there are Full-Reference (FR) and No-Reference (NR) IQA tasks, respectively. Most deep learning approaches use regression from deep features extracted by Convolutional Neural Networks. For the FR task, another option is conducting a statistical comparison on deep features. For all these methods, non-local information is usually neglected. In addition, the relationship between FR and NR tasks is less explored. Motivated by the recent success of transformers in modeling contextual information, we propose a unified IQA framework that utilizes a CNN backbone and a transformer encoder to extract features. The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme. Evaluation experiments on three standard IQA datasets (LIVE, CSIQ and TID2013) and on KONIQ-10K show that our proposed model can achieve state-of-the-art FR performance. In addition, comparable NR performance is achieved in extensive experiments, and the results show that the NR performance can benefit from the joint training scheme.

[174]  arXiv:2112.00491 [pdf, other]
Title: An Age of Information Characterization of Frameless ALOHA
Subjects: Information Theory (cs.IT)

We provide a characterization of the peak age of information (AoI) achievable in a random-access system operating according to the frameless ALOHA protocol. Unlike previous studies, our analysis accounts for the fact that the number of terminals contending for the channel may vary over time, as a function of the duration of the previous contention period. The exact characterization of the AoI provided in this paper, which is based on a Markovian analysis, reveals the impact of some key protocol parameters, such as the maximum length of the contention period, on the average peak AoI. Specifically, we show that setting this parameter so as to maximize the throughput may result in an AoI degradation.

[175]  arXiv:2112.00492 [pdf, other]
Title: Human-Object Interaction Detection via Weak Supervision
Comments: Accepted at BMVC'21
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The goal of this paper is Human-object Interaction (HO-I) detection. HO-I detection aims to find interacting human-object regions and classify their interaction from an image. Researchers have obtained significant improvements in recent years by relying on strong HO-I alignment supervision from [5]. HO-I alignment supervision pairs humans with the objects they interact with, and then aligns human-object pair(s) with their interaction categories. Since collecting such annotation is expensive, in this paper, we propose to detect HO-I without alignment supervision. We instead rely on image-level supervision that only enumerates existing interactions within the image without indicating where they happen. Our paper makes three contributions: i) We propose Align-Former, a visual-transformer based CNN that can detect HO-I with only image-level supervision. ii) Align-Former is equipped with an HO-I align layer that can learn to select appropriate targets to allow detector supervision. iii) We evaluate Align-Former on HICO-DET [5] and V-COCO [13], and show that Align-Former outperforms existing image-level supervised HO-I detectors by a large margin (4.71% mAP improvement from 16.14% to 20.85% on HICO-DET [5]).

[176]  arXiv:2112.00494 [pdf, ps, other]
Title: Closeness Centrality via the Condorcet Principle
Authors: Oskar Skibski
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

We uncover a new relation between Closeness centrality and the Condorcet principle. We define a Condorcet winner in a graph as a node that, compared to any other node, is closer to more nodes. In other words, if we assume that nodes vote for the closer candidate, a Condorcet winner would win a two-candidate election against any other node in a plurality vote. We show that Closeness centrality and its random-walk version, Random-Walk Closeness centrality, are the only classic centrality measures that are Condorcet consistent on trees, i.e., if a Condorcet winner exists, they rank it first. While they are not Condorcet consistent in general graphs, we show that Closeness centrality satisfies the Condorcet Comparison property that states that out of two adjacent nodes, the one preferred by more nodes has higher centrality. We show that Closeness centrality is the only regular distance-based centrality with such a property.
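
The definition of a Condorcet winner in a graph can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's formalization: the graph is undirected, connected and unweighted, every node votes (including the two candidates), tied voters abstain, and closeness is taken as the reciprocal of the sum of distances. All function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj):
    """Reciprocal of the total distance to all other nodes (connected graph assumed)."""
    return {v: 1.0 / sum(bfs_distances(adj, v).values()) for v in adj}

def condorcet_winner(adj):
    """Return the node that beats every other node in a pairwise vote, or None."""
    dist = {v: bfs_distances(adj, v) for v in adj}

    def votes(candidate, rival):
        # A voter supports the candidate that is strictly closer; ties abstain.
        return sum(1 for w in adj if dist[w][candidate] < dist[w][rival])

    for c in adj:
        if all(votes(c, r) > votes(r, c) for r in adj if r != c):
            return c
    return None

# A path 0-1-2-3-4 is a tree; its middle node wins every pairwise vote.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
winner = condorcet_winner(path)
scores = closeness(path)
top_by_closeness = max(scores, key=scores.get)
```

On this small tree the Condorcet winner and the node with the highest closeness coincide, which matches the consistency-on-trees statement in the abstract for this one example.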

[177]  arXiv:2112.00496 [pdf, other]
Title: Revisiting the Transferability of Supervised Pretraining: an MLP Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While previous works focus on the effectiveness of MLP on unsupervised image classification where pretraining and evaluation are conducted on the same dataset, we reveal that the MLP projector is also the key factor in the better transferability of unsupervised pretraining methods compared with supervised pretraining methods. Based on this observation, we attempt to close the transferability gap between supervised and unsupervised pretraining by adding an MLP projector before the classifier in supervised pretraining. Our analysis indicates that the MLP projector can help retain intra-class variation of visual features, decrease the feature distribution distance between pretraining and evaluation datasets, and reduce feature redundancy. Extensive experiments on public benchmarks demonstrate that the added MLP projector significantly boosts the transferability of supervised pretraining, e.g., +7.2% top-1 accuracy on the concept generalization task, +5.8% top-1 accuracy for linear evaluation on 12-domain classification tasks, and +0.8% AP on the COCO object detection task, making supervised pretraining comparable to or even better than unsupervised pretraining. Codes will be released upon acceptance.

[178]  arXiv:2112.00499 [pdf, other]
Title: Structure-Aware Label Smoothing for Graph Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Representing a label distribution as a one-hot vector is a common practice in training node classification models. However, the one-hot representation may not adequately reflect the semantic characteristics of a node in different classes, as some nodes may be semantically close to their neighbors in other classes. This can cause over-confidence, since the models are encouraged to assign full probabilities when classifying every node. While training models with label smoothing can ease this problem to some degree, it still fails to capture the nodes' semantic characteristics implied by the graph structures. In this work, we propose a novel SALS (Structure-Aware Label Smoothing) method as an enhancement component to popular node classification models. SALS leverages the graph structures to capture the semantic correlations between the connected nodes and generate the structure-aware label distribution to replace the original one-hot label vectors, thus improving the node classification performance without inference costs. Extensive experiments on seven node classification benchmark datasets reveal the effectiveness of our SALS on improving both transductive and inductive node classification. Empirical results show that SALS is superior to the label smoothing method and enhances the node classification models to outperform the baseline methods.

[179]  arXiv:2112.00503 [pdf, other]
Title: Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-Sentence Dependency Graph
Comments: Accepted to AAAI 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting, by incorporating syntactic features from Universal Dependencies (UD), and the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to the rudimentary intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder that encodes the global dependency graph, addressing the inter-sentence relations via both one-hop and multi-hop dependency paths explicitly. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder, which is trained only on English, is able to improve the zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on average, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows the improvement can be attributed to the attention on the cross-linguistically consistent syntactic path.

[180]  arXiv:2112.00504 [pdf, other]
Title: Learning Oriented Remote Sensing Object Detection via Naive Geometric Computing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detecting oriented objects along with estimating their rotation information is one crucial step for analyzing remote sensing images. Although many recently proposed methods have achieved remarkable performance, most of them directly learn to predict object directions under the supervision of only one (e.g. the rotation angle) or a few (e.g. several coordinates) groundtruth values individually. Oriented object detection would be more accurate and robust if extra constraints, with respect to proposal and rotation information regression, are adopted for joint supervision during training. To this end, we propose a mechanism that simultaneously learns the regression of horizontal proposals, oriented proposals, and rotation angles of objects in a consistent manner, via naive geometric computing, as one additional steady constraint (see Figure 1). An oriented center prior guided label assignment strategy is proposed for further enhancing the quality of proposals, yielding better performance. Extensive experiments demonstrate that the model equipped with our idea outperforms the baseline by a large margin and achieves a new state-of-the-art result without any extra computational burden during inference. Our proposed idea is simple, intuitive, and readily implemented. Source codes and trained models are included in the supplementary files.

[181]  arXiv:2112.00508 [pdf, other]
Title: A symmetrized parametric finite element method for anisotropic surface diffusion of closed curves via a Cahn-Hoffman $\boldsymbol{\xi}$-vector formulation
Comments: 25 pages, 8 figures. arXiv admin note: text overlap with arXiv:2012.05610
Subjects: Numerical Analysis (math.NA)

We deal with a long-standing problem about how to design an energy-stable numerical scheme for solving the motion of a closed curve under anisotropic surface diffusion with a general anisotropic surface energy $\gamma(\boldsymbol{n})$ in two dimensions, where $\boldsymbol{n}$ is the outward unit normal vector. By introducing a novel symmetric positive definite surface energy matrix $Z_k(\boldsymbol{n})$ depending on the Cahn-Hoffman $\boldsymbol{\xi}$-vector and a stabilizing function $k(\boldsymbol{n})$, we first reformulate the anisotropic surface diffusion into a conservative form and then derive a new symmetrized variational formulation for the anisotropic surface diffusion with both weakly and strongly anisotropic surface energies. A semi-discretization in space for the symmetrized variational formulation is proposed and its area (or mass) conservation and energy dissipation are proved. The semi-discretization is then discretized in time by either an implicit structural-preserving scheme (SP-PFEM), which preserves the area at the discretized level, or a semi-implicit energy-stable method (ES-PFEM), which only needs to solve a linear system at each time step. Under a relatively simple and mild condition on $\gamma(\boldsymbol{n})$, we show that both SP-PFEM and ES-PFEM are energy dissipative and thus are unconditionally energy-stable for almost all anisotropic surface energies $\gamma(\boldsymbol{n})$ arising in practical applications. Specifically, for several commonly-used anisotropic surface energies, we construct $Z_k(\boldsymbol{n})$ explicitly. Finally, extensive numerical results are reported to demonstrate the efficiency and accuracy as well as the unconditional energy-stability of the proposed symmetrized parametric finite element method.

[182]  arXiv:2112.00510 [pdf, other]
Title: Trimap-guided Feature Mining and Fusion Network for Natural Image Matting
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Utilizing trimap guidance and fusing multi-level features are two important issues for trimap-based matting with pixel-level prediction. To utilize trimap guidance, most existing approaches simply concatenate trimaps and images together to feed a deep network or apply an extra network to extract more trimap guidance, which meets the conflict between efficiency and effectiveness. For emerging content-based feature fusion, most existing matting methods only focus on local features which lack the guidance of a global feature with strong semantic information related to the interesting object. In this paper, we propose a trimap-guided feature mining and fusion network consisting of our trimap-guided non-background multi-scale pooling (TMP) module and global-local context-aware fusion (GLF) modules. Considering that trimap provides strong semantic guidance, our TMP module focuses effective feature mining on interesting objects under the guidance of trimap without extra parameters. Furthermore, our GLF modules use global semantic information of interesting objects mined by our TMP module to guide an effective global-local context-aware multi-level feature fusion. In addition, we build a common interesting object matting (CIOM) dataset to advance high-quality image matting. Experimental results on the Composition-1k test set, Alphamatting benchmark, and our CIOM test set demonstrate that our method outperforms state-of-the-art approaches. Code and models will be publicly available soon.

[183]  arXiv:2112.00515 [pdf, other]
Title: TXOP sharing with Coordinated Spatial Reuse in Multi-AP Cooperative IEEE 802.11be WLANs
Subjects: Networking and Internet Architecture (cs.NI)

IEEE 802.11be networks (aka Wi-Fi 7) will have to cope with new bandwidth-hungry and low-latency services such as eXtended Reality and multi-party cloud gaming. With this goal in mind, transmit opportunity (TXOP) sharing between coordinated access points (APs) may contribute to alleviating inter-AP contention, hence increasing the overall network throughput. This paper evaluates two coordinated TXOP sharing strategies: coordinated time division multiple access (c-TDMA) and coordinated-TDMA with spatial reuse (c-TDMA/SR). We show that, while c-TDMA alone does not result in any significant improvement in terms of the WLAN throughput, it lays the groundwork to implement coordinated SR (c-SR) techniques. To evaluate the performance of c-TDMA/SR, we propose a fair scheduler able to select the best subset of parallel transmissions in WLAN deployments, as well as the appropriate power levels to be used by APs and stations (STAs), leading to maximum performance. The results obtained for c-TDMA/SR show significant throughput gains compared with c-TDMA, with values higher than 140% in 90% of the considered scenarios.

[184]  arXiv:2112.00516 [pdf, other]
Title: Simultaneous Controller and Lyapunov Function Design for Constrained Nonlinear Systems
Comments: Initial submission to ACC 2022
Subjects: Systems and Control (eess.SY)

This paper presents a method to stabilize state and input constrained nonlinear systems using an offline optimization on variable triangulations of the set of admissible states. For control-affine systems, by choosing a continuous piecewise affine (CPA) controller structure, the non-convex optimization is formulated as iterative semi-definite programming (SDP), which can be solved efficiently using available software. The method has very general assumptions on the system's dynamics and constraints. Unlike similar existing methods, it avoids finding terminal invariant sets, solving non-convex optimizations, and does not rely on knowing a control Lyapunov function (CLF), as it finds a CPA Lyapunov function explicitly. The method enforces a desired upper-bound on the decay rate of the state norm and finds the exact region of attraction. Thus, it can be also viewed as a systematic approach for finding Lipschitz CLFs in state and input constrained control-affine systems. Using the CLF, a minimum norm controller is also formulated by quadratic programming for online application.

[185]  arXiv:2112.00519 [pdf, ps, other]
Title: On the Complexity of the Geometric Median Problem with Outliers
Subjects: Computational Geometry (cs.CG); Computational Complexity (cs.CC); Optimization and Control (math.OC)

In the Geometric Median problem with outliers, we are given a finite set of points in d-dimensional real space and an integer m; the goal is to locate a new point in space (center) and choose m of the input points to minimize the sum of the Euclidean distances from the center to the chosen points. This problem can be solved "almost exactly" in polynomial time if d is fixed and admits a polynomial-time approximation scheme (PTAS) in high dimensions. However, the complexity of the problem was an open question. We prove that, if the dimension of space is not fixed, Geometric Median with outliers is strongly NP-hard, does not admit a fully polynomial-time approximation scheme (FPTAS) unless P=NP, and is W[1]-hard with respect to the parameter m. The proof is done by a reduction from the Independent Set problem. Based on a similar reduction, we also get the NP-hardness of closely related geometric 2-clustering problems in which it is required to partition a given set of points into two balanced clusters minimizing the cost of median clustering. Finally, we study Geometric Median with outliers in $\ell_\infty$ space and prove the same complexity results as for the Euclidean problem.
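
The objective described above can be written down directly. The sketch below evaluates the cost of a candidate center (for a fixed center, the best choice is simply the m nearest input points) and, as a naive baseline, tries each input point as the center. Restricting the center to input points is an assumption made here for illustration and is not the paper's method; it need not return the optimal center. The function names are hypothetical.

```python
import math


def cost_for_center(points, center, m):
    """Sum of Euclidean distances from `center` to its m nearest points.

    For a fixed center, choosing the m closest input points minimizes
    the objective, so sorting the distances is enough.
    """
    if not 0 < m <= len(points):
        raise ValueError("m must satisfy 0 < m <= len(points)")
    dists = sorted(math.dist(center, p) for p in points)
    return sum(dists[:m])


def best_input_point_center(points, m):
    """Naive baseline: try every input point as the center.

    This is a heuristic restriction of the search space, not an exact
    solver; the true optimal center may lie elsewhere in space.
    """
    return min(points, key=lambda c: cost_for_center(points, c, m))
```

With points (0, 0), (1, 0), and (10, 0) and m = 2, the far point is treated as the outlier and either of the two close points gives a cost of 1.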

[186]  arXiv:2112.00527 [pdf, other]
Title: Subtask-dominated Transfer Learning for Long-tail Person Search
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Person search unifies person detection and person re-identification (Re-ID) to locate query persons from the panoramic gallery images. One major challenge comes from the imbalanced long-tail person identity distributions, which prevents the one-step person search model from learning discriminative person features for the final re-identification. However, it is under-explored how to solve the heavy imbalanced identity distributions for the one-step person search. Techniques designed for the long-tail classification task, for example, image-level re-sampling strategies, are hard to be effectively applied to the one-step person search which jointly solves person detection and Re-ID subtasks with a detection-based multi-task framework. To tackle this problem, we propose a Subtask-dominated Transfer Learning (STL) method. The STL method solves the long-tail problem in the pretraining stage of the dominated Re-ID subtask and improves the one-step person search by transfer learning of the pretrained model. We further design a Multi-level RoI Fusion Pooling layer to enhance the discrimination ability of person features for the one-step person search. Extensive experiments on CUHK-SYSU and PRW datasets demonstrate the superiority and effectiveness of the proposed method.

[187]  arXiv:2112.00529 [pdf, other]
Title: Improving gearshift controllers for electric vehicles with reinforcement learning
Subjects: Systems and Control (eess.SY)

During a multi-speed transmission development process, the final calibration of the gearshift controller parameters is usually performed on a physical test bench. Engineers typically treat the mapping from the controller parameters to the gearshift quality as a black-box, and use methods rooted in experimental design -- a purely statistical approach -- to infer the parameter combination that will maximize a chosen gearshift performance indicator. This approach unfortunately requires thousands of gearshift trials, ultimately discouraging the exploration of different control strategies. In this work, we calibrate the feedforward and feedback parameters of a gearshift controller using a model-based reinforcement learning algorithm adapted from Pilco. Experimental results show that the method optimizes the controller parameters with few gearshift trials. This approach can accelerate the exploration of gearshift control strategies, which is especially important for the emerging technology of multi-speed transmissions for electric vehicles.

[188]  arXiv:2112.00532 [pdf, other]
Title: FaceTuneGAN: Face Autoencoder for Convolutional Expression Transfer Using Neural Generative Adversarial Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

In this paper, we present FaceTuneGAN, a new 3D face model representation that decomposes and separately encodes facial identity and facial expression. We propose a first adaptation of image-to-image translation networks, which have been used successfully in the 2D domain, to 3D face geometry. Leveraging recently released large face scan databases, a neural network has been trained to decouple factors of variations with a better knowledge of the face, enabling facial expression transfer and neutralization of expressive faces. Specifically, we design an adversarial architecture adapting the base architecture of FUNIT and using SpiralNet++ for our convolutional and sampling operations. Using two publicly available datasets (FaceScape and CoMA), FaceTuneGAN has a better identity decomposition and face neutralization than state-of-the-art techniques. It also outperforms the classical deformation transfer approach by predicting blendshapes closer to ground-truth data and with fewer undesired artifacts caused by large differences in facial morphology between source and target.

[189]  arXiv:2112.00534 [pdf, other]
Title: Empirical evaluation of shallow and deep learning classifiers for Arabic sentiment analysis
Journal-ref: ACM Trans. Asian Low-Resour. Lang. Inf. Process. 21, 1, Article 14 (November 2021), 25 pages (2021)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This work presents a detailed comparison of the performance of deep learning models such as convolutional neural networks (CNN), long short-term memory (LSTM), gated recurrent units (GRU), their hybrids, and a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. Additionally, the comparison includes state-of-the-art models such as the transformer architecture and the araBERT pre-trained model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are some of the largest publicly available datasets for Arabic reviews. Results showed deep learning outperforming shallow learning for binary and multi-label classification, in contrast with the results of similar work reported in the literature. This discrepancy in outcome was caused by dataset size as we found it to be proportional to the performance of deep learning models. The performance of deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. The best performing shallow learning technique was Random Forest followed by Decision Tree, and AdaBoost. The deep learning models performed similarly using a default embedding layer, while the transformer model performed best when augmented with araBERT.

[190]  arXiv:2112.00538 [pdf]
Title: 'Entanglement' -- A new dynamic metric to measure team flow
Journal-ref: Social Networks 70, 100-111 (2022)
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

We introduce "entanglement", a novel metric to measure how synchronized communication between team members is. This measure calculates the Euclidean distance among team members' social network metrics timeseries. We validate the metric with four case studies. The first case study uses entanglement of 11 medical innovation teams to predict team performance and learning behavior. The second case looks at the e-mail communication of 113 senior executives of an international services firm, predicting employee turnover through lack of entanglement of an employee. The third case analyzes the individual employee performance of 81 managers. The fourth case study predicts performance of 13 customer-dedicated teams at a big international company by comparing entanglement in the e-mail interactions with satisfaction of their customers measured through Net Promoter Score (NPS). While we can only speculate about what is causing the entanglement effect, we find that it is a new and versatile indicator for the analysis of employees' communication, analyzing the hitherto underused temporal dimension of online social networks which could be used as a powerful predictor of employee and team performance, employee turnover, and customer satisfaction.
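
As a rough illustration of the core computation mentioned above (Euclidean distance among team members' metric time series), the sketch below averages the pairwise Euclidean distances between members' series. The aggregation by mean, the function name, and the input layout are assumptions made for illustration; the abstract does not specify how the distances are combined, and this is not the authors' implementation.

```python
import math
from itertools import combinations


def mean_pairwise_distance(series_by_member):
    """Average Euclidean distance between all pairs of members' time series.

    `series_by_member` maps a member name to a list of metric values
    sampled at the same time steps for every member. Smaller values mean
    the members' metrics move more closely together.
    """
    pairs = list(combinations(sorted(series_by_member), 2))
    if not pairs:
        raise ValueError("need at least two members")
    total = 0.0
    for a, b in pairs:
        # math.dist raises ValueError if the two series differ in length.
        total += math.dist(series_by_member[a], series_by_member[b])
    return total / len(pairs)
```

For example, two members with series `[0, 0]` and `[3, 4]` are at distance 5, so the summary for that two-person team is 5.0.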

[191]  arXiv:2112.00544 [pdf, other]
Title: Molecular Contrastive Learning with Chemical Element Knowledge Graph
Comments: Accepted in AAAI 2022 Main track
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm as it utilizes self-supervision signals and has no requirements for human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus ignore the correlations between atoms that have common attributes but are not directly connected by bonds. To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning. KCL framework consists of three modules. The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG. The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) to encode complex information in the augmented molecular graph. The final module is a contrastive objective, where we maximize agreement between these two views of molecular graphs. Extensive experiments demonstrated that KCL obtained superior performances against state-of-the-art baselines on eight molecular datasets. Visualization experiments properly interpret what KCL has learned from atoms and attributes in the augmented molecular graphs. Our codes and data are available in supplementary materials.

[192]  arXiv:2112.00552 [pdf, other]
Title: SaDe: Learning Models that Provably Satisfy Domain Constraints
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

With increasing real world applications of machine learning, models are often required to comply with certain domain based requirements, e.g., safety guarantees in aircraft systems, legal constraints in a loan approval model. A natural way to represent these properties is in the form of constraints. Including such constraints in machine learning is typically done by the means of regularization, which does not guarantee satisfaction of the constraints. In this paper, we present a machine learning approach that can handle a wide variety of constraints, and guarantee that these constraints will be satisfied by the model even on unseen data. We cast machine learning as a maximum satisfiability problem, and solve it using a novel algorithm SaDe which combines constraint satisfaction with gradient descent. We demonstrate on three use cases that this approach learns models that provably satisfy the given constraints.

[193]  arXiv:2112.00554 [pdf, other]
Title: Quoting is not Citing: Disentangling Affiliation and Interaction on Twitter
Comments: Proc. Complex Networks'21, 10th International Conference on Complex Networks and their Applications
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Adaptation and Self-Organizing Systems (nlin.AO)

Interaction networks are generally much less homophilic than affiliation networks, accommodating for many more cross-cutting links. By statistically assigning a political valence to users from their network-level affiliation patterns, and by further contrasting interaction and affiliation (quotes and retweets) within specific discursive events, namely quote trees, we describe a variety of cross-cutting patterns which significantly nuance the traditional "echo chamber" narrative.

[194]  arXiv:2112.00556 [pdf, other]
Title: Semi-Supervised Surface Anomaly Detection of Composite Wind Turbine Blades From Drone Imagery
Comments: In-proceedings at 2022 17th International Conference on Computer Vision Theory and Applications (VISAPP)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Within commercial wind energy generation, the monitoring and predictive maintenance of wind turbine blades in-situ is a crucial task, for which remote monitoring via aerial survey from an Unmanned Aerial Vehicle (UAV) is commonplace. Turbine blades are susceptible to both operational and weather-based damage over time, reducing the energy efficiency output of turbines. In this study, we address automating the otherwise time-consuming task of both blade detection and extraction, together with fault detection within UAV-captured turbine blade inspection imagery. We propose BladeNet, an application-based, robust dual architecture to perform both unsupervised turbine blade detection and extraction, followed by super-pixel generation using the Simple Linear Iterative Clustering (SLIC) method to produce regional clusters. These clusters are then processed by a suite of semi-supervised detection methods. Our dual architecture detects surface faults of glass fibre composite material blades with high aptitude while requiring minimal prior manual image annotation. BladeNet produces an Average Precision (AP) of 0.995 across our Ørsted blade inspection dataset for offshore wind turbines and 0.223 across the Danish Technical University (DTU) NordTank turbine blade inspection dataset. BladeNet also obtains an AUC of 0.639 for surface anomaly detection across the Ørsted blade inspection dataset.

[195]  arXiv:2112.00557 [pdf]
Title: 3D Reconstruction Using a Linear Laser Scanner and a Camera
Authors: Rui Wang
Comments: 8 pages, 16 figures, published in The 2nd International Conference on Artificial Intelligence and Computer Engineering (ICAICE2021)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

With the rapid development of computer graphics and vision, several three-dimensional (3D) reconstruction techniques have been proposed and used to obtain the 3D representation of objects in the form of point cloud models, mesh models, and geometric models. The cost of 3D reconstruction is declining as the technology matures; however, the inexpensive 3D reconstruction scanners on the market may not be able to generate a clear point cloud model as expected. This study systematically reviews some basic types of 3D reconstruction technology and introduces an easy implementation using a linear laser scanner, a camera, and a turntable. The implementation is based on monovision with a laser and has been tested on several objects, such as wiki and mug. The accuracy and resolution of the point cloud result are quite satisfying. It turns out everyone can build such a 3D reconstruction system with appropriate procedures.

[196]  arXiv:2112.00560 [pdf, other]
Title: Attribute Artifacts Removal for Geometry-based Point Cloud Compression
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Geometry-based point cloud compression (G-PCC) can achieve remarkable compression efficiency for point clouds. However, it still leads to serious attribute compression artifacts, especially under low bitrate scenarios. In this paper, we propose a Multi-Scale Graph Attention Network (MS-GAT) to remove the artifacts of point cloud attributes compressed by G-PCC. We first construct a graph based on point cloud geometry coordinates and then use the Chebyshev graph convolutions to extract features of point cloud attributes. Considering that one point may be correlated with points both near and far away from it, we propose a multi-scale scheme to capture the short and long range correlations between the current point and its neighboring and distant points. To address the problem that various points may have different degrees of artifacts caused by adaptive quantization, we introduce the quantization step per point as an extra input to the proposed network. We also incorporate a graph attentional layer into the network to pay special attention to the points with more attribute artifacts. To the best of our knowledge, this is the first attribute artifacts removal method for G-PCC. We validate the effectiveness of our method over various point clouds. Experimental results show that our proposed method achieves an average of 9.28% BD-rate reduction. In addition, our approach achieves some performance improvements for the downstream point cloud semantic segmentation task.

[197]  arXiv:2112.00566 [pdf, ps, other]
Title: NLP Research and Resources at DaSciM, Ecole Polytechnique
Subjects: Computation and Language (cs.CL)

DaSciM (Data Science and Mining), part of LIX at Ecole Polytechnique, was established in 2013 and has since produced research results in the area of large-scale data analysis via machine learning and deep learning methods. The group has been particularly active in the area of NLP and text mining, with interesting results at both the methodological and the resources level. Here we present our different contributions of interest to the AFIA community.

[198]  arXiv:2112.00567 [pdf, other]
Title: DPRK-BERT: The Supreme Language Model
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Deep language models have achieved remarkable success in the NLP domain. The standard way to train a deep language model is to employ unsupervised learning from scratch on a large unlabeled corpus. However, such large corpora are only available for widely-adopted and high-resource languages and domains. This study presents the first deep language model, DPRK-BERT, for the DPRK language. We achieve this by compiling the first unlabeled corpus for the DPRK language and fine-tuning a preexisting ROK language model. We compare the proposed model with existing approaches and show significant improvements on two DPRK datasets. We also present a cross-lingual version of this model which yields better generalization across the two Korean languages. Finally, we provide various NLP tools related to the DPRK language that would foster future research.

[199]  arXiv:2112.00568 [pdf, other]
Title: Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning
Comments: Accepted to TCSVT, arXiv version. The codes are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks. Existing face anti-spoofing datasets lack diversity due to insufficient identities and insignificant variance, which limits the generalization ability of FAS models. In this paper, we propose the Dual Spoof Disentanglement Generation (DSDG) framework to tackle this challenge by "anti-spoofing via generation". Depending on the interpretable factorized latent disentanglement in Variational Autoencoder (VAE), DSDG learns a joint distribution of the identity representation and the spoofing pattern representation in the latent space. Then, large-scale paired live and spoofing images can be generated from random noise to boost the diversity of the training set. However, some generated face images are partially distorted due to the inherent defect of VAE. It is hard to predict precise depth values for such noisy samples, which may therefore obstruct the widely-used depth supervised optimization. To tackle this issue, we further introduce a lightweight Depth Uncertainty Module (DUM), which alleviates the adverse effects of noisy samples by depth uncertainty learning. DUM is developed without extra dependencies and can thus be flexibly integrated with any depth supervised network for face anti-spoofing. We evaluate the effectiveness of the proposed method on five popular benchmarks and achieve state-of-the-art results under both intra-test and inter-test settings. The codes are available at https://github.com/JDAI-CV/FaceX-Zoo/tree/main/addition_module/DSDG.

[200]  arXiv:2112.00570 [pdf, other]
Title: Toward Foundation Models for Earth Monitoring: Proposal for a Climate Change Benchmark
Subjects: Machine Learning (cs.LG); Geophysics (physics.geo-ph)

Recent progress in self-supervision shows that pre-training large neural networks on vast amounts of unsupervised data can lead to impressive increases in generalisation for downstream tasks. Such models, recently coined as foundation models, have been transformational to the field of natural language processing. While similar models have also been trained on large corpuses of images, they are not well suited for remote sensing data. To stimulate the development of foundation models for Earth monitoring, we propose to develop a new benchmark comprised of a variety of downstream tasks related to climate change. We believe that this can lead to substantial improvements in many existing applications and facilitate the development of new applications. This proposal is also a call for collaboration with the aim of developing a better evaluation process to mitigate potential downsides of foundation models for Earth monitoring.

[201]  arXiv:2112.00574 [pdf, other]
Title: Collective discrete optimisation as judgment aggregation
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Many important collective decision-making problems can be seen as multi-agent versions of discrete optimisation problems. Participatory budgeting, for instance, is the collective version of the knapsack problem; other examples include collective scheduling, and collective spanning trees. Rather than developing a specific model, as well as specific algorithmic techniques, for each of these problems, we propose to represent and solve them in the unifying framework of judgment aggregation with weighted issues. We provide a modular definition of collective discrete optimisation (CDO) rules based on coupling a set scoring function with an operator, and we show how they generalise several existing procedures developed for specific CDO problems. We also give an implementation based on integer linear programming (ILP) and test it on the problem of collective spanning trees.
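
To illustrate the remark that participatory budgeting is a collective version of the knapsack problem, here is a small brute-force sketch. It scores each project by the number of voters who approve it and picks the highest-scoring set of projects that fits the budget. The scoring rule, the function name, and the data layout are assumptions chosen for illustration; this is not the judgment-aggregation framework or the ILP implementation from the abstract, and the exhaustive search is only practical for a handful of projects.

```python
from itertools import combinations


def collective_knapsack(costs, ballots, budget):
    """Pick the feasible project set with the most total approvals.

    `costs` maps project -> cost, `ballots` is a list of sets of approved
    projects (one set per voter), `budget` is the spending limit.
    Returns (chosen_projects, total_approvals).
    """
    projects = sorted(costs)
    # Approval score: how many voters approve each project.
    support = {p: sum(p in ballot for ballot in ballots) for p in projects}
    best, best_score = (), 0
    for r in range(len(projects) + 1):
        for subset in combinations(projects, r):
            if sum(costs[p] for p in subset) > budget:
                continue  # over budget, not feasible
            score = sum(support[p] for p in subset)
            if score > best_score:
                best, best_score = subset, score
    return set(best), best_score
```

With a single voter this reduces to an ordinary 0/1 knapsack in which every approved project has value 1.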

[202]  arXiv:2112.00578 [pdf, other]
Title: Systematic Generalization with Edge Transformers
Comments: Accepted as a conference paper at NeurIPS 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector states with every edge, that is, with every pair of input nodes -- as opposed to just every node, as it is done in the Transformer model. The second major innovation is a triangular attention mechanism that updates edge representations in a way that is inspired by unification from logic programming. We evaluate Edge Transformer on compositional generalization benchmarks in relational reasoning, semantic parsing, and dependency parsing. In all three settings, the Edge Transformer outperforms Relation-aware, Universal and classical Transformer baselines.

[203]  arXiv:2112.00579 [pdf, other]
Title: Conditional Expectation based Value Decomposition for Scalable On-Demand Ride Pooling
Comments: Preprint. Under Review. arXiv admin note: text overlap with arXiv:1911.08842
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multiagent Systems (cs.MA)

Owing to the benefits for customers (lower prices), drivers (higher revenues), aggregation companies (higher revenues) and the environment (fewer vehicles), on-demand ride pooling (e.g., Uber pool, Grab Share) has become quite popular. The significant computational complexity of matching vehicles to combinations of requests has meant that traditional ride pooling approaches are myopic in that they do not consider the impact of current matches on future value for vehicles/drivers. Recently, Neural Approximate Dynamic Programming (NeurADP) has employed value decomposition with Approximate Dynamic Programming (ADP) to outperform leading approaches by considering the impact of an individual agent's (vehicle) chosen actions on the future value of that agent. However, in order to ensure scalability and facilitate city-scale ride pooling, NeurADP completely ignores the impact of other agents' actions on individual agent/vehicle value. As demonstrated in our experimental results, ignoring the impact of other agents' actions on individual value can have a significant impact on the overall performance when there is increased competition among vehicles for demand. Our key contribution is a novel mechanism based on computing conditional expectations through joint conditional probabilities for capturing dependencies on other agents' actions without increasing the complexity of training or decision making. We show that our new approach, Conditional Expectation based Value Decomposition (CEVD), outperforms NeurADP by up to 9.76% in terms of overall requests served, which is a significant improvement on a city-wide benchmark taxi dataset.

[204]  arXiv:2112.00580 [pdf, other]
Title: Background Activation Suppression for Weakly Supervised Object Localization
Comments: Technical report, Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weakly supervised object localization (WSOL) aims to localize the object region using only image-level labels as supervision. Recently a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve the localization task. Existing FPM-based methods use cross-entropy (CE) to evaluate the foreground prediction map and to guide the learning of the generator. We argue for using activation value to achieve more efficient learning. It is based on the experimental observation that, for a trained network, CE converges to zero when the foreground mask covers only part of the object region, while activation value increases until the mask expands to the object boundary, which indicates that more object areas can be learned by using activation value. In this paper, we propose a Background Activation Suppression (BAS) method. Specifically, an Activation Map Constraint module (AMC) is designed to facilitate the learning of the generator by suppressing the background activation values. Meanwhile, by using the foreground region guidance and the area constraint, BAS can learn the whole region of the object. Furthermore, in the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets.

[205]  arXiv:2112.00582 [pdf, other]
Title: Transformer-based Network for RGB-D Saliency Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

RGB-D saliency detection integrates information from both RGB images and depth maps to improve prediction of salient regions under challenging conditions. The key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities. Previous approaches tend to apply the multi-scale and multi-modal fusion separately via local operations, which fails to capture long-range dependencies. Here we propose a transformer-based network to address this issue. Our proposed architecture is composed of two modules: a transformer-based within-modality feature enhancement module (TWFEM) and a transformer-based feature fusion module (TFFM). TFFM conducts a sufficient feature fusion by integrating features from multiple scales and two modalities over all positions simultaneously. TWFEM enhances the features at each scale by selecting and integrating complementary information from other scales within the same modality before TFFM. We show that the transformer is a uniform operation which presents great efficacy in both feature fusion and feature enhancement, and simplifies the model design. Extensive experimental results on six benchmark datasets demonstrate that our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.

[206]  arXiv:2112.00583 [pdf, other]
Title: Meta Arcade: A Configurable Environment Suite for Meta-Learning
Comments: 17 pages, 6 figures, 6 tables, extended version of an accepted paper to NeurIPS DRL Workshop 2021
Subjects: Machine Learning (cs.LG)

Most approaches to deep reinforcement learning (DRL) attempt to solve a single task at a time. As a result, most existing research benchmarks consist of individual games or suites of games that have common interfaces but little overlap in their perceptual features, objectives, or reward structures. To facilitate research into knowledge transfer among trained agents (e.g. via multi-task and meta-learning), more environment suites that provide configurable tasks with enough commonality to be studied collectively are needed. In this paper we present Meta Arcade, a tool to easily define and configure custom 2D arcade games that share common visuals, state spaces, action spaces, game components, and scoring mechanisms. Meta Arcade differs from prior environments in that both task commonality and configurability are prioritized: entire sets of games can be constructed from common elements, and these elements are adjustable through exposed parameters. We include a suite of 24 predefined games that collectively illustrate the possibilities of this framework and discuss how these games can be configured for research applications. We provide several experiments that illustrate how Meta Arcade could be used, including single-task benchmarks of predefined games, sample curriculum-based approaches that change game parameters over a set schedule, and an exploration of transfer learning between games.

[207]  arXiv:2112.00584 [pdf, other]
Title: The Shape Part Slot Machine: Contact-based Reasoning for Generating 3D Shapes from Parts
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We present the Shape Part Slot Machine, a new method for assembling novel 3D shapes from existing parts by performing contact-based reasoning. Our method represents each shape as a graph of "slots," where each slot is a region of contact between two shape parts. Based on this representation, we design a graph-neural-network-based model for generating new slot graphs and retrieving compatible parts, as well as a gradient-descent-based optimization scheme for assembling the retrieved parts into a complete shape that respects the generated slot graph. This approach does not require any semantic part labels; interestingly, it also does not require complete part geometries -- reasoning about the regions where parts connect proves sufficient to generate novel, high-quality 3D shapes. We demonstrate that our method generates shapes that outperform existing modeling-by-assembly approaches in terms of quality, diversity, and structural complexity.

[208]  arXiv:2112.00585 [pdf, other]
Title: Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce a novel deep learning method for photo-realistic manipulation of the emotional state of actors in "in-the-wild" videos. The proposed method is based on a parametric 3D face representation of the actor in the input scene that offers a reliable disentanglement of the facial identity from the head pose and facial expressions. It then uses a novel deep domain translation framework that alters the facial expressions in a consistent and plausible manner, taking into account their dynamics. Finally, the altered facial expressions are used to photo-realistically manipulate the facial region in the input scene based on an especially-designed neural face renderer. To the best of our knowledge, our method is the first to be capable of controlling the actor's facial expressions by even using as a sole input the semantic labels of the manipulated emotions, while at the same time preserving the speech-related lip movements. We conduct extensive qualitative and quantitative evaluations and comparisons, which demonstrate the effectiveness of our approach and the especially promising results that we obtain. Our method opens a plethora of new possibilities for useful applications of neural rendering technologies, ranging from movie post-production and video games to photo-realistic affective avatars.

[209]  arXiv:2112.00588 [pdf]
Title: Outlier Detection using AI: A Survey
Comments: Chapter 7 in book: AI Assurance, by Elsevier Academic Press. Edited by: Feras A. Batarseh and Laura Freeman Publication year: 2022
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

An outlier is an event or observation that is defined as an unusual activity, intrusion, or a suspicious data point that lies at an irregular distance from a population. The definition of an outlier event, however, is subjective and depends on the application and the domain (Energy, Health, Wireless Network, etc.). It is important to detect outlier events as carefully as possible to avoid infrastructure failures because anomalous events can cause minor to severe damage to infrastructure. For instance, an attack on a cyber-physical system such as a microgrid may initiate voltage or frequency instability, thereby damaging a smart inverter, which involves very expensive repairs. Unusual activities in microgrids can be mechanical faults, behavior changes in the system, human or instrument errors or a malicious attack. Accordingly, and due to its variability, Outlier Detection (OD) is an ever-growing research field. In this chapter, we discuss the progress of OD methods using AI techniques. For that, the fundamental concepts of each OD model are introduced via multiple categories. A broad range of OD methods is categorized into six major categories: Statistical-based, Distance-based, Density-based, Clustering-based, Learning-based, and Ensemble methods. For every category, we discuss recent state-of-the-art approaches, their application areas, and performances. After that, a brief discussion regarding the advantages, disadvantages, and challenges of each technique is provided with recommendations on future research directions. This survey aims to guide the reader to better understand recent progress of OD methods for the assurance of AI.

[210]  arXiv:2112.00590 [pdf, ps, other]
Title: Building astroBERT, a language model for Astronomy & Astrophysics
Subjects: Computation and Language (cs.CL)

The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.

[211]  arXiv:2112.00591 [pdf]
Title: AI Assurance using Causal Inference: Application to Public Policy
Comments: Chapter 8 in book: AI Assurance, by Elsevier Academic Press. Edited by: Feras A. Batarseh and Laura Freeman Publication year: 2022
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Computer Science and Game Theory (cs.GT); Numerical Analysis (math.NA)

Developing and implementing AI-based solutions help state and federal government agencies, research institutions, and commercial companies enhance decision-making processes, automate chain operations, and reduce the consumption of natural and human resources. At the same time, most AI approaches used in practice can only be represented as "black boxes" and suffer from the lack of transparency. This can eventually lead to unexpected outcomes and undermine trust in such systems. Therefore, it is crucial not only to develop effective and robust AI systems, but to make sure their internal processes are explainable and fair. Our goal in this chapter is to introduce the topic of designing assurance methods for AI systems with high-impact decisions using the example of the technology sector of the US economy. We explain how these fields would benefit from revealing cause-effect relationships between key metrics in the dataset by providing the causal experiment on technology economics dataset. Several causal inference approaches and AI assurance techniques are reviewed and the transformation of the data into a graph-structured dataset is demonstrated.

[212]  arXiv:2112.00592 [pdf, other]
Title: BeamSync: Over-The-Air Carrier Synchronization in Distributed RadioWeaves
Comments: 6 pages, 6 figures. Accepted in 25th International ITG Workshop on Smart Antennas (WSA 2021)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In a distributed multi-antenna system, multiple geographically separated transmit nodes communicate simultaneously to a receive node. Synchronization of these nodes is essential to achieve a good performance at the receiver. RadioWeaves is a new paradigm of cell-free massive MIMO array deployment using distributed multi-antenna panels in indoor environments. In this paper, we study the carrier frequency synchronization problem in distributed RadioWeave panels. We propose a novel, over-the-air synchronization protocol, which we call BeamSync, to synchronize all the different multi-antenna transmit panels. We also show that beamforming the synchronization signal in the dominant direction of the channel between the panels is optimal and the synchronization performance is significantly better than traditional beamforming techniques.

[213]  arXiv:2112.00597 [pdf, other]
Title: Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation
Subjects: Robotics (cs.RO); Machine Learning (stat.ML)

Complex sequential tasks in continuous-control settings often require agents to successfully traverse a set of "narrow passages" in their state space. Solving such tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning (RL) due to the associated long-horizon nature of the problem and the lack of sufficient positive signal during learning. Various tools have been applied to address this challenge. When available, large sets of demonstrations can guide agent exploration. Hindsight relabelling on the other hand does not require additional sources of information. However, existing strategies explore based on task-agnostic goal distributions, which can render the solution of long-horizon tasks impractical. In this work, we extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations. We evaluate the approach on four complex, single and dual arm, robotics manipulation tasks against strong suitable baselines. The method requires far fewer demonstrations to solve all tasks and achieves a significantly higher overall performance as task complexity increases. Finally, we investigate the robustness of the proposed solution with respect to the quality of input representations and the number of demonstrations.

[214]  arXiv:2112.00599 [pdf, other]
Title: An implementation of the "Guess who?" game using CLIP
Comments: Code available at this https URL
Journal-ref: Intelligent Data Engineering and Automated Learning (IDEAL 2021). Lecture Notes in Computer Science, vol 13113
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

CLIP (Contrastive Language-Image Pretraining) is an efficient method for learning computer vision tasks from natural language supervision that has powered a recent breakthrough in deep learning due to its zero-shot transfer capabilities. By training from image-text pairs available on the internet, the CLIP model transfers non-trivially to most tasks without the need for any dataset-specific training. In this work, we use CLIP to implement the engine of the popular game "Guess who?", so that the player interacts with the game using natural language prompts and CLIP automatically decides whether an image in the game board fulfills that prompt or not. We study the performance of this approach by benchmarking on different ways of prompting the questions to CLIP, and show the limitations of its zero-shot capabilities.

[215]  arXiv:2112.00600 [pdf, other]
Title: Towards Futuristic Autonomous Experimentation--A Surprise-Reacting Sequential Experiment Policy
Subjects: Machine Learning (cs.LG)

An autonomous experimentation platform in manufacturing is supposedly capable of conducting a sequential search for finding suitable manufacturing conditions for advanced materials by itself or even for discovering new materials with minimal human intervention. The core of the intelligent control of such platforms is the policy directing sequential experiments, namely, to decide where to conduct the next experiment based on what has been done thus far. Such a policy inevitably trades off exploitation versus exploration, and the current practice is under the Bayesian optimization framework using the expected improvement criterion or its variants. We discuss whether it is beneficial to trade off exploitation versus exploration by measuring the element and degree of surprise associated with the immediate past observation. We devise a surprise-reacting policy using two existing surprise metrics, known as the Shannon surprise and Bayesian surprise. Our analysis shows that the surprise-reacting policy appears to be better suited for quickly characterizing the overall landscape of a response surface or a design place under resource constraints. We argue that such capability is much needed for futuristic autonomous experimentation platforms. We do not claim that we have a fully autonomous experimentation platform, but believe that our current effort sheds new light or provides a different view angle as researchers are racing to elevate the autonomy of various primitive autonomous experimentation systems.
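
For readers unfamiliar with the two surprise metrics named above, the following Python sketch computes both for a toy conjugate Gaussian model. It assumes that Shannon surprise is the negative log predictive density of the new observation and that Bayesian surprise is the KL divergence from the posterior to the prior; the model, the function name `gaussian_surprises`, and its parameters are illustrative assumptions, not the paper's policy.

```python
import math

def gaussian_surprises(y, prior_mean, prior_var, noise_var):
    """Return (shannon_surprise, bayesian_surprise) for one observation y.

    Assumed model: unknown mean ~ N(prior_mean, prior_var),
    observation y ~ N(mean, noise_var) with known noise_var.
    """
    # Shannon surprise: -log p(y) under the prior predictive N(prior_mean, pred_var).
    pred_var = prior_var + noise_var
    shannon = (0.5 * math.log(2 * math.pi * pred_var)
               + (y - prior_mean) ** 2 / (2 * pred_var))

    # Conjugate Gaussian update of the belief about the mean.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + y / noise_var)

    # Bayesian surprise: KL(posterior || prior) between two Gaussians.
    bayesian = (0.5 * math.log(prior_var / post_var)
                + (post_var + (post_mean - prior_mean) ** 2) / (2 * prior_var)
                - 0.5)
    return shannon, bayesian
```

In this toy model, an observation far from the prior mean raises both quantities, which is the kind of signal a surprise-reacting policy could use to shift between exploitation and exploration.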

[216]  arXiv:2112.00604 [pdf, other]
Title: Near-Optimal Distributed Degree+1 Coloring
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We present a new approach to randomized distributed graph coloring that is simpler and more efficient than previous ones. In particular, it allows us to tackle the $(\operatorname{deg}+1)$-list-coloring (D1LC) problem, where each node $v$ of degree $d_v$ is assigned a palette of $d_v+1$ colors, and the objective is to find a proper coloring using these palettes. While for $(\Delta+1)$-coloring (where $\Delta$ is the maximum degree), there is a fast randomized distributed $O(\log^3\log n)$-round algorithm (Chang, Li, and Pettie [SIAM J. Comp. 2020]), no $o(\log n)$-round algorithms are known for the D1LC problem.
We give a randomized distributed algorithm for D1LC that is optimal under plausible assumptions about the deterministic complexity of the problem. Using the recent deterministic algorithm of Ghaffari and Kuhn [FOCS2021], our algorithm runs in $O(\log^3 \log n)$ time, matching the best bound known for $(\Delta+1)$-coloring. In addition, it colors all nodes of degree $\Omega(\log^7 n)$ in $O(\log^* n)$ rounds.
A key contribution is a subroutine to generate slack for D1LC. When placed into the framework of Assadi, Chen, and Khanna [SODA2019] and Alon and Assadi [APPROX/RANDOM2020], this almost immediately leads to a palette sparsification theorem for D1LC, generalizing previous results. That gives fast algorithms for D1LC in three different models: an $O(1)$-round algorithm in the MPC model with $\tilde{O}(n)$ memory per machine; a single-pass semi-streaming algorithm in dynamic streams; and an $\tilde{O}(n\sqrt{n})$-time algorithm in the standard query model.
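
As a rough illustration of the D1LC setting (not the algorithm of this paper), the Python sketch below simulates the classic random color trial in synchronous rounds: every uncolored node proposes a random color from its remaining palette and keeps it only if no neighbor proposed the same color in that round. The adjacency-dictionary representation and the name `d1lc_random_trials` are assumptions made for this example.

```python
import random

def d1lc_random_trials(adj, palettes, seed=0, max_rounds=10_000):
    """Simulate random color trials for (deg+1)-list-coloring.

    adj:      dict node -> set of neighbors (undirected graph).
    palettes: dict node -> set of sortable colors, size >= deg(node) + 1.
    Returns (coloring, rounds_used).
    """
    rng = random.Random(seed)
    color = {}
    # Copy the palettes so the caller's sets are left untouched.
    avail = {v: set(p) for v, p in palettes.items()}
    rounds = 0
    while len(color) < len(adj):
        if rounds == max_rounds:
            raise RuntimeError("did not finish within max_rounds")
        rounds += 1
        uncolored = [v for v in adj if v not in color]
        # Each uncolored node proposes one color from its remaining palette.
        proposal = {v: rng.choice(sorted(avail[v])) for v in uncolored}
        for v in uncolored:
            # Keep the color only if no neighbor proposed the same one.
            if all(proposal.get(u) != proposal[v] for u in adj[v]):
                color[v] = proposal[v]
        # Colors fixed by neighbors are removed from remaining palettes.
        for v in adj:
            if v not in color:
                avail[v] -= {color[u] for u in adj[v] if u in color}
    return color, rounds
```

Because each palette has at least $d_v+1$ colors and each colored neighbor removes at most one, a node's remaining palette never becomes empty, so the simulation always has a color to propose.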

[217]  arXiv:2112.00616 [pdf, other]
Title: Roadmap for Edge AI: A Dagstuhl Perspective
Comments: for ACM SIGCOMM CCR
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

Based on the collective input of Dagstuhl Seminar (21342), this paper presents a comprehensive discussion on AI methods and capabilities in the context of edge computing, referred to as Edge AI. In a nutshell, we envision Edge AI to provide adaptation for data-driven applications, enhance network and radio access, and allow the creation, optimization, and deployment of distributed AI/ML pipelines with given quality of experience, trust, security and privacy targets. The Edge AI community investigates novel ML methods for the edge computing environment, spanning multiple sub-fields of computer science, engineering and ICT. The goal is to share an envisioned roadmap that can bring together key actors and enablers to further advance the domain of Edge AI.

[218]  arXiv:2112.00619 [pdf, other]
Title: Edge computing for cyber-physical systems: A systematic mapping study emphasizing trustworthiness
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

Edge computing is projected to have profound implications in the coming decades, proposed to provide solutions for applications such as augmented reality, predictive functionalities, and collaborative Cyber-Physical Systems (CPS). For such applications, edge computing addresses the new computational needs, as well as privacy, availability, and real-time constraints, by providing local high-performance computing capabilities to deal with the limitations and constraints of cloud and embedded systems. Our interests lie in the applications of edge computing as part of CPS, where several properties (or attributes) of trustworthiness, including safety, security, and predictability/availability are of particular concern, each facing challenges for the introduction of edge-based CPS. We present the results of a systematic mapping study, a kind of systematic literature survey, investigating the use of edge computing for CPS with a special emphasis on trustworthiness. The main contributions of this study are a detailed description of the current research efforts in edge-based CPS and the identification and discussion of trends and research gaps. The results show that the main body of research in edge-based CPS considers key attributes of system trustworthiness only to a very limited extent, despite many efforts referring to critical CPS and applications like intelligent transportation. More research and industrial efforts will be needed on aspects of trustworthiness of future edge-based CPS including their experimental evaluation. Such research needs to consider the multiple interrelated attributes of trustworthiness including safety, security, and predictability, and new methodologies and architectures to address them. It is further important to provide bridges and collaboration between edge computing and CPS disciplines.

[219]  arXiv:2112.00621 [pdf, other]
Title: A Two-Level Approximate Logic Synthesis Combining Cube Insertion and Removal
Comments: 5 Pages, submitted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Subjects: Other Computer Science (cs.OH); Hardware Architecture (cs.AR); Logic in Computer Science (cs.LO)

Approximate computing is an attractive paradigm for reducing the design complexity of error-resilient systems, thereby improving performance and reducing power consumption. In this work, we propose a new two-level approximate logic synthesis method based on cube insertion and removal procedures. Experimental results show significant literal count and runtime reduction compared to the state-of-the-art approach. The method's scalability is illustrated for a high error threshold over large benchmark circuits. The obtained solutions show a literal count reduction of up to 38%, 56% and 93% for error rates of 1%, 3% and 5%, respectively.

[220]  arXiv:2112.00626 [pdf, other]
Title: The Effect of People Recommenders on Echo Chambers and Polarization
Comments: To appear in: Proceedings of the International AAAI Conference on Web and Social Media, vol. 16 (ICWSM '22)
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

The effects of social media on critical issues, such as polarization and misinformation, are under scrutiny due to the disruptive consequences that these phenomena can have on our societies. Among the algorithms routinely used by social media platforms, people-recommender systems are of special interest, as they directly contribute to the evolution of the social network structure, affecting the information and the opinions users are exposed to.
In this paper, we propose a framework to assess the effect of people recommenders on the evolution of opinions. Our proposal is based on Monte Carlo simulations combining link recommendation and opinion-dynamics models. In order to control initial conditions, we define a random network model to generate graphs with opinions, with tunable amounts of modularity and homophily. We join these elements into a methodology to study the effects of the recommender system on echo chambers and polarization. We also show how to use our framework to measure, by means of simulations, the impact of different intervention strategies.
Our thorough experimentation shows that people recommenders can in fact lead to a significant increase in echo chambers. However, this happens only if there is considerable initial homophily in the network. Also, we find that if the network already contains echo chambers, the effect of the recommendation algorithm is negligible. Such findings are robust to two very different opinion dynamics models, a bounded confidence model and an epistemological model.
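
One building block of such simulations can be sketched in Python as follows. This is a minimal Deffuant-style bounded-confidence update, used here as an assumed stand-in because the abstract does not specify the exact opinion-dynamics model; the function name and the parameters `epsilon` (confidence bound) and `mu` (convergence rate) are hypothetical.

```python
import random

def bounded_confidence_step(opinions, edges, epsilon=0.3, mu=0.5, rng=random):
    """Apply one pairwise bounded-confidence update in place.

    opinions: list of floats, one opinion per node.
    edges:    list of (i, j) pairs that may interact.
    A random edge is drawn; its endpoints move toward each other only
    if their opinions differ by at most epsilon.
    """
    i, j = rng.choice(edges)
    diff = opinions[j] - opinions[i]
    if abs(diff) <= epsilon:
        opinions[i] += mu * diff
        opinions[j] -= mu * diff
    return opinions
```

A Monte Carlo study in this spirit would alternate such updates with link-recommendation steps that add entries to `edges`, then measure polarization over many seeded runs.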

[221]  arXiv:2112.00627 [pdf, other]
Title: DeepSportLab: a Unified Framework for Ball Detection, Player Instance Segmentation and Pose Estimation in Team Sports Scenes
Comments: 13 pages, 5 figures, BMVC, BMVC2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a unified framework to (i) locate the ball, (ii) predict the pose, and (iii) segment the instance mask of players in team sports scenes. Those problems are of high interest in automated sports analytics, production, and broadcast. A common practice is to individually solve each problem by exploiting universal state-of-the-art models, e.g., Panoptic-DeepLab for player segmentation. In addition to the increased complexity resulting from the multiplication of single-task models, the use of the off-the-shelf models also impedes the performance due to the complexity and specificity of the team sports scenes, such as strong occlusion and motion blur. To circumvent those limitations, our paper proposes to train a single model that simultaneously predicts the ball and the player mask and pose by combining the part intensity fields and the spatial embeddings principles. Part intensity fields provide the ball and player location, as well as player joints location. Spatial embeddings are then exploited to associate player instance pixels to their respective player center, but also to group player joints into skeletons. We demonstrate the effectiveness of the proposed model on the DeepSport basketball dataset, achieving comparable performance to the SoA models addressing each individual task separately.

[222]  arXiv:2112.00629 [pdf, other]
Title: Classifying grounded intersection graphs via ordered forbidden patterns
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

It was noted already in the 90s that many classic graph classes, such as interval, chordal, and bipartite graphs, can be characterized by the existence of an ordering of the vertices avoiding some ordered subgraphs, called patterns. Very recently, all the classes corresponding to patterns on three vertices (including the ones mentioned above) have been listed, and proved to be efficiently recognizable. In contrast, very little is known about patterns on four vertices.
One of the few graph classes characterized by a pattern on four vertices is the class of intersection graphs of rectangles that are said to be grounded on a line. This class appears naturally in the study of intersection graphs, and similar grounded classes have recently attracted a lot of attention.
This paper contains three parts. First, we survey grounded intersection graph classes, summarizing all the known inclusions between these various classes. Second, we show that the correspondence between a pattern on four vertices and grounded rectangle graphs is not an isolated phenomenon. We establish several other pattern characterizations for geometric classes, and show that the hierarchy of grounded intersection graph classes is tightly interleaved with the classes defined by patterns on four vertices. We claim that forbidden patterns are a useful tool to classify grounded intersection graphs. Finally, we give an overview of the complexity of the recognition of classes defined by forbidden patterns on four vertices and list several interesting open problems.

[223]  arXiv:2112.00633 [pdf, other]
Title: TEDGE-Caching: Transformer-based Edge Caching Towards 6G Networks
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)

As a consequence of the COVID-19 pandemic, the demand for telecommunication for remote learning/working and telemedicine has significantly increased. Mobile Edge Caching (MEC) in 6G networks has evolved as an efficient solution to meet the phenomenal growth of global mobile data traffic by bringing multimedia content closer to the users. Although massive connectivity enabled by MEC networks will significantly increase the quality of communications, there are several key challenges ahead. The limited storage of edge nodes, the large size of multimedia content, and the time-variant users' preferences make it critical to efficiently and dynamically predict the popularity of content, so that the items most likely to be requested are stored before they are requested. Recent advancements in Deep Neural Networks (DNNs) have drawn much research attention to predicting content popularity in proactive caching schemes. Existing DNN models in this context, however, suffer from long-term dependencies, computational complexity, and unsuitability for parallel computing. To tackle these challenges, we propose an edge caching framework incorporated with the attention-based Vision Transformer (ViT) neural network, referred to as the Transformer-based Edge (TEDGE) caching, which, to the best of our knowledge, is being studied for the first time. Moreover, the TEDGE caching framework requires no data pre-processing or additional contextual information. Simulation results corroborate the effectiveness of the proposed TEDGE caching framework in comparison to its counterparts.

[224]  arXiv:2112.00639 [pdf, other]
Title: Robustness in Deep Learning for Computer Vision: Mind the gap?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Deep neural networks for computer vision tasks are deployed in increasingly safety-critical and socially-impactful applications, motivating the need to close the gap in model performance under varied, naturally occurring imaging conditions. Robustness, ambiguously used in multiple contexts including adversarial machine learning, here then refers to preserving model performance under naturally-induced image corruptions or alterations.
We perform a systematic review to identify, analyze, and summarize current definitions and progress towards non-adversarial robustness in deep learning for computer vision. We find that this area of research has received disproportionately little attention relative to adversarial machine learning, yet a significant robustness gap exists that often manifests in performance degradation similar in magnitude to adversarial conditions.
To provide a more transparent definition of robustness across contexts, we introduce a structural causal model of the data generating process and interpret non-adversarial robustness as pertaining to a model's behavior on corrupted images which correspond to low-probability samples from the unaltered data distribution. We then identify key architecture-, data augmentation-, and optimization tactics for improving neural network robustness. This causal view of robustness reveals that common practices in the current literature, both in regards to robustness tactics and evaluations, correspond to causal concepts, such as soft interventions resulting in a counterfactually-altered distribution of imaging conditions. Through our findings and analysis, we offer perspectives on how future research may mind this evident and significant non-adversarial robustness gap.

[225]  arXiv:2112.00646 [pdf, other]
Title: Reliability Assessment and Safety Arguments for Machine Learning Components in Assuring Learning-Enabled Autonomous Systems
Comments: Submitted, under review
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

The increasing use of Machine Learning (ML) components embedded in autonomous systems -- so-called Learning-Enabled Systems (LES) -- has resulted in the pressing need to assure their functional safety. As for traditional functional safety, the emerging consensus within both industry and academia is to use assurance cases for this purpose. Typically assurance cases support claims of reliability in support of safety, and can be viewed as a structured way of organising arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LES pose new challenges in safety-critical applications due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LES with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets to component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM and propose practical solutions. Probabilistic safety arguments at the lower ML component-level are also developed based on the RAM. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic/benchmark datasets but also demonstrate the scope of our methods with a comprehensive case study on Autonomous Underwater Vehicles in simulation.

[226]  arXiv:2112.00648 [pdf, ps, other]
Title: Remixing Functionally Graded Structures: Data-Driven Topology Optimization with Multiclass Shape Blending
Comments: Submitted to Structural and Multidisciplinary Optimization: Selected papers from WCSMO-14
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

To create heterogeneous, multiscale structures with unprecedented functionalities, recent topology optimization approaches design either fully aperiodic systems or functionally graded structures, which compete in terms of design freedom and efficiency. We propose to inherit the advantages of both through a data-driven framework for multiclass functionally graded structures that mixes several families, i.e., classes, of microstructure topologies to create spatially-varying designs with guaranteed feasibility. The key is a new multiclass shape blending scheme that generates smoothly graded microstructures without requiring compatible classes or connectivity and feasibility constraints. Moreover, it transforms the microscale problem into an efficient, low-dimensional one without confining the design to predefined shapes. Compliance and shape matching examples using common truss geometries and diversity-based freeform topologies demonstrate the versatility of our framework, while studies on the effect of the number and diversity of classes illustrate the effectiveness. The generality of the proposed methods supports future extensions beyond the linear applications presented.

[227]  arXiv:2112.00649 [pdf]
Title: Digital Twinning Remote Laboratories for Online Practical Learning
Comments: 56 pages, 14 figures. arXiv admin note: text overlap with arXiv:2106.09344
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

The COVID-19 pandemic has demonstrated a need for remote learning and virtual learning applications such as virtual reality (VR) and tablet-based solutions. Having developers create complex learning scenarios is highly time-consuming and can take over a year. It is also costly to employ teams of system analysts, developers and 3D artists. There is a requirement to provide a simple method to enable lecturers to create their own content for their laboratory tutorials. Research has been undertaken into developing generic models to enable the semi-automatic creation of virtual learning tools for subjects that require practical interactions with the lab resources. In addition to the system for creating digital twins, a case study describing the creation of a virtual learning application for an electrical laboratory tutorial has been presented.

[228]  arXiv:2112.00653 [pdf, other]
Title: Variational Learning for Unsupervised Knowledge Grounded Dialogs
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document. These methods do not require the exact document to be known during training and rely on the use of a retrieval system to fetch relevant documents from a large index. The documents used to generate the responses are modeled as latent variables whose prior probabilities need to be estimated. Models such as RAG marginalize the document probabilities over the documents retrieved from the index to define the log-likelihood loss function, which is optimized end-to-end.
In this paper, we develop a variational approach to the above technique wherein we instead maximize the Evidence Lower Bound (ELBO). Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, which has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge, we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.
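For orientation, the two objectives contrasted in this abstract can be written in their standard textbook form. The notation is an assumption made for illustration and is not taken from the paper: $x$ is the dialog context, $y$ the response, $z$ the latent document, $p(z \mid x)$ the retrieval prior, and $q(z \mid x, y)$ a posterior that also sees the ground-truth response.

```latex
% Marginal log-likelihood over retrieved documents (RAG-style objective),
% where top-k(x) denotes the documents retrieved for context x:
\log p(y \mid x) \approx \log \sum_{z \in \mathrm{top}\text{-}k(x)} p(z \mid x)\, p(y \mid x, z)

% Evidence lower bound on the same quantity (variational objective):
\log p(y \mid x) \;\ge\;
  \mathbb{E}_{q(z \mid x, y)}\bigl[\log p(y \mid x, z)\bigr]
  - \mathrm{KL}\bigl(q(z \mid x, y) \,\|\, p(z \mid x)\bigr)
```

The bound is tight when $q(z \mid x, y)$ equals the true posterior over documents, which is the usual motivation for letting $q$ condition on the response.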

[229]  arXiv:2112.00654 [pdf]
Title: Siamese Neural Encoders for Long-Term Indoor Localization with Mobile Devices
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Fingerprinting-based indoor localization is an emerging application domain for enhanced positioning and tracking of people and assets within indoor locales. The superior pairing of ubiquitously available WiFi signals with computationally capable smartphones is set to revolutionize the area of indoor localization. However, the observed signal characteristics from independently maintained WiFi access points vary greatly over time. Moreover, some of the WiFi access points visible at the initial deployment phase may be replaced or removed over time. These factors are often ignored in indoor localization frameworks and cause gradual and catastrophic degradation of localization accuracy post-deployment (over weeks and months). To overcome these challenges, we propose a Siamese neural encoder-based framework that offers up to 40% reduction in degradation of localization accuracy over time compared to the state-of-the-art in the area, without requiring any retraining.

[230]  arXiv:2112.00655 [pdf, ps, other]
Title: Efficient and Local Parallel Random Walks
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

Random walks are a fundamental primitive used in many machine learning algorithms with several applications in clustering and semi-supervised learning. Despite their relevance, the first efficient parallel algorithm to compute random walks was introduced only very recently (Lacki et al.). Unfortunately, their method has a fundamental shortcoming: their algorithm is non-local in that it heavily relies on computing random walks out of all nodes in the input graph, even though in many practical applications one is interested in computing random walks only from a small subset of nodes in the graph. In this paper, we present a new algorithm that overcomes this limitation by building random walks efficiently and locally at the same time. We show that our technique is both memory and round efficient, and in particular yields an efficient parallel local clustering algorithm. Finally, we complement our theoretical analysis with experimental results showing that our algorithm is significantly more scalable than previous approaches.
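To make the notion of locality concrete, the sketch below computes walks only from a chosen subset of start nodes. It is a plain sequential baseline and not the parallel algorithm of the paper; the function name, the adjacency-list input, and the seed parameter are all hypothetical.

```python
import random

def local_random_walks(adj, start_nodes, length, seed=0):
    """Simulate one random walk of up to `length` steps from each start node.

    adj: dict mapping node -> list of neighbours (assumed input format).
    Only nodes reachable from `start_nodes` are ever touched, which is the
    "local" behaviour the abstract asks for. Sequential illustration only.
    """
    rng = random.Random(seed)
    walks = {}
    for s in start_nodes:
        walk = [s]
        for _ in range(length):
            neighbours = adj[walk[-1]]
            if not neighbours:  # dead end: stop this walk early
                break
            walk.append(rng.choice(neighbours))
        walks[s] = walk
    return walks
```

For example, `local_random_walks({0: [1], 1: [0, 2], 2: [1]}, [0], 4)` returns a single walk of five nodes starting at node 0.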

[231]  arXiv:2112.00656 [pdf, other]
Title: Object-aware Video-language Pre-training for Retrieval
Comments: 10 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Recently, with the introduction of large-scale datasets and strong transformer networks, video-language pre-training has shown great success, especially for retrieval. Yet, existing video-language transformer models do not explicitly perform fine-grained semantic alignment. In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformers to incorporate object representations. The key idea is to leverage the bounding boxes and object tags to guide the training process. We evaluate our model on three standard sub-tasks of video-text matching on four widely used benchmarks. We also provide deep analysis and detailed ablation of the proposed method. We show clear improvement in performance across all tasks and datasets considered, demonstrating the value of a model that incorporates object representations into a video-language architecture. The code will be released at https://github.com/FingerRec/OA-Transformer.

[232]  arXiv:2112.00659 [pdf, other]
Title: Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines
Comments: 21 pages, 15 figures, and 9 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

A certified robustness guarantee gauges a model's robustness to test-time attacks and can assess the model's readiness for deployment in the real world. In this work, we critically examine how the adversarial robustness guarantees from randomized smoothing-based certification methods change when state-of-the-art certifiably robust models encounter out-of-distribution (OOD) data. Our analysis demonstrates a previously unknown vulnerability of these models to low-frequency OOD data such as weather-related corruptions, rendering these models unfit for deployment in the wild. To alleviate this issue, we propose a novel data augmentation scheme, FourierMix, that produces augmentations to improve the spectral coverage of the training data. Furthermore, we propose a new regularizer that encourages consistent predictions on noise perturbations of the augmented data to improve the quality of the smoothed models. We find that FourierMix augmentations help eliminate the spectral bias of certifiably robust models, enabling them to achieve significantly better robustness guarantees on a range of OOD benchmarks. Our evaluation also uncovers the inability of current OOD benchmarks to highlight the spectral biases of the models. To this end, we propose a comprehensive benchmarking suite that contains corruptions from different regions in the spectral domain. Evaluation of models trained with popular augmentation methods on the proposed suite highlights their spectral biases and establishes the superiority of FourierMix-trained models at achieving better certified robustness guarantees under OOD shifts over the entire frequency spectrum.
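As background on what a randomized-smoothing certificate measures, the sketch below evaluates one standard form of the certified L2 radius for Gaussian smoothing, R = sigma * Phi^{-1}(p_A), where p_A is a lower bound on the probability of the top class under noise. Whether the paper uses exactly this variant is an assumption; the function and parameter names are hypothetical.

```python
from statistics import NormalDist

def certified_l2_radius(p_a_lower, sigma):
    """Certified L2 radius of a Gaussian-smoothed classifier at one input.

    p_a_lower: lower confidence bound on the top-class probability under
               Gaussian noise with standard deviation `sigma`.
    Returns 0.0 when no majority class can be certified (p_a_lower <= 0.5).
    """
    if p_a_lower <= 0.5:
        return 0.0  # the smoothed classifier would abstain here
    if p_a_lower >= 1.0:
        return float("inf")  # inverse CDF is unbounded at 1
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

A drop in the top-class probability on corrupted inputs therefore shrinks the certified radius directly, which is how OOD data can erode the guarantee.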

[233]  arXiv:2112.00661 [pdf]
Title: MOMO -- Deep Learning-driven classification of external DICOM studies for PACS archivation
Comments: 21 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR)

Patients regularly continue assessment or treatment in other facilities than they began them in, receiving their previous imaging studies as a CD-ROM and requiring clinical staff at the new hospital to import these studies into their local database. However, between different facilities, standards for nomenclature, contents, or even medical procedures may vary, often requiring human intervention to accurately classify the received studies in the context of the recipient hospital's standards. In this study, the authors present MOMO (MOdality Mapping and Orchestration), a deep learning-based approach to automate this mapping process utilizing metadata substring matching and a neural network ensemble, which is trained to recognize the 76 most common imaging studies across seven different modalities. A retrospective study is performed to measure the accuracy that this algorithm can provide. To this end, a set of 11,934 imaging series with existing labels was retrieved from the local hospital's PACS database to train the neural networks. A set of 843 completely anonymized external studies was hand-labeled to assess the performance of our algorithm. Additionally, an ablation study was performed to measure the performance impact of the network ensemble in the algorithm, and a comparative performance test with a commercial product was conducted. In comparison to a commercial product (96.20% predictive power, 82.86% accuracy, 1.36% minor errors), a neural network ensemble alone performs the classification task with less accuracy (99.05% predictive power, 72.69% accuracy, 10.3% minor errors). However, MOMO outperforms either by a large margin in accuracy and with increased predictive power (99.29% predictive power, 92.71% accuracy, 2.63% minor errors).

[234]  arXiv:2112.00662 [pdf, other]
Title: A general locomotion control framework for serially connected multi-legged robots
Subjects: Robotics (cs.RO)

Serially connected robots are promising candidates for performing tasks in confined spaces such as search-and-rescue in large-scale disasters. Such robots are typically limbless, and we hypothesize that the addition of limbs could improve mobility. However, a challenge in designing and controlling such devices lies in the coordination of high-dimensional redundant modules in a way that improves mobility. Here we develop a general framework to control serially connected multi-legged robots. Specifically, we combine two approaches to build a general shape control scheme which can provide baseline patterns of self-deformation ("gaits") for effective locomotion in diverse robot morphologies. First, we take inspiration from a dimensionality reduction and a biological gait classification scheme to generate cyclic patterns of body deformation and foot lifting/lowering, which facilitate generation of arbitrary substrate contact patterns. Second, we use geometric mechanics methods to facilitate identification of optimal phasing of these undulations to maximize speed and/or stability. Our scheme allows the development of effective gaits in multi-legged robots locomoting on flat frictional terrain with diverse numbers of limbs (4, 6, 16, and even 0 limbs) and body actuation capabilities (including sidewinding gaits on limbless devices). By properly coordinating the body undulation and the leg placement, our framework combines the advantages of both limbless robots (modularity) and legged robots (mobility). We expect that our framework can provide general control schemes for the rapid deployment of general multi-legged robots, paving the way toward machines that can traverse complex environments under real-life conditions.

[235]  arXiv:2112.00663 [pdf, other]
Title: Graph Conditioned Sparse-Attention for Improved Source Code Understanding
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)

Transformer architectures have been successfully used in learning source code representations. The fusion between a graph representation like Abstract Syntax Tree (AST) and a source code sequence makes the use of current approaches computationally intractable for large input sequence lengths. Source code can have long-range dependencies that require larger sequence lengths to model effectively. Current approaches have a quadratic growth in computational and memory costs with respect to the sequence length. Using such models in practical scenarios is difficult. In this work, we propose the conditioning of a source code snippet with its graph modality by using the graph adjacency matrix as an attention mask for a sparse self-attention mechanism and the use of a graph diffusion mechanism to model longer-range token dependencies. Our model reaches state-of-the-art results in BLEU, METEOR, and ROUGE-L metrics for the code summarization task and near state-of-the-art accuracy in the variable misuse task. The memory use and inference time of our model have linear growth with respect to the input sequence length as compared to the quadratic growth of previous works.
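The idea of using a graph adjacency matrix as an attention mask can be sketched in a few lines. This is a single-head, dependency-free illustration and not the authors' model: the function name, the list-of-lists inputs, and the choice to always let a token attend to itself are assumptions, and the graph diffusion step is omitted.

```python
import math

def masked_self_attention(q, k, v, adj):
    """Scaled dot-product attention restricted to graph neighbours.

    q, k, v: lists of per-token vectors (lists of floats).
    adj:     n x n 0/1 matrix; token i may attend to token j only if
             adj[i][j] is set or i == j (self-loop is an assumption).
    Work per token scales with its number of neighbours, not with n.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        allowed = [j for j in range(n) if adj[i][j] or i == j]
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in allowed]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        row = [0.0] * len(v[0])
        for w, j in zip(weights, allowed):
            for c in range(len(row)):
                row[c] += (w / total) * v[j][c]
        out.append(row)
    return out
```

With a sparse graph such as an AST, each token sees only a handful of neighbours, which is where the linear rather than quadratic growth in cost comes from.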

[236]  arXiv:2112.00665 [pdf, other]
Title: Saliency Enhancement using Superpixel Similarity
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Saliency Object Detection (SOD) has several applications in image analysis. Deep-learning-based SOD methods are among the most effective, but they may miss foreground parts with similar colors. To circumvent the problem, we introduce a post-processing method, named Saliency Enhancement over Superpixel Similarity (SESS), which executes two operations alternately for saliency completion: object-based superpixel segmentation and superpixel-based saliency estimation. SESS uses an input saliency map to estimate seeds for superpixel delineation and define superpixel queries in foreground and background. A new saliency map results from color similarities between queries and superpixels. The process repeats for a given number of iterations, such that all generated saliency maps are combined into a single one by cellular automata. Finally, post-processed and initial maps are merged using their average values per superpixel. We demonstrate that SESS can consistently and considerably improve the results of three deep-learning-based SOD methods on five image datasets.

[237]  arXiv:2112.00668 [pdf, other]
Title: A Few-Shot Meta-Learning based Siamese Neural Network using Entropy Features for Ransomware Classification
Subjects: Cryptography and Security (cs.CR)

Ransomware defense solutions that can quickly detect and classify different ransomware classes to formulate rapid response plans have been in high demand in recent years. Though the applicability of adopting deep learning techniques to provide automation and self-learning provision has been proven in many application domains, the lack of data available for ransomware (and other malware) samples has been raised as a barrier to developing effective deep learning-based solutions. To address this concern, we propose a few-shot meta-learning based Siamese Neural Network that not only detects ransomware attacks but is also able to classify them into different classes. Our proposed model utilizes the entropy feature directly extracted from ransomware binary files to retain more fine-grained features associated with different ransomware signatures. These entropy features are further used to train and optimize our model using a pre-trained network (e.g. VGG-16) in a meta-learning fashion. This approach generates more accurate weight factors, compared to when feature images are used, to avoid the bias typically associated with a model trained with a limited number of training samples. Our experimental results show that our proposed model is highly effective in providing a weighted F1-score exceeding the rate>86% compared
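One common way to derive an entropy feature from a binary file is to compute the Shannon entropy of fixed-size byte chunks. The sketch below illustrates that general idea only; the chunk size, the function names, and the use of a per-chunk profile are assumptions, since the abstract does not describe the exact extraction procedure.

```python
import math
from collections import Counter

def shannon_entropy(chunk):
    """Shannon entropy in bits per byte (0.0 to 8.0) of a bytes object."""
    if not chunk:
        return 0.0
    n = len(chunk)
    total = 0.0
    for count in Counter(chunk).values():
        p = count / n
        total -= p * math.log2(p)
    return total

def entropy_profile(data, chunk_size=256):
    """Entropy of each consecutive chunk; chunk_size is a hypothetical default."""
    return [shannon_entropy(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
```

Encrypted or packed regions tend to score close to 8 bits per byte, while padding and plain text score much lower, which is why such profiles can help separate malware families.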

[238]  arXiv:2112.00673 [pdf, other]
Title: Robustly Self-Ordered Graphs: Constructions and Applications to Property Testing
Comments: Slightly modified version of a CCC 2021 paper that also appeared on ECCC 27: 149 (2020)
Subjects: Computational Complexity (cs.CC)

A graph $G$ is called self-ordered (a.k.a. asymmetric) if the identity permutation is its only automorphism. Equivalently, there is a unique isomorphism from $G$ to any graph that is isomorphic to $G$. We say that $G=(V,E)$ is robustly self-ordered if the size of the symmetric difference between $E$ and the edge-set of the graph obtained by permuting $V$ using any permutation $\pi:V\to V$ is proportional to the number of non-fixed-points of $\pi$. In this work, we initiate the study of the structure, construction and utility of robustly self-ordered graphs.
We show that robustly self-ordered bounded-degree graphs exist (in abundance), and that they can be constructed efficiently, in a strong sense. Specifically, given the index of a vertex in such a graph, it is possible to find all its neighbors in polynomial-time (i.e., in time that is poly-logarithmic in the size of the graph).
We also consider graphs of unbounded degree, seeking correspondingly unbounded robustness parameters. We again demonstrate that such graphs (of linear degree) exist (in abundance), and that they can be constructed efficiently, in a strong sense. This turns out to require very different tools. Specifically, we show that the construction of such graphs reduces to the construction of non-malleable two-source extractors (with very weak parameters but with some additional natural features).
We demonstrate that robustly self-ordered bounded-degree graphs are useful towards obtaining lower bounds on the query complexity of testing graph properties both in the bounded-degree and the dense graph models. One of the results that we obtain, via such a reduction, is a subexponential separation between the query complexities of testing and tolerant testing of graph properties in the bounded-degree graph model.
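The definition of robust self-ordering can be checked by brute force on very small graphs. The sketch below computes, over all non-identity permutations, the smallest ratio between the size of the symmetric difference of the edge sets and the number of non-fixed points. It is an illustration of the definition only (the notion in the paper is asymptotic, over families of graphs), and the function name is hypothetical.

```python
from itertools import permutations

def robustness_ratio(n, edges):
    """Minimum of |E symmetric-difference pi(E)| / #non-fixed-points(pi).

    n:     number of vertices, labelled 0..n-1.
    edges: iterable of undirected edges (u, v).
    A result of 0.0 means the graph has a non-trivial automorphism, so it
    is not self-ordered. Exhaustive over n! permutations: tiny n only.
    """
    edge_list = list(edges)
    edge_set = {frozenset(e) for e in edge_list}
    best = None
    for pi in permutations(range(n)):
        moved = sum(1 for v in range(n) if pi[v] != v)
        if moved == 0:
            continue  # skip the identity permutation
        permuted = {frozenset((pi[u], pi[v])) for u, v in edge_list}
        ratio = len(edge_set ^ permuted) / moved
        best = ratio if best is None else min(best, ratio)
    return best
```

A three-vertex path gives 0.0 because swapping its endpoints is an automorphism, whereas an asymmetric graph gives a strictly positive value.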

[239]  arXiv:2112.00682 [pdf, ps, other]
Title: Quasi-3D Magneto-Thermal Quench Simulation Scheme for Superconducting Accelerator Magnets
Comments: 5 pages, 8 figures, MT27 conference special issue paper
Subjects: Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)

To tackle the strong multi-scale problem in the quench simulation of superconducting accelerator magnets, this work proposes a hybrid numerical method which uses two-dimensional first-order finite elements in the magnet cross-section and one-dimensional higher-order orthogonal polynomials in the longitudinal direction.

[240]  arXiv:2112.00686 [pdf, other]
Title: CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Can deep learning models achieve greater generalization if their training is guided by reference to human perceptual abilities? And how can we implement this in a practical manner? This paper proposes a first-ever training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG). This new training approach incorporates human-annotated saliency maps into a CYBORG loss function that guides the model towards learning features from image regions that humans find salient when solving a given visual task. The Class Activation Mapping (CAM) mechanism is used to probe the model's current saliency in each training batch, juxtapose model saliency with human saliency, and penalize the model for large differences. Results on the task of synthetic face detection show that the CYBORG loss leads to significant improvement in performance on unseen samples consisting of face images generated from six Generative Adversarial Networks (GANs) across multiple classification network architectures. We also show that scaling to even seven times as much training data with standard loss cannot beat the accuracy of CYBORG loss. As a side effect, we observed that the addition of explicit region annotation to the task of synthetic face detection increased human classification performance. This work opens a new area of research on how to incorporate human visual saliency into loss functions. All data, code and pre-trained models used in this work are offered with this paper.
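The abstract describes penalizing the model when its saliency differs from human saliency, but does not give the formula. The sketch below shows one plausible shape for such a loss: a weighted sum of cross-entropy and the mean squared difference between min-max-normalized saliency maps. The weighting scheme, the normalization, and all names are assumptions, not the published CYBORG loss.

```python
import math

def _normalize(saliency_map):
    """Flatten a 2D map and rescale it to [0, 1]; a constant map becomes zeros."""
    flat = [x for row in saliency_map for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [0.0] * len(flat)
    return [(x - lo) / (hi - lo) for x in flat]

def saliency_guided_loss(class_probs, label, model_saliency,
                         human_saliency, alpha=0.5):
    """Hypothetical loss: alpha * saliency mismatch + (1 - alpha) * cross-entropy.

    class_probs:    predicted class probabilities for one sample.
    model_saliency: 2D map from the model (e.g. a class activation map).
    human_saliency: 2D human-annotated map of the same shape.
    """
    cross_entropy = -math.log(class_probs[label])
    a = _normalize(model_saliency)
    b = _normalize(human_saliency)
    mismatch = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return alpha * mismatch + (1.0 - alpha) * cross_entropy
```

Under this form, two models with the same classification confidence are ranked by how closely their saliency follows the human annotation.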

[241]  arXiv:2112.00690 [pdf, other]
Title: MDFM: Multi-Decision Fusing Model for Few-Shot Learning
Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). arXiv admin note: text overlap with arXiv:2109.07785
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In recent years, researchers have paid growing attention to the few-shot learning (FSL) task to address the data-scarce problem. A standard FSL framework is composed of two components: i) Pre-train. Employ the base data to generate a CNN-based feature extraction model (FEM). ii) Meta-test. Apply the trained FEM to the novel data (whose categories differ from those of the base data) to acquire the feature embeddings and recognize them. Although researchers have made remarkable breakthroughs in FSL, there still exists a fundamental problem. Since the FEM trained with base data usually cannot adapt to the novel classes flawlessly, the novel data's features may lead to the distribution shift problem. To address this challenge, we hypothesize that even if most of the decisions based on different FEMs are viewed as weak decisions, which are not available for all classes, they still perform decently in some specific categories. Inspired by this assumption, we propose a novel method, the Multi-Decision Fusing Model (MDFM), which comprehensively considers the decisions based on multiple FEMs to enhance the efficacy and robustness of the model. MDFM is a simple, flexible, non-parametric method that can be directly applied to existing FEMs. Besides, we extend the proposed MDFM to two FSL settings (i.e., supervised and semi-supervised settings). We evaluate the proposed method on five benchmark datasets and achieve significant improvements of 3.4%-7.3% compared with state-of-the-art methods.

[242]  arXiv:2112.00694 [pdf, other]
Title: Label-Free Model Evaluation with Semi-Structured Dataset Representations
Comments: 10 pages, 8 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Label-free model evaluation, or AutoEval, estimates model accuracy on unlabeled test sets, and is critical for understanding model behaviors in various unseen environments. In the absence of image labels, based on dataset representations, we estimate model performance for AutoEval with regression. On the one hand, the image feature is a straightforward choice for such representations, but it hampers regression learning because it is unstructured (i.e., no specific meaning for a component at a given location) and large in scale. On the other hand, previous methods adopt simple structured representations (like average confidence or average feature), but these are insufficient to capture the data characteristics given their limited dimensions. In this work, we take the best of both worlds and propose a new semi-structured dataset representation that is manageable for regression learning while containing rich information for AutoEval. Based on image features, we integrate distribution shapes, clusters, and representative samples for a semi-structured dataset representation. Besides the structured overall description with distribution shapes, the unstructured description with clusters and representative samples includes additional fine-grained information facilitating the AutoEval task. On three existing datasets and 25 newly introduced ones, we experimentally show that the proposed representation achieves competitive results. Code and dataset are available at https://github.com/sxzrt/Semi-Structured-Dataset-Representations.

[243]  arXiv:2112.00698 [pdf, ps, other]
Title: CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems
Comments: 5 pages, 3 figures, published in an IEEE Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Due to the advent of modern embedded systems and mobile devices with constrained resources, there is a great demand for highly efficient deep neural networks for machine learning purposes. There is also a growing concern among the general public about the privacy and confidentiality of user data when it is processed and stored on an external server, which has further fueled the need for developing such efficient neural networks for real-time inference on local embedded systems. The scope of our work presented in this paper is limited to image classification using a convolutional neural network. A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor, designed to extract information and convert it into meaningful representations for real-time inference of the input data. In this paper, we propose a new variant of deep convolutional neural network architecture to improve the performance of existing CNN architectures for real-time inference on embedded systems. We show that this architecture, dubbed CondenseNeXt, is remarkably efficient in comparison to the baseline neural network architecture, CondenseNet, reducing the trainable parameters and FLOPs required to train the network whilst maintaining a balance between a trained model size of less than 3.0 MB and accuracy, resulting in unprecedented computational efficiency.

[244]  arXiv:2112.00699 [pdf]
Title: BERT_SE: A Pre-trained Language Representation Model for Software Engineering
Journal-ref: In D. C. Wyld and D. Nagamalai (Ed.), NLPTA 2021. v.11, n.19, 2021, 115-130
Subjects: Software Engineering (cs.SE)

The application of Natural Language Processing (NLP) has achieved a high level of relevance in several areas. In the field of software engineering (SE), NLP applications are based on the classification of similar texts (e.g. software requirements), applied in tasks of estimating software effort, selection of human resources, etc. Classifying software requirements has been a complex task, considering the informality and complexity inherent in the texts produced during the software development process. The pre-trained embedding models are shown as a viable alternative when considering the low volume of textual data labeled in the area of software engineering, as well as the lack of quality of these data. Although there is much research around the application of word embedding in several areas, to date, there is no knowledge of studies that have explored its application in the creation of a specific model for the domain of the SE area. Thus, this article presents the proposal for a contextualized embedding model, called BERT_SE, which allows the recognition of specific and relevant terms in the context of SE. The assessment of BERT_SE was performed using the software requirements classification task, demonstrating that this model has an average improvement rate of 13% concerning the BERT_base model, made available by the authors of BERT. The code and pre-trained models are available at https://github.com/elianedb.

[245]  arXiv:2112.00702 [pdf, other]
Title: Semi-supervised music emotion recognition using noisy student training and harmonic pitch class profiles
Authors: Hao Hao Tan
Comments: MediaEval 2021 submission for Emotion and Themes in Music
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

We present Mirable's submission to the 2021 Emotions and Themes in Music challenge. In this work, we intend to address the question: can we leverage semi-supervised learning techniques on music emotion recognition? With that, we experiment with noisy student training, which has improved model performance in the image classification domain. As the noisy student method requires a strong teacher model, we further delve into factors including (i) input training length and (ii) complementary music representations to further boost the performance of the teacher model. For (i), we find that models trained with short input length perform better in PR-AUC, whereas those trained with long input length perform better in ROC-AUC. For (ii), we find that using harmonic pitch class profiles (HPCP) consistently improves tagging performance, which suggests that harmonic representation is useful for music emotion tagging. Finally, we find that the noisy student method only improves tagging results for the case of long training length. Additionally, we find that ensembling representations trained with different training lengths can improve tagging results significantly, which suggests a possible direction for future work: incorporating multiple temporal resolutions in the network architecture.

[246]  arXiv:2112.00706 [pdf, ps, other]
Title: Clustering Mixtures with Almost Optimal Separation in Polynomial Time
Authors: Jerry Li, Allen Liu
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of $k$ identity covariance Gaussians, so that the minimum distance between any pair of means is at least $\Delta$, for some parameter $\Delta > 0$, and the goal is to recover the ground truth clustering of these samples. It is folklore that separation $\Delta = \Theta (\sqrt{\log k})$ is both necessary and sufficient to recover a good clustering, at least information theoretically. However, the estimators which achieve this guarantee are inefficient. We give the first algorithm which runs in polynomial time, and which almost matches this guarantee. More precisely, we give an algorithm which takes polynomially many samples and time, and which can successfully recover a good clustering, so long as the separation is $\Delta = \Omega (\log^{1/2 + c} k)$, for any $c > 0$. Previously, polynomial time algorithms were only known for this problem when the separation was polynomial in $k$, and all algorithms which could tolerate $\textsf{poly}( \log k )$ separation required quasipolynomial time. We also extend our result to mixtures of translations of a distribution which satisfies the Poincar\'{e} inequality, under additional mild assumptions. Our main technical tool, which we believe is of independent interest, is a novel way to implicitly represent and estimate high degree moments of a distribution, which allows us to extract important information about high-degree moments without ever writing down the full moment tensors explicitly.

[247]  arXiv:2112.00708 [pdf, other]
Title: Optimal Resource Scheduling and Allocation in Distributed Computing Systems
Comments: This work has been submitted to ACC2022
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The essence of distributed computing systems is how to schedule incoming requests and how to allocate all computing nodes to minimize both time and computation costs. In this paper, we propose a cost-aware optimal scheduling and allocation strategy for distributed computing systems that minimizes a cost function including response time and service cost. First, based on the proposed cost function, we derive the optimal request scheduling policy and the optimal resource allocation policy simultaneously. Second, considering the effects of incoming requests on the scheduling policy, the additive increase multiplicative decrease (AIMD) mechanism is implemented to model the relation between request arrival and scheduling. In particular, the AIMD parameters can be designed such that the derived optimal strategy is still valid. Finally, a numerical example is presented to illustrate the derived results.
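
As a rough illustration of the AIMD mechanism named in the abstract above, the following sketch applies the generic additive-increase/multiplicative-decrease rule to a request rate. The parameter names `alpha`, `beta`, and `capacity`, and the congestion test, are assumptions made for this example; they are not taken from the paper's model.

```python
def aimd_step(rate, congested, alpha=1.0, beta=0.5):
    """One generic AIMD update (illustrative parameters, not the paper's).

    Without congestion the rate grows additively by alpha;
    under congestion it is cut multiplicatively by the factor beta.
    """
    if congested:
        return rate * beta
    return rate + alpha


def simulate(capacity, steps, rate=1.0, alpha=1.0, beta=0.5):
    """Run AIMD against a fixed capacity and return the rate after each step."""
    history = []
    for _ in range(steps):
        # Hypothetical congestion signal: the rate exceeds the capacity.
        congested = rate > capacity
        rate = aimd_step(rate, congested, alpha, beta)
        history.append(rate)
    return history
```

Running `simulate(10.0, 30)` produces the familiar sawtooth: the rate climbs linearly until it exceeds the capacity, is halved, and climbs again.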

[248]  arXiv:2112.00709 [pdf, ps, other]
Title: GPU-Accelerated Forward-Backward algorithm with Application to Lattice-Free MMI
Comments: Submitted to ICASSP 2022
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computation and Language (cs.CL)

We propose to express the forward-backward algorithm in terms of operations between sparse matrices in a specific semiring. This new perspective naturally leads to a GPU-friendly algorithm which is easy to implement in Julia or any programming language with native support for semiring algebra. We use this new implementation to train a TDNN with the LF-MMI objective function and we compare the training time of our system with PyChain, a recently introduced C++/CUDA implementation of the LF-MMI loss. Our implementation is about two times faster while not having to use any approximation such as the "leaky-HMM".
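
The idea of writing the forward pass as sparse matrix products in a semiring can be sketched in a few lines. The following is a minimal CPU-only Python illustration, not the authors' Julia/GPU implementation: the `Semiring` type, the dictionary-based sparse matrix, and all function names are assumptions made for this example. Swapping the probability semiring for the log semiring changes the arithmetic but not the algorithm.

```python
import math
from typing import Callable, Dict, List, NamedTuple, Tuple


class Semiring(NamedTuple):
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float


def _logaddexp(a: float, b: float) -> float:
    """Stable log(exp(a) + exp(b)); -inf plays the role of semiring zero."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


# Probability semiring (+, *) and log semiring (logaddexp, +).
PROB = Semiring(plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0)
LOG = Semiring(plus=_logaddexp, times=lambda a, b: a + b, zero=-math.inf)

Sparse = Dict[Tuple[int, int], float]  # (row, col) -> weight, absent = zero


def vec_mat(vec: List[float], mat: Sparse, n: int, sr: Semiring) -> List[float]:
    """Semiring product vec @ mat, touching only the stored entries."""
    out = [sr.zero] * n
    for (i, j), w in mat.items():
        out[j] = sr.plus(out[j], sr.times(vec[i], w))
    return out


def forward_total(init: List[float], trans: Sparse,
                  emissions: List[List[float]], sr: Semiring) -> float:
    """Forward recursion alpha_t = (alpha_{t-1} @ T) * b_t, summed at the end."""
    alpha = [sr.times(p, e) for p, e in zip(init, emissions[0])]
    for frame in emissions[1:]:
        alpha = vec_mat(alpha, trans, len(init), sr)
        alpha = [sr.times(a, e) for a, e in zip(alpha, frame)]
    total = sr.zero
    for a in alpha:
        total = sr.plus(total, a)
    return total
```

The backward pass follows the same pattern with the transposed transition matrix. Only the forward pass is shown here to keep the sketch short.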

[249]  arXiv:2112.00710 [pdf, other]
Title: Stateful Entities: Object-oriented Cloud Applications as Distributed Dataflows
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

Programming stateful cloud applications remains a very painful experience. Instead of focusing on the business logic, programmers spend most of their time dealing with distributed systems considerations, with the most important being consistency, load balancing, failure management, recovery, and scalability. At the same time, we witness an unprecedented adoption of modern dataflow systems such as Apache Flink, Google Dataflow, and Timely Dataflow. These systems are now performant and fault-tolerant, and they offer excellent state management primitives.
With this line of work, we aim at investigating the opportunities and limits of compiling general-purpose programs into stateful dataflows. Given a set of easy-to-follow code conventions, programmers can author stateful entities, a programming abstraction embedded in Python. We present a compiler pipeline named StateFlow, to analyze the abstract syntax tree of a Python application and rewrite it into an intermediate representation based on stateful dataflow graphs. StateFlow compiles that intermediate representation to a target execution system: Apache Flink and Beam, AWS Lambda, Flink's Statefun, and Cloudburst. Through an experimental evaluation, we demonstrate that the code generated by StateFlow incurs minimal overhead. While developing and deploying our prototype, we came to observe important limitations of current dataflow systems in executing cloud applications at scale.

[250]  arXiv:2112.00712 [pdf, other]
Title: STEM: Unsupervised STructural EMbedding for Stance Detection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Stance detection is an important task, supporting many downstream tasks such as discourse parsing and modeling the propagation of fake news, rumors, and science denial. In this paper, we propose a novel framework for stance detection. Our framework is unsupervised and domain-independent. Given a claim and a multi-participant discussion, we construct the interaction network, from which we derive topological embeddings for each speaker. These speaker embeddings have the following property: speakers with the same stance tend to be represented by similar vectors, while antipodal vectors represent speakers with opposing stances. These embeddings are then used to divide the speakers into stance-partitions. We evaluate our method on three different datasets from different platforms. Our method outperforms or is comparable with supervised models while providing confidence levels for its output. Furthermore, we demonstrate how the structural embeddings relate to the valence expressed by the speakers. Finally, we discuss some limitations inherent to the framework.

[251]  arXiv:2112.00713 [pdf, other]
Title: hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty
Subjects: Numerical Analysis (math.NA); Mathematical Software (cs.MS); Optimization and Control (math.OC); Computation (stat.CO)

Bayesian inference provides a systematic means of quantifying uncertainty in the solution of the inverse problem. However, solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the art algorithms aimed to overcome the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. hIPPYlib-MUQ integrates two complementary open-source software packages. hIPPYlib solves PDE-constrained inverse problems using automatically-generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides numerous powerful Bayesian inversion algorithms, but expects forward models to come equipped with derivatives to permit large-scale solution. By combining these two libraries, we created a robust, scalable, and efficient software framework that can be used to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we compare a number of MCMC methods on several high-dimensional Bayesian inverse problems. The results demonstrate that large ($\sim 50\times$) speedups over conventional black box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.

[252]  arXiv:2112.00718 [pdf, other]
Title: Improving GAN Equilibrium by Raising Spatial Awareness
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The success of Generative Adversarial Networks (GANs) is largely built upon the adversarial training between a generator (G) and a discriminator (D). They are expected to reach a certain equilibrium where D cannot distinguish the generated images from the real ones. In practice, however, it is difficult to achieve such an equilibrium in GAN training; instead, D almost always surpasses G. We attribute this phenomenon to the information asymmetry between D and G. Specifically, we observe that D learns its own visual attention when determining whether an image is real or fake, but G has no explicit clue on which regions to focus on for a particular synthesis. To alleviate the issue of D dominating the competition in GANs, we aim to raise the spatial awareness of G. Randomly sampled multi-level heatmaps are encoded into the intermediate layers of G as an inductive bias. Thus G can purposefully improve the synthesis of certain image regions. We further propose to align the spatial awareness of G with the attention map induced from D. In this way we effectively lessen the information gap between D and G. Extensive results show that our method pushes the two-player game in GANs closer to the equilibrium, leading to better synthesis performance. As a byproduct, the introduced spatial awareness facilitates interactive editing over the output synthesis. Demo video and more results are at https://genforce.github.io/eqgan/.

[253]  arXiv:2112.00719 [pdf, other]
Title: HyperInverter: Improving StyleGAN Inversion via Hypernetwork
Comments: 26 pages, 29 figures, project page is located at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Real-world image manipulation has achieved fantastic progress in recent years as a result of the exploration and utilization of GAN latent spaces. GAN inversion is the first step in this pipeline, which aims to map the real image to the latent code faithfully. Unfortunately, the majority of existing GAN inversion methods fail to meet at least one of the three requirements listed below: high reconstruction quality, editability, and fast inference. We present a novel two-phase strategy in this research that fits all requirements at the same time. In the first phase, we train an encoder to map the input image to StyleGAN2 $\mathcal{W}$-space, which was proven to have excellent editability but lower reconstruction quality. In the second phase, we supplement the reconstruction ability in the initial phase by leveraging a series of hypernetworks to recover the missing information during inversion. These two steps complement each other to yield high reconstruction quality thanks to the hypernetwork branch and excellent editability due to the inversion done in the $\mathcal{W}$-space. Our method is entirely encoder-based, resulting in extremely fast inference. Extensive experiments on two challenging datasets demonstrate the superiority of our method.

[254]  arXiv:2112.00722 [pdf, ps, other]
Title: Faster Maxflow via Improved Dynamic Spectral Vertex Sparsifiers
Comments: 63 pages
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)

We make several advances broadly related to the maintenance of electrical flows in weighted graphs undergoing dynamic resistance updates, including:
1. More efficient dynamic spectral vertex sparsification, achieved by faster length estimation of random walks in weighted graphs using Morris counters [Morris 1978, Nelson-Yu 2020].
2. A direct reduction from detecting edges with large energy in dynamic electric flows to dynamic spectral vertex sparsifiers.
3. A procedure for turning algorithms for estimating a sequence of vectors under updates from an oblivious adversary into ones that tolerate adaptive adversaries, via the Gaussian mechanism from differential privacy.
Combining these pieces with modifications to prior robust interior point frameworks gives an algorithm that on graphs with $m$ edges computes a mincost flow with edge costs and capacities in $[1, U]$ in time $\widetilde{O}(m^{3/2-1/58} \log^2 U)$. In prior and independent work, [Axiotis-M\k{a}dry-Vladu FOCS 2021] also obtained an improved algorithm for sparse mincost flows on capacitated graphs. Our algorithm implies a $\widetilde{O}(m^{3/2-1/58} \log U)$ time maxflow algorithm, improving over the $\widetilde{O}(m^{3/2-1/328}\log U)$ time maxflow algorithm of [Gao-Liu-Peng FOCS 2021].
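
For readers unfamiliar with the Morris counters cited in item 1, the sketch below shows the classic approximate counter: it stores only an exponent `x` and increments it with probability $2^{-x}$, so that $2^x - 1$ is an unbiased estimate of the number of increments. This is the textbook construction only; how the paper applies it to random-walk length estimation is not reproduced here, and the class and method names are chosen for this example.

```python
import random


class MorrisCounter:
    """Classic Morris approximate counter (textbook version)."""

    def __init__(self, rng=None):
        self.x = 0  # stored exponent; needs only O(log log n) bits
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the exponent with probability 2^(-x).
        if self.rng.random() < 2.0 ** -self.x:
            self.x += 1

    def estimate(self):
        # E[2^x - 1] equals the true number of increments.
        return 2 ** self.x - 1
```

A single counter has high variance, so in practice several independent counters are averaged to tighten the estimate.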

[255]  arXiv:2112.00724 [pdf, other]
Title: RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Comments: Project page available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

Neural Radiance Fields (NeRF) have emerged as a powerful representation for the task of novel view synthesis due to their simplicity and state-of-the-art performance. Though NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, its performance drops significantly when this number is reduced. We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training. We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints, and annealing the ray sampling space during training. We additionally use a normalizing flow model to regularize the color of unobserved viewpoints. Our model outperforms not only other methods that optimize over a single scene, but in many cases also conditional models that are extensively pre-trained on large multi-view datasets.

[256]  arXiv:2112.00725 [pdf, other]
Title: Extrapolating from a Single Image to a Thousand Classes using Distillation
Comments: Webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

What can neural networks learn about the visual world from a single image? While it obviously cannot contain the multitudes of possible objects, scenes and lighting conditions that exist within the space of all possible $256^{3 \times 224 \times 224}$ 224-sized square images, it might still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks from scratch using a single image by means of knowledge distillation from a supervisedly pretrained teacher. With this, we find that the answer to the above question is: 'surprisingly, a lot'. In quantitative terms, we find top-1 accuracies of 94%/74% on CIFAR-10/100, 59% on ImageNet and, by extending this method to audio, 84% on SpeechCommands. In extensive analyses we disentangle the effect of augmentations, choice of source image and network architectures and also discover "panda neurons" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes and motivates a renewed research agenda on the fundamental interplay of augmentations and image.

[257]  arXiv:2112.00726 [pdf, other]
Title: MonoScene: Monocular 3D Semantic Scene Completion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

MonoScene proposes a 3D Semantic Scene Completion (SSC) framework, where the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Unlike the SSC literature, which relies on 2.5D or 3D input, we solve the complex problem of 2D to 3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D features projection inspired by optics, and introduces a 3D context relation prior to enforce spatio-semantic consistency. Along with architectural contributions, we introduce novel global scene and local frustums losses. Experiments show we outperform the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. Our code and trained models are available at https://github.com/cv-rits/MonoScene

Cross-lists for Thu, 2 Dec 21

[258]  arXiv:2111.13051 (cross-list from physics.soc-ph) [pdf]
Title: Ranking by Momentum based on Pareto ordering of entities
Comments: 12 pages, 12 figures
Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY); Databases (cs.DB); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)

Given a set of changing entities, which ones are the most uptrending over some time T? Which entities are standing out as the biggest movers?
To answer this question we define the concept of momentum. Two parameters, absolute gain and relative gain over time T, play the key role in defining momentum. Neither alone is sufficient since each is biased towards a subset of entities. Absolute gain favors large entities, while relative gain favors small ones. To accommodate both absolute and relative gain in an unbiased way, we define a Pareto ordering between entities. For entity E to dominate another entity F in the Pareto ordering, E's absolute and relative gains over time T must be higher than F's absolute and relative gains respectively. Momentum leaders are defined as the maximal elements of this partial order, that is, the Pareto frontier. We show how to compute momentum leaders and propose a linear ordering among them to help rank the entities with the most momentum at the top. Additionally, we show that when vectors follow a power law, the cardinality of the set of momentum leaders (the Pareto frontier) is of the order of the square root of the logarithm of the number of entities, and thus very small.
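
The Pareto ordering described above can be made concrete with a small sketch. It assumes each entity is given by a start and end value over the period T, with a positive start value so that relative gain is defined; the function and variable names are illustrative, and the quadratic-time scan is chosen for clarity rather than taken from the paper.

```python
def momentum_leaders(entities):
    """Return names of entities on the Pareto frontier of (absolute, relative) gain.

    entities: dict mapping name -> (start_value, end_value), start_value > 0.
    """
    gains = {
        name: (end - start, (end - start) / start)
        for name, (start, end) in entities.items()
    }

    def dominates(e, f):
        # E dominates F only if both gains are strictly higher.
        return e[0] > f[0] and e[1] > f[1]

    return sorted(
        name
        for name, g in gains.items()
        if not any(dominates(h, g) for other, h in gains.items() if other != name)
    )


example = {
    "A": (1000.0, 1100.0),  # large entity: big absolute gain, 10% relative gain
    "B": (10.0, 20.0),      # small entity: small absolute gain, 100% relative gain
    "C": (100.0, 105.0),    # dominated by A on both gains
    "D": (500.0, 540.0),    # dominated by A on both gains
}
leaders = momentum_leaders(example)
```

Here `leaders` contains `A` and `B`: neither dominates the other, which is exactly the large-versus-small bias the abstract describes.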

[259]  arXiv:2112.00002 (cross-list from eess.IV) [pdf, other]
Title: Zero-Shot Learning of Continuous 3D Refractive Index Maps from Discrete Intensity-Only Measurements
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Intensity diffraction tomography (IDT) refers to a class of optical microscopy techniques for imaging the 3D refractive index (RI) distribution of a sample from a set of 2D intensity-only measurements. The reconstruction of artifact-free RI maps is a fundamental challenge in IDT due to the loss of phase information and the missing cone problem. Neural fields (NF) have recently emerged as a new deep learning (DL) paradigm for learning continuous representations of complex 3D scenes without external training datasets. We present DeCAF as the first NF-based IDT method that can learn a high-quality continuous representation of an RI volume directly from its intensity-only and limited-angle measurements. We show on three different IDT modalities and multiple biological samples that DeCAF can generate high-contrast and artifact-free RI maps.

[260]  arXiv:2112.00016 (cross-list from hep-th) [pdf, other]
Title: Learning knot invariants across dimensions
Comments: 35 pages, 6 figures
Subjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG); Geometric Topology (math.GT)

We use deep neural networks to machine learn correlations between knot invariants in various dimensions. The three-dimensional invariant of interest is the Jones polynomial $J(q)$, and the four-dimensional invariants are the Khovanov polynomial $\text{Kh}(q,t)$, smooth slice genus $g$, and Rasmussen's $s$-invariant. We find that a two-layer feed-forward neural network can predict $s$ from $\text{Kh}(q,-q^{-4})$ with greater than $99\%$ accuracy. A theoretical explanation for this performance exists in knot theory via the now disproven knight move conjecture, which is obeyed by all knots in our dataset. More surprisingly, we find similar performance for the prediction of $s$ from $\text{Kh}(q,-q^{-2})$, which suggests a novel relationship between the Khovanov and Lee homology theories of a knot. The network predicts $g$ from $\text{Kh}(q,t)$ with similarly high accuracy, and we discuss the extent to which the machine is learning $s$ as opposed to $g$, since there is a general inequality $|s| \leq 2g$. The Jones polynomial, as a three-dimensional invariant, is not obviously related to $s$ or $g$, but the network achieves greater than $95\%$ accuracy in predicting either from $J(q)$. Moreover, similar accuracy can be achieved by evaluating $J(q)$ at roots of unity. This suggests a relationship with $SU(2)$ Chern--Simons theory, and we review the gauge theory construction of Khovanov homology which may be relevant for explaining the network's performance.

[261]  arXiv:2112.00045 (cross-list from quant-ph) [pdf, other]
Title: Limiting the Search Space in Optimal Quantum Circuit Mapping
Comments: 7 pages, 5 figures, to be published at Asia and South Pacific Design Automation Conference (ASP-DAC), 2022
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

Executing quantum circuits on currently available quantum computers requires compiling them to a representation that conforms to all restrictions imposed by the targeted architecture. Due to the limited connectivity of the devices' physical qubits, an important step in the compilation process is to map the circuit in such a way that all its gates are executable on the hardware. Existing methods that deliver optimal solutions to this task are severely challenged by the exponential complexity of the problem. In this paper, we show that the search space of the mapping problem can be limited drastically while still preserving optimality. The proposed strategies are generic, architecture-independent, and can be adapted to various mapping methodologies. The findings are backed by both theoretical considerations and experimental evaluations. Results confirm that, by limiting the search space, optimal solutions can be determined for instances that previously timed out, and speed-ups of up to three orders of magnitude can be achieved.

[262]  arXiv:2112.00152 (cross-list from math.PR) [pdf, ps, other]
Title: One-step replica symmetry breaking of random regular NAE-SAT II
Comments: This is Part II of a two-paper series
Subjects: Probability (math.PR); Discrete Mathematics (cs.DM); Mathematical Physics (math-ph)

Continuing our earlier work in \cite{nss20a}, we study the random regular k-NAE-SAT model in the condensation regime. In \cite{nss20a}, the 1RSB properties of the model were established with positive probability. In this paper, we improve the result to probability arbitrarily close to one. To do so, we introduce a new framework which is the synthesis of two approaches: the small subgraph conditioning and a variance decomposition technique using Doob martingales and discrete Fourier analysis. The main challenge is a delicate integration of the two methods to overcome the difficulty arising from applying the moment method to an unbounded state space.

[263]  arXiv:2112.00183 (cross-list from physics.soc-ph) [pdf, other]
Title: Descriptive vs. inferential community detection: pitfalls, myths and half-truths
Authors: Tiago P. Peixoto
Comments: 51 pages, 16 figures
Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME); Machine Learning (stat.ML)

Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is considered the state-of-the-art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on intuitive notions of community structure, inferential methods articulate a precise generative model, and attempt to fit it to data. In this way, they are able to provide insights into the mechanisms of network formation, and separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should be in general avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should be in general preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods as well as the interpretation of their results.

[264]  arXiv:2112.00187 (cross-list from quant-ph) [pdf, other]
Title: Quantum Compiling
Comments: 37 pages, 8 figures
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

Quantum compiling fills the gap between the computing layer of high-level quantum algorithms and the layer of physical qubits with their specific properties and constraints. Quantum compiling is a hybrid between the general-purpose compilers of computers, which transform high-level language to assembly language, and hardware synthesis by hardware description language, where functions are automatically synthesized into customized hardware. Here we review the quantum compiling stack of both gate model quantum computers and adiabatic quantum computers. The former involves low-level qubit control, quantum error correction, synthesis of short quantum circuits, and transpiling, while the latter involves the virtualization of qubits by embedding of QUBO and HUBO problems on constrained graphs of physical qubits, and both quantum error suppression and correction. Commercial initiatives and quantum compiling products are reviewed, including explicit programming examples.

[265]  arXiv:2112.00222 (cross-list from stat.ML) [pdf, other]
Title: Convergence of GANs Training: A Game and Stochastic Control Methodology
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Training of generative adversarial networks (GANs) is known for its difficulty to converge. This paper first confirms analytically one of the culprits behind this convergence issue: the lack of convexity in GANs objective functions, hence the well-posedness problem of GANs models. Then, it proposes a stochastic control approach for hyper-parameters tuning in GANs training. In particular, it presents an optimal solution for adaptive learning rate which depends on the convexity of the objective function, and builds a precise relation between improper choices of learning rate and explosion in GANs training. Finally, empirical studies demonstrate that training algorithms incorporating this selection methodology outperform standard ones.

[266]  arXiv:2112.00257 (cross-list from math.HO) [pdf, other]
Title: Harmonic numbers as the summation of integrals
Authors: N. Karjanto
Comments: 4 pages, 16 references
Subjects: History and Overview (math.HO); General Literature (cs.GL)

Harmonic numbers arise from the truncation of the harmonic series. The $n^\text{th}$ harmonic number is the sum of the reciprocals of each positive integer up to $n$. In addition to briefly introducing the properties of harmonic numbers, we cover harmonic numbers as the summation of integrals that involve the product of exponential and hyperbolic secant functions. The proof is relatively simple since it only comprises the Principle of Mathematical Induction and integration by parts.
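
The definition in the abstract, $H_n = \sum_{k=1}^{n} 1/k$, can be checked directly with exact rational arithmetic. The sketch below covers only this definition; the integral representation studied in the paper is not reproduced here, and the function name is chosen for this example.

```python
from fractions import Fraction


def harmonic(n):
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n exactly."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))
```

Using `Fraction` avoids floating-point rounding, so small cases such as `harmonic(4)` can be compared against hand computation.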

[267]  arXiv:2112.00274 (cross-list from math.OC) [pdf, ps, other]
Title: Distributed Forward-Backward Methods without Central Coordination
Comments: 17 pages
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC)

In this work, we propose and analyse forward-backward-type algorithms for finding a zero in the sum of finitely many monotone operators, which are not based on reduction to a two operator inclusion in the product space. Each iteration of the studied algorithms requires one resolvent evaluation per set-valued operator, one forward evaluation per cocoercive operator, and two forward evaluations per monotone operator. Unlike existing methods, the structure of the proposed algorithms is suitable for distributed, decentralised implementation in ring networks without the need for a central coordinator to enforce consensus between nodes.
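
For context, the classical two-operator forward-backward iteration that these methods generalise alternates one forward (gradient) step with one backward (resolvent, or proximal) step. The sketch below shows only that classical baseline on a one-dimensional toy problem, minimising $\tfrac12 (x-3)^2 + |x|$; it is not the distributed algorithm proposed in the paper, and all names are chosen for this example.

```python
def soft_threshold(v, t):
    """Resolvent (proximal map) of t * |x|: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def forward_backward(grad, prox, x0, step, iterations):
    """Classical iteration x <- prox(x - step * grad(x), step)."""
    x = x0
    for _ in range(iterations):
        x = prox(x - step * grad(x), step)
    return x


# Toy problem: the smooth part 0.5 * (x - 3)^2 has gradient x - 3 (cocoercive),
# and the nonsmooth part |x| is handled through its resolvent.
x_star = forward_backward(
    grad=lambda x: x - 3.0,
    prox=soft_threshold,
    x0=0.0,
    step=0.5,
    iterations=100,
)
```

The iterates converge to the minimiser $x = 2$, where the optimality condition $x - 3 + \operatorname{sign}(x) = 0$ holds.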

[268]  arXiv:2112.00307 (cross-list from math.CO) [pdf, ps, other]
Title: A note on simple games with two equivalence classes of players
Comments: 13 pages
Subjects: Combinatorics (math.CO); Computer Science and Game Theory (cs.GT)

Many real-world voting systems consist of voters that occur in just two different types. Indeed, each voting system with a "House" and a "Senate" is of that type. Here we present structural characterizations and explicit enumeration formulas for these so-called bipartite simple games.

[269]  arXiv:2112.00313 (cross-list from quant-ph) [pdf, other]
Title: Discriminating Quantum States with Quantum Machine Learning
Journal-ref: 2021 International Conference on Rebooting Computing (ICRC) (2021) 56-63
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

Quantum machine learning (QML) algorithms have obtained great relevance in the machine learning (ML) field due to the promise of quantum speedups when performing basic linear algebra subroutines (BLAS), a fundamental element in most ML algorithms. By making use of BLAS operations, we propose, implement and analyze a quantum k-means (qk-means) algorithm with a low time complexity of $\mathcal{O}(NK\log(D)I/C)$ and apply it to the fundamental problem of discriminating quantum states at readout. Discriminating quantum states allows the identification of quantum states $|0\rangle$ and $|1\rangle$ from low-level in-phase and quadrature signal (IQ) data, and can be done using custom ML models. In order to reduce dependency on a classical computer, we use qk-means to perform state discrimination on the IBMQ Bogota device and find assignment fidelities of up to 98.7%, only marginally lower than those of the k-means algorithm. Inspection of assignment fidelity scores resulting from applying both algorithms to a combination of quantum states agrees with our correlation analysis using Pearson correlation coefficients, which shows evidence of cross-talk in the (1, 2) and (2, 3) neighboring qubit couples for the analyzed device.

[270]  arXiv:2112.00314 (cross-list from stat.ML) [pdf, other]
Title: Asymmetric error control under imperfect supervision: a label-noise-adjusted Neyman-Pearson umbrella algorithm
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Label noise in data has long been an important problem in supervised learning applications as it affects the effectiveness of many widely used classification methods. Recently, important real-world applications, such as medical diagnosis and cybersecurity, have generated renewed interest in the Neyman-Pearson (NP) classification paradigm, which constrains the more severe type of error (e.g., the type I error) under a preferred level while minimizing the other (e.g., the type II error). However, there has been little research on the NP paradigm under label noise. It is somewhat surprising that even when common NP classifiers ignore the label noise in the training stage, they are still able to control the type I error with high probability. However, the price they pay is excessive conservativeness of the type I error and hence a significant drop in power (i.e., $1 - $ type II error). Assuming that domain experts provide lower bounds on the corruption severity, we propose the first theory-backed algorithm that adapts most state-of-the-art classification methods to the training label noise under the NP paradigm. The resulting classifiers not only control the type I error with high probability under the desired level but also improve power.
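
To make the phrase "control the type I error with high probability" concrete, here is a minimal sketch of the standard order-statistic threshold rule used in NP classification without any label-noise adjustment. It is not the noise-adjusted algorithm proposed in the paper. The names `violation_bound` and `np_threshold` are hypothetical, and the sketch assumes continuous, i.i.d. class-0 scores, with class 1 predicted when a score exceeds the threshold.

```python
from math import comb


def violation_bound(n, k, alpha):
    """Bound on P(type I error > alpha) when the threshold is the k-th
    smallest of n held-out class-0 scores (assumes continuous i.i.d. scores)."""
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j) for j in range(k, n + 1)
    )


def np_threshold(class0_scores, alpha, delta):
    """Smallest order statistic whose violation bound is at most delta."""
    scores = sorted(class0_scores)
    n = len(scores)
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return scores[k - 1]
    # Happens when (1 - alpha) ** n > delta: not enough class-0 data.
    raise ValueError("too few class-0 scores for the requested alpha and delta")
```

Under these assumptions, choosing the smallest admissible order statistic keeps the threshold as low as the guarantee allows, which is what preserves power. With `alpha = delta = 0.05`, at least 59 class-0 scores are needed before any threshold qualifies.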

[271]  arXiv:2112.00344 (cross-list from q-bio.QM) [pdf, other]
Title: Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction
Comments: Published in NeurIPS 2018 Machine Learning for Molecules and Materials Workshop
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Accurate prediction of protein functions and properties is essential in the biotechnology industry, e.g., in drug development and artificial protein synthesis. The main challenges of protein function prediction are the large label space and the lack of labeled training data. Our method leverages unsupervised sequence embedding and the success of deep convolutional neural networks to overcome these challenges. In contrast, most of the existing methods delete the rare protein functions to reduce the label space. Furthermore, some existing methods require additional bio-information (e.g., the 3-dimensional structure of the proteins), which is difficult to determine in biochemical experiments. Our proposed method significantly outperforms the other methods on the publicly available benchmark using only protein sequences as input. This allows the process of identifying protein functions to be sped up.

[272]  arXiv:2112.00365 (cross-list from stat.ML) [pdf, other]
Title: Mixed neural network Gaussian processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

This paper makes two contributions. Firstly, it introduces mixed compositional kernels and mixed neural network Gaussian processes (NNGPs). Mixed compositional kernels are generated by composition of probability generating functions (PGFs). A mixed NNGP is a Gaussian process (GP) with a mixed compositional kernel, arising in the infinite-width limit of multilayer perceptrons (MLPs) that have a different activation function for each layer. Secondly, $\theta$ activation functions for neural networks and $\theta$ compositional kernels are introduced by building upon the theory of branching processes, and more specifically upon $\theta$ PGFs. While $\theta$ compositional kernels are recursive, they are expressed in closed form. It is shown that $\theta$ compositional kernels have non-degenerate asymptotic properties under certain conditions. Thus, GPs with $\theta$ compositional kernels do not require non-explicit recursive kernel evaluations and have controllable infinite-depth asymptotic properties. An open research question is whether GPs with $\theta$ compositional kernels are limits of infinitely-wide MLPs with $\theta$ activation functions.

[273]  arXiv:2112.00391 (cross-list from math.RA) [pdf, other]
Title: Non-Sturmian sequences of matrices providing the maximum growth rate of matrix products
Authors: Victor Kozyakin
Comments: 27 pages, 11 figures, 50 bibliography references, 1 Python program listing
Subjects: Rings and Algebras (math.RA); Numerical Analysis (math.NA)

One of the most pressing problems in modern analysis is the study of the growth rate of the norms of all possible matrix products $\|A_{i_{n}}\cdots A_{i_{0}}\|$ with factors from a set of matrices $\mathscr{A}$. So far, only for a relatively small number of classes of matrices $\mathscr{A}$ has it been possible to rigorously describe the sequences of matrices $\{A_{i_{n}}\}$ that guarantee the maximal growth rate of the corresponding norms. Moreover, in almost all theoretically studied cases, the index sequences $\{i_{n}\}$ of matrices maximizing the norms of the corresponding matrix products turned out to be periodic or so-called Sturmian sequences, which entails a whole set of "good" properties of the sequences $\{A_{i_{n}}\}$, in particular the existence of a limiting frequency of occurrence of each matrix factor $A_{i}\in\mathscr{A}$ in them. The paper determines a class of $2\times 2$ matrices consisting of two matrices similar to rotations of the plane in which the sequence $\{A_{i_{n}}\}$ maximizing the growth rate of the norms $\|A_{i_{n}}\cdots A_{i_{0}}\|$ is not Sturmian. All considerations are based on numerical modeling and cannot be considered mathematically rigorous in this part. Rather, they should be interpreted as a set of questions for further comprehensive theoretical analysis.
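
The kind of numerical experiment alluded to above can be illustrated by a brute-force search over index sequences. The sketch below is not the paper's Python listing: the helper names, the use of the Frobenius norm, and the example matrices are all assumptions chosen for brevity, and the exhaustive search is only feasible for short products.

```python
from itertools import product
from math import sqrt


def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def frobenius(m):
    """Frobenius norm (an illustrative choice of matrix norm)."""
    return sqrt(sum(x * x for row in m for x in row))


def best_sequence(matrices, length):
    """Return (largest norm, index sequence (i_0, ..., i_{length-1})) over all
    products A_{i_{length-1}} ... A_{i_0} with factors from `matrices`."""
    best_norm, best_idx = -1.0, None
    for idx in product(range(len(matrices)), repeat=length):
        p = ((1.0, 0.0), (0.0, 1.0))
        for i in idx:
            p = matmul(matrices[i], p)  # later factors multiply on the left
        norm = frobenius(p)
        if norm > best_norm:
            best_norm, best_idx = norm, idx
    return best_norm, best_idx


# Illustrative pair of matrices (not the class studied in the paper).
A = (((1.0, 1.0), (0.0, 1.0)), ((1.0, 0.0), (1.0, 1.0)))
norm, indices = best_sequence(A, 6)
```

Inspecting `indices` for increasing `length` is one simple way to look for periodic or Sturmian-like patterns in the maximizing sequences, at a cost that grows exponentially with the product length.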

[274]  arXiv:2112.00423 (cross-list from stat.ML) [pdf, other]
Title: Controlling Wasserstein distances by Kernel norms with application to Compressive Statistical Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between probability measures that have attracted abundant attention in past years. This paper establishes some conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by the compressive statistical learning (CSL) theory, a general framework for resource-efficient large scale learning in which the training data is summarized in a single vector (called sketch) that captures the information relevant to the considered learning task. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is when some task-specific metric between probability distributions can be bounded by a Wasserstein distance.

[275]  arXiv:2112.00456 (cross-list from math.HO) [pdf, ps, other]
Title: Lecture notes on complexity of quantifier elimination over the reals
Authors: Nicolai Vorobjov
Comments: 21 pages
Subjects: History and Overview (math.HO); Symbolic Computation (cs.SC); Logic (math.LO)

These are lecture notes for a course I gave in the mid-1990s for MSc students at the University of Bath. They present an algorithm with singly exponential complexity for the existential theory of the reals, in the spirit of J. Renegar. The aim was to convey the main underlying ideas, so many of the proofs and finer details of algorithms are either missing or just sketched. I changed nothing in the original notes except for adding references and a bibliography and correcting obvious typos.

[276]  arXiv:2112.00460 (cross-list from hep-lat) [pdf, other]
Title: Machine learning Hadron Spectral Functions in Lattice QCD
Comments: 9 pages, 7 figures. Talk presented at the 38th International Symposium on Lattice Field Theory (Lattice 2021), 26-30 July, 2021, Zoom/Gather@Massachusetts Institute of Technology
Subjects: High Energy Physics - Lattice (hep-lat); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph); High Energy Physics - Theory (hep-th); Nuclear Theory (nucl-th)

Hadron spectral functions carry all the information of hadrons and are encoded in the Euclidean two-point correlation functions. The extraction of hadron spectral functions from the correlator is a typical ill-posed inverse problem, and an infinite number of solutions to this problem exist. We propose a novel neural network (sVAE) based on the Variational Auto-Encoder (VAE) and Bayes' theorem. Inspired by the maximum entropy method (MEM), we construct the loss function of the neural network such that it includes a Shannon-Jaynes entropy term and a likelihood term. The sVAE is then trained to provide the most probable spectral functions. For the training samples of spectral functions we use general spectral functions produced from the Gaussian Mixture Model. After the training is done, we perform mock data tests with input spectral functions consisting of 1) only a free continuum, 2) only a resonance peak, 3) a resonance peak plus a free continuum and 4) an NRQCD-motivated spectral function. From the mock data tests we find that the sVAE in most cases is comparable to the maximum entropy method in the quality of reconstructing spectral functions and even outperforms the MEM in the case where the spectral function has sharp peaks with an insufficient number of data points in the correlator. By applying both methods to temporal correlation functions of charmonium in the pseudoscalar channel obtained in quenched lattice QCD at 0.75 $T_c$ on $128^3\times96$ lattices and 1.5 $T_c$ on $128^3\times48$ lattices, we find that the resonance peak of $\eta_c$ extracted from both the sVAE and MEM has a substantial dependence on the number of points in the temporal direction ($N_\tau$) adopted in the lattice simulation, and that $N_\tau$ larger than 48 is needed to resolve the fate of $\eta_c$ at 1.5 $T_c$.

[277]  arXiv:2112.00539 (cross-list from math.LO) [pdf, other]
Title: Finitary type theories with and without contexts
Subjects: Logic (math.LO); Logic in Computer Science (cs.LO)

We give a definition of finitary type theories that subsumes many examples of dependent type theories, such as variants of Martin-Löf type theory, simple type theories, first-order and higher-order logics, and homotopy type theory. We prove several general meta-theorems about finitary type theories: weakening, admissibility of substitution and instantiation of metavariables, derivability of presuppositions, uniqueness of typing, and inversion principles.
We then give a second formulation of finitary type theories in which there are no explicit contexts. Instead, free variables are explicitly annotated with their types. We provide translations between finitary type theories with and without contexts, thereby showing that they have the same expressive power. The context-free type theory is implemented in the nucleus of the Andromeda 2 proof assistant.

[278]  arXiv:2112.00543 (cross-list from quant-ph) [pdf, other]
Title: (Causal)-Activation of Complex Entanglement Structures in Quantum Networks
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Entanglement represents "the" key resource for several applications of quantum information processing, ranging from quantum communications to distributed quantum computing. Despite its fundamental importance, deterministic generation of maximally entangled qubits represents an ongoing open problem. Here, we design a novel generation scheme exhibiting two attractive features, namely, i) deterministically generating genuinely multipartite entangled states, ii) without requiring any direct interaction between the qubits. Indeed, the only necessary condition is the possibility of coherently controlling -- according to the indefinite causal order framework -- the causal order among some unitaries acting on the qubits. Throughout the paper, we analyze and derive the conditions on the unitaries for deterministic generation, and we provide examples of practical implementations of the unitaries. We conclude the paper by discussing the scalability of the proposed scheme to higher dimensional GME states and by introducing some possible applications of the proposal for quantum networks.

[279]  arXiv:2112.00565 (cross-list from stat.ML) [pdf, other]
Title: On Mixing Times of Metropolized Algorithm With Optimization Step (MAO): A New Framework
Comments: 24 pages, 27 Figures, 4 Tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)

In this paper, we consider sampling from a class of distributions with thin tails supported on $\mathbb{R}^d$ and make two primary contributions. First, we propose a new Metropolized Algorithm With Optimization Step (MAO), which is well suited for such targets. Our algorithm is capable of sampling from distributions where the Metropolis-adjusted Langevin algorithm (MALA) does not converge or lacks theoretical guarantees. Second, we derive upper bounds on the mixing time of MAO. Our results are supported by simulations on multiple target distributions.

[280]  arXiv:2112.00573 (cross-list from math.PR) [pdf, other]
Title: Uniqueness for the q-state antiferromagnetic Potts model on the regular tree
Comments: 16 pages
Subjects: Probability (math.PR); Discrete Mathematics (cs.DM); Mathematical Physics (math-ph); Combinatorics (math.CO)

We present an elementary proof for the uniqueness regime of the general $q$-state antiferromagnetic Potts model on the $d$-ary tree. The key observation is a positive association property of its boundary condition. We also obtain the exact exponential decay rate throughout the subcritical regime, and the power-law decay rate at the critical temperature.

[281]  arXiv:2112.00635 (cross-list from eess.AS) [pdf, other]
Title: Predicting lexical skills from oral reading with acoustic measures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)

Literacy assessment is an important activity for education administrators across the globe. Typically achieved in a school setting by testing a child's oral reading, it is intensive in human resources. While automatic speech recognition (ASR) is a potential solution to the problem, it tends to be computationally expensive for hand-held devices apart from needing language and accent-specific speech for training. In this work, we propose a system to predict the word-decoding skills of a student based on simple acoustic features derived from the recording. We first identify a meaningful categorization of word-decoding skills by analyzing a manually transcribed data set of children's oral reading recordings. Next the automatic prediction of the category is attempted with the proposed acoustic features. Pause statistics, syllable rate and spectral and intensity dynamics are found to be reliable indicators of specific types of oral reading deficits, providing useful feedback by discriminating the different characteristics of beginning readers. This computationally simple and language-agnostic approach is found to provide a performance close to that obtained using a language dependent ASR that required considerable tuning of its parameters.

[282]  arXiv:2112.00672 (cross-list from stat.ME) [pdf, other]
Title: Controlling for multiple covariates
Authors: Mark Tygert
Comments: 29 pages, 21 figures, 2 tables
Subjects: Methodology (stat.ME); Computers and Society (cs.CY); Computation (stat.CO)

A fundamental problem in statistics is to compare the outcomes attained by members of subpopulations. This problem arises in the analysis of randomized controlled trials, in the analysis of A/B tests, and in the assessment of fairness and bias in the treatment of sensitive subpopulations, especially when measuring the effects of algorithms and machine learning. Often the comparison makes the most sense when performed separately for individuals who are similar according to certain characteristics given by the values of covariates of interest; the separate comparisons can also be aggregated in various ways to compare across all values of the covariates. Separating, segmenting, or stratifying into those with similar values of the covariates is also known as "conditioning on" or "controlling for" those covariates; controlling for age or annual income is common.
Two standard methods of controlling for covariates are (1) binning and (2) regression modeling. Binning requires making fairly arbitrary, yet frequently highly influential choices, and is unsatisfactorily temperamental in multiple dimensions, with multiple covariates. Regression analysis works wonderfully when there is good reason to believe in a particular parameterized regression model or classifier (such as logistic regression). Thus, there appears to be no extant canonical fully non-parametric regression for the comparison of subpopulations, not while conditioning on multiple specified covariates. Existing methods rely on analysts to make choices, and those choices can be debatable; analysts can deceive others or even themselves. The present paper aims to fill the gap, combining two ingredients: (1) recently developed methodologies for such comparisons that already exist when conditioning on a single scalar covariate and (2) the Hilbert space-filling curve that maps continuously from one dimension to multiple dimensions.
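
The second ingredient can be illustrated in two dimensions. The sketch below uses the widely known iterative construction of the discrete Hilbert curve on an n-by-n grid (n a power of two) and then orders grid cells by their position along the curve, which is one plausible way to reduce two quantized covariates to a single scalar. The function names, the grid quantization, and the lookup-table approach are assumptions for illustration, not the paper's implementation.

```python
def d2xy(n, d):
    """Map position d (0 <= d < n*n) along the Hilbert curve to a cell (x, y)
    of an n-by-n grid, where n is a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the current s-by-s block.
                x, y = s - 1 - x, s - 1 - y
            # Transpose the block.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_sort(cells, n):
    """Order grid cells (x, y) by their position along the Hilbert curve."""
    index_of = {d2xy(n, d): d for d in range(n * n)}  # table is fine for small n
    return sorted(cells, key=lambda cell: index_of[cell])
```

Consecutive positions along the curve always land in neighboring cells, so individuals that are adjacent in the one-dimensional ordering have similar covariate values; the converse does not always hold, since nearby cells can be far apart along the curve.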

[283]  arXiv:2112.00695 (cross-list from eess.SP) [pdf, other]
Title: DeepAoANet: Learning Angle of Arrival from Software Defined Radios with Deep Neural Networks
Comments: Angle-of-arrival estimation from Software Defined Radios, Benchmark and Baseline
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Robotics (cs.RO)

Direction finding and positioning systems based on RF signals are significantly impacted by multipath propagation, particularly in indoor environments. Existing algorithms (e.g., MUSIC) perform poorly in resolving Angle of Arrival (AoA) in the presence of multipath or when operating in a weak signal regime. We note that digitally sampled RF frontends allow for the easy analysis of signals and their delayed components. Low-cost Software-Defined Radio (SDR) modules enable Channel State Information (CSI) extraction across a wide spectrum, motivating the design of an enhanced AoA solution. We propose a Deep Learning approach to deriving AoA from a single snapshot of the SDR multichannel data. We compare and contrast deep-learning based angle classification and regression models, to estimate up to two AoAs accurately. We have implemented the inference engines on different platforms to extract AoAs in real-time, demonstrating the computational tractability of our approach. To demonstrate the utility of our approach we have collected IQ (In-phase and Quadrature components) samples from a four-element Uniform Linear Array (ULA) in various Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) environments, and published the dataset. Our proposed method demonstrates excellent reliability in determining the number of impinging signals and realizes mean absolute AoA errors less than $2^{\circ}$.

[284]  arXiv:2112.00720 (cross-list from math.GT) [pdf, other]
Title: Quasi-universality of Reeb graph distances
Subjects: Geometric Topology (math.GT); Computational Geometry (cs.CG)

We establish bi-Lipschitz bounds certifying quasi-universality (universality up to a constant factor) for various distances between Reeb graphs: the interleaving distance, the functional distortion distance, and the functional contortion distance. The definition of the latter distance is a novel contribution, and for the special case of contour trees we also prove strict universality of this distance. Furthermore, we prove that for the special case of merge trees the functional contortion distance coincides with the interleaving distance, yielding universality of all four distances in this case.

[285]  arXiv:2112.00723 (cross-list from quant-ph) [pdf, other]
Title: Infinite Neural Network Quantum States
Subjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); High Energy Physics - Theory (hep-th)

We study infinite limits of neural network quantum states ($\infty$-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of Renyi entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. A general framework is developed for studying the gradient descent dynamics of neural network quantum states (NNQS), using a quantum state neural tangent kernel (QS-NTK). For $\infty$-NNQS the training dynamics is simplified, since the QS-NTK becomes deterministic and constant. An analytic solution is derived for quantum state supervised learning, which allows an $\infty$-NNQS to recover any target wavefunction. Numerical experiments on finite and infinite NNQS in the transverse field Ising model and Fermi Hubbard model demonstrate excellent agreement with theory. $\infty$-NNQS opens up new opportunities for studying entanglement and training dynamics in other physics applications, such as in finding ground states.

Replacements for Thu, 2 Dec 21

[286]  arXiv:1609.04382 (replaced) [pdf, other]
Title: Warped Convolutions: Efficient Invariance to Spatial Transformations
Comments: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[287]  arXiv:1811.11858 (replaced) [pdf, other]
Title: Can you sign a quantum state?
Comments: 26+12 pages, v4: version for publication in Quantum
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
[288]  arXiv:1905.12707 (replaced) [pdf, other]
Title: Heterogeneous causal effects with imperfect compliance: a Bayesian machine learning approach
Comments: To appear in the Annals of Applied Statistics
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
[289]  arXiv:1906.01756 (replaced) [pdf, ps, other]
Title: Slack Channels Ecology in Enterprises: How Employees Collaborate Through Group Chat
Comments: Accepted at ACM CSCW'22
Journal-ref: ACM CSCW 2022
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[290]  arXiv:1906.01820 (replaced) [pdf, other]
Title: Risks from Learned Optimization in Advanced Machine Learning Systems
Subjects: Artificial Intelligence (cs.AI)
[291]  arXiv:1910.03332 (replaced) [pdf, ps, other]
Title: Upper and Lower Bounds for Fully Retroactive Graph Problems
Subjects: Data Structures and Algorithms (cs.DS)
[292]  arXiv:1910.05065 (replaced) [pdf, other]
Title: A Theory of Relation Learning and Cross-domain Generalization
Comments: Includes supplemental material
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[293]  arXiv:1910.08953 (replaced) [pdf, other]
Title: Overcoming Free-Riding in Bandit Games
Comments: 66 pages, 4 figures; minor corrections and updated references
Subjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)
[294]  arXiv:1910.11656 (replaced) [pdf, other]
Title: Attend to the Difference: Cross-Modality Person Re-identification via Contrastive Correlation
Comments: The paper is accepted by TIP
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[295]  arXiv:1911.02883 (replaced) [pdf, other]
Title: Graph Domain Adaptation with Localized Graph Signal Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[296]  arXiv:1912.07362 (replaced) [pdf, other]
Title: Dynamic controller that operates over homomorphically encrypted data for infinite time horizon
Comments: 12 pages, 3 figures
Subjects: Systems and Control (eess.SY)
[297]  arXiv:1912.10036 (replaced) [pdf, other]
Title: A Family of Deep Learning Architectures for Channel Estimation and Hybrid Beamforming in Multi-Carrier mm-Wave Massive MIMO
Comments: Accepted Paper in IEEE Transactions on Cognitive Communications and Networking. arXiv admin note: text overlap with arXiv:1910.14240
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
[298]  arXiv:1912.10769 (replaced) [pdf, ps, other]
Title: Online Throughput Maximization on Unrelated Machines: Commitment is No Burden
Subjects: Data Structures and Algorithms (cs.DS)
[299]  arXiv:2001.08510 (replaced) [pdf, other]
Title: Bibliography of distributed approximation beyond bounded degree
Comments: An annotated bibliography. Third version
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
[300]  arXiv:2002.00885 (replaced) [pdf, other]
Title: Diffusion bridges for stochastic Hamiltonian systems and shape evolutions
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
[301]  arXiv:2003.01416 (replaced) [pdf, other]
Title: An Online Learning Framework for Energy-Efficient Navigation of Electric Vehicles
Comments: Accepted at IJCAI 2020 Main Track. Sole copyright holder is IJCAI (International Joint Conferences on Artificial Intelligence), all rights reserved. Available at this https URL
Journal-ref: IJCAI 2020, Pages 2051-2057
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[302]  arXiv:2003.10094 (replaced) [pdf, other]
Title: Penalized and Decentralized Contextual Bandit Learning for WLAN Channel Allocation with Contention-Driven Feature Extraction
Comments: 12 pages, 6 figures, 3 Tables
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[303]  arXiv:2003.13607 (replaced) [pdf, other]
Title: Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[304]  arXiv:2003.13896 (replaced) [pdf, other]
Title: Robust Multiple-Path Orienteering Problem: Securing Against Adversarial Attacks
Comments: submitted to TRO
Subjects: Robotics (cs.RO)
[305]  arXiv:2003.13966 (replaced) [pdf, other]
Title: Individual Fairness in Advertising Auctions through Inverse Proportionality
Comments: To appear at ITCS 2022; this is the full version
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[306]  arXiv:2006.12242 (replaced) [pdf, other]
Title: Exploiting topology awareness for routing in LEO satellite constellations
Comments: Accepted for publication at IEEE GLOBECOM 2021
Subjects: Networking and Internet Architecture (cs.NI)
[307]  arXiv:2007.02931 (replaced) [pdf, other]
Title: Adaptive Risk Minimization: Learning to Adapt to Domain Shift
Comments: NeurIPS 2021; Project website: this https URL; Code: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[308]  arXiv:2007.14390 (replaced) [pdf, other]
Title: Flower: A Friendly Federated Learning Research Framework
Comments: Open-Source, mobile-friendly Federated Learning framework
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[309]  arXiv:2007.14823 (replaced) [pdf, other]
Title: Theory of gating in recurrent neural networks
Comments: 13 figures
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Neurons and Cognition (q-bio.NC)
[310]  arXiv:2008.00742 (replaced) [pdf, other]
Title: Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)
Comments: 34 pages, 1 figure
Journal-ref: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[311]  arXiv:2008.07527 (replaced) [pdf, other]
Title: Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features
Journal-ref: International Journal of Interactive Multimedia & Artificial Intelligence (2021), vol. 7, no 2, p. 78-88
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[312]  arXiv:2009.09899 (replaced) [pdf, other]
Title: Clustering COVID-19 Lung Scans
Comments: 11 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[313]  arXiv:2010.09453 (replaced) [pdf, other]
Title: Fast accuracy estimation of deep learning based multi-class musical source separation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[314]  arXiv:2010.12751 (replaced) [pdf, other]
Title: Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization
Comments: This paper has been published in the 17th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2022)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[315]  arXiv:2011.00740 (replaced) [pdf, other]
Title: Influence Patterns for Explaining Information Flow in BERT
Comments: NeurIPS 2021
Subjects: Computation and Language (cs.CL)
[316]  arXiv:2011.06782 (replaced) [pdf, other]
Title: A Nested Bi-level Optimization Framework for Robust Few Shot Learning
Comments: To appear in the proceedings of AAAI 2022
Subjects: Machine Learning (cs.LG)
[317]  arXiv:2011.12108 (replaced) [pdf, other]
Title: Wide-angle Image Rectification: A Survey
Comments: Accepted by the International Journal of Computer Vision (IJCV). Both the datasets and source code are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[318]  arXiv:2011.14303 (replaced) [pdf, other]
Title: A Probabilistic Higher-order Fixpoint Logic
Subjects: Logic in Computer Science (cs.LO)
[319]  arXiv:2011.14989 (replaced) [pdf, other]
Title: The $\aleph$ Calculus
Authors: Hannah Earley
Comments: 51 pages, 18 figures/listings; update references and acknowledgements
Subjects: Programming Languages (cs.PL)
[320]  arXiv:2011.15014 (replaced) [pdf, other]
Title: Learning from Human Directional Corrections
Comments: Please find the codes and games at this https URL
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
[321]  arXiv:2012.00118 (replaced) [pdf, other]
Title: A new operational representation of dependencies in Event Structures
Authors: G. Michele Pinna
Subjects: Logic in Computer Science (cs.LO)
[322]  arXiv:2012.00675 (replaced) [pdf, other]
Title: Topological Learning for Brain Networks
Subjects: Neurons and Cognition (q-bio.NC); Computational Geometry (cs.CG)
[323]  arXiv:2012.03646 (replaced) [pdf, other]
Title: A novel dataset for the identification of computer generated melodies in the CSMT challenge
Comments: Published by Conference on Sound and Music Technology
Journal-ref: In Proceedings of the 8th Conference on Sound and Music Technology. CSMT 2020. Lecture Notes in Electrical Engineering, vol 761. Springer
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[324]  arXiv:2012.04731 (replaced) [pdf, ps, other]
Title: Long Term Motion Prediction Using Keyposes
Comments: Code publicly available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[325]  arXiv:2012.09855 (replaced) [pdf, other]
Title: Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
Comments: ICCV 2021 (oral); Project page: this https URL; Video: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[326]  arXiv:2012.15254 (replaced) [pdf, other]
Title: Post-Quantum Blockchain Proofs of Work
Comments: 30 pages. (v3) changed the title and improved readability. This work supersedes the result of our previous work in eprint.iacr.org/2019/1150
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
[327]  arXiv:2101.10011 (replaced) [pdf, other]
Title: They See Me Rollin': Inherent Vulnerability of the Rolling Shutter in CMOS Image Sensors
Comments: 15 pages, 15 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[328]  arXiv:2101.10166 (replaced) [pdf, other]
Title: A Machine-checked proof of Birkhoff's Variety Theorem in Martin-Löf Type Theory
Comments: This is the long (35 page) version of a paper submitted to TYPES 2021; the previous draft, [v2], was a comprehensive description of an old version of the Agda Universal Algebra Library (called UALib; ver. 1.0.0); the library was rewritten and renamed agda-algebras (ver. 2.0.0); this paper describes only a subset of the agda-algebras library that we used to prove Birkhoff's HSP theorem in Agda
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
[329]  arXiv:2101.11091 (replaced) [pdf, ps, other]
Title: Nonconvex Regularized Gradient Projection Sparse Reconstruction for Massive MIMO Channel Estimation
Subjects: Information Theory (cs.IT)
[330]  arXiv:2101.11970 (replaced) [pdf, other]
Title: AHMoSe: A Knowledge-Based Visual Support System for Selecting Regression Machine Learning Models
Comments: 27 pages, 6 figures, 5 tables. Accepted manuscript version. Published in Computers and Electronics in Agriculture
Journal-ref: Computers and Electronics in Agriculture 187 (2021) 106183
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[331]  arXiv:2102.00405 (replaced) [pdf, other]
Title: BNLP: Natural language processing toolkit for Bengali language
Authors: Sagor Sarker
Comments: 5 pages, 4 figures
Subjects: Computation and Language (cs.CL)
[332]  arXiv:2102.00883 (replaced) [pdf, other]
Title: Stochastic High Fidelity Simulation and Scenarios for Testing of Fixed Wing Autonomous GNSS-Denied Navigation Algorithms
Authors: Eduardo Gallo
Comments: 25 pages, 17 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[333]  arXiv:2102.01381 (replaced) [pdf, other]
Title: Generalized Facial Manipulation Detection with Edge Region Feature Extraction
Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision 2022 (WACV 2022)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[334]  arXiv:2102.04877 (replaced) [pdf, other]
Title: Noisy Recurrent Neural Networks
Comments: 38 pages
Journal-ref: NeurIPS 2021 (https://proceedings.neurips.cc/paper/2021/hash/29301521774ff3cbd26652b2d5c95996-Abstract.html)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Dynamical Systems (math.DS); Probability (math.PR)
[335]  arXiv:2102.06492 (replaced) [pdf, other]
Title: Customizable Stochastic High Fidelity Model of the Sensors and Camera onboard a Low SWaP Fixed Wing Autonomous Aircraft
Authors: Eduardo Gallo
Comments: 32 pages, 6 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[336]  arXiv:2102.07173 (replaced) [pdf, other]
Title: A note on VNP-completeness and border complexity
Comments: Theorem 1 has been strengthened. The topology has been adjusted. Section 7 is new
Subjects: Computational Complexity (cs.CC)
[337]  arXiv:2103.00347 (replaced) [pdf, ps, other]
Title: Better Together? How Externalities of Size Complicate Notions of Solidarity and Actuarial Fairness
Comments: Presented at ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2021
Subjects: Computer Science and Game Theory (cs.GT); Computers and Society (cs.CY)
[338]  arXiv:2103.09577 (replaced) [pdf, other]
Title: Theoretical bounds on data requirements for the ray-based classification
Comments: 10 pages, 5 figures
Journal-ref: SN Comput. Sci. 3, 57 (2022)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[339]  arXiv:2103.09812 (replaced) [pdf, other]
Title: Molecular Index Modulation using Convolutional Neural Networks
Comments: In submission to Elsevier Nano Communication Networks
Subjects: Emerging Technologies (cs.ET)
[340]  arXiv:2103.11594 (replaced) [pdf, other]
Title: Deep Neural Networks Learn Meta-Structures from Noisy Labels in Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[341]  arXiv:2103.11791 (replaced) [pdf, ps, other]
Title: Machine Learning Empowered Resource Allocation in IRS Aided MISO-NOMA Networks
Authors: X. Gao, Y. Liu, X. Liu, L. Song
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[342]  arXiv:2103.13135 (replaced) [pdf, ps, other]
Title: Homomorphic encoders of profinite abelian groups I
Subjects: Group Theory (math.GR); Information Theory (cs.IT); General Topology (math.GN)
[343]  arXiv:2103.14785 (replaced) [pdf, other]
Title: A Comprehensive Review of the Video-to-Text Problem
Comments: 66 pages, 6 figures. Accepted by Artificial Intelligence Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[344]  arXiv:2104.01632 (replaced) [pdf, other]
Title: Isconna: Streaming Anomaly Detection with Frequency and Patterns
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[345]  arXiv:2104.03926 (replaced) [pdf, other]
Title: Conditional Meta-Network for Blind Super-Resolution with Multiple Degradations
Comments: Under review. Our code will be released after reviewing!
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[346]  arXiv:2104.05128 (replaced) [pdf, ps, other]
Title: A Scalable Algorithm for Decentralized Actor Termination Detection
Comments: 33 pages, 4 figures. Extended version of CONCUR 2020 paper arXiv:2007.10553. Accepted to LMCS. This version incorporates comments from reviewers, leading several paragraphs to be expanded and rephrased
Subjects: Logic in Computer Science (cs.LO)
[347]  arXiv:2104.05755 (replaced) [pdf, other]
Title: Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
Subjects: Artificial Intelligence (cs.AI)
[348]  arXiv:2104.08646 (replaced) [pdf, other]
Title: Competency Problems: On Finding and Removing Artifacts in Language Data
Comments: EMNLP 2021. This version fixes an error in Proposition 1 and adds discussion (the EMNLP camera ready version is unfixed)
Subjects: Computation and Language (cs.CL)
[349]  arXiv:2104.09766 (replaced) [pdf, other]
Title: Solution landscape of the Onsager model identifies non-axisymmetric critical points
Subjects: Numerical Analysis (math.NA)
[350]  arXiv:2104.12147 (replaced) [pdf]
Title: Learning Aided Auctioning based Spectrum Access System in a Wireless Optical Network
Comments: Communicated to IEEE for possible publication
Subjects: Networking and Internet Architecture (cs.NI)
[351]  arXiv:2105.03210 (replaced) [pdf, other]
Title: Series reversion in Calderón's problem
Comments: 24 pages, 5 figures
Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)
[352]  arXiv:2105.04738 (replaced) [pdf, other]
Title: Lightweight Distributed Gaussian Process Regression for Online Machine Learning
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[353]  arXiv:2105.08291 (replaced) [pdf, other]
Title: Independent Asymmetric Embedding for Cascade Prediction on Social Networks
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[354]  arXiv:2105.09045 (replaced) [pdf, other]
Title: Do Models Learn the Directionality of Relations? A New Evaluation: Relation Direction Recognition
Comments: 10 pages, 4 figures. Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)
Subjects: Computation and Language (cs.CL)
[355]  arXiv:2105.09163 (replaced) [pdf, other]
Title: High-Performance FPGA-based Accelerator for Bayesian Neural Networks
Comments: Design Automation Conference (DAC) 2021
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[356]  arXiv:2105.11521 (replaced) [pdf, other]
Title: Deep neural network enabled corrective source term approach to hybrid analysis and modeling
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[357]  arXiv:2105.13598 (replaced) [pdf, other]
Title: End-to-End Deep Fault Tolerant Control
Comments: 11 pages, 7 figures
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
[358]  arXiv:2105.14656 (replaced) [pdf, other]
Title: Human-level COVID-19 Diagnosis from Low-dose CT Scans Using a Two-stage Time-distributed Capsule Network
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[359]  arXiv:2105.15168 (replaced) [pdf, other]
Title: MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[360]  arXiv:2106.00774 (replaced) [pdf, other]
Title: Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)
[361]  arXiv:2106.03996 (replaced) [pdf]
Title: Lessons learned developing and using a machine learning model to automatically transcribe 2.3 million handwritten occupation codes
Subjects: Machine Learning (cs.LG)
[362]  arXiv:2106.04550 (replaced) [pdf, other]
Title: DETReg: Unsupervised Pretraining with Region Priors for Object Detection
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[363]  arXiv:2106.05390 (replaced) [pdf, other]
Title: Optimizing Reusable Knowledge for Continual Learning via Metalearning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[364]  arXiv:2106.06097 (replaced) [pdf, other]
Title: Neural Optimization Kernel: Towards Robust Deep Learning
Comments: Deep Learning, Kernel Methods, Deep Learning Theory, Kernel Approximation, Integral Approximation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[365]  arXiv:2106.06201 (replaced) [pdf, other]
Title: Distributed Urban Freeway Traffic Optimization Considering Congestion Propagation
Subjects: Systems and Control (eess.SY)
[366]  arXiv:2106.07533 (replaced) [pdf, other]
Title: Posterior Temperature Optimization in Variational Inference for Inverse Problems
Comments: Accepted at Bayesian Deep Learning workshop, NeurIPS 2021
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
[367]  arXiv:2106.07995 (replaced) [pdf, other]
Title: Learning of feature points without additional supervision improves reinforcement learning from images
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[368]  arXiv:2106.09877 (replaced) [pdf, other]
Title: HIFIR: Hybrid Incomplete Factorization with Iterative Refinement for Preconditioning Ill-conditioned and Singular Systems
Comments: Submitted to ACM Transactions on Mathematical Software (TOMS)
Subjects: Numerical Analysis (math.NA); Mathematical Software (cs.MS)
[369]  arXiv:2106.11958 (replaced) [pdf, other]
Title: Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation
Comments: NeurIPS 2021, Spotlight; Our code and video resources are available at this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[370]  arXiv:2106.12052 (replaced) [pdf, other]
Title: Volume Rendering of Neural Implicit Surfaces
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[371]  arXiv:2106.12066 (replaced) [pdf, other]
Title: It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Comments: Accepted to Findings of ACL 2021. 13 pages, 4 figures. Code: this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[372]  arXiv:2106.13703 (replaced) [pdf, other]
Title: Task-Driven Detection of Distribution Shifts with Statistical Guarantees for Robot Learning
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Applications (stat.AP)
[373]  arXiv:2106.15860 (replaced) [pdf, other]
Title: Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG)
[374]  arXiv:2107.02156 (replaced) [pdf, other]
Title: Do Different Tracking Tasks Require Different Appearance Models?
Comments: To appear at NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[375]  arXiv:2107.02375 (replaced) [pdf, other]
Title: SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[376]  arXiv:2107.06608 (replaced) [pdf, other]
Title: Continuous vs. Discrete Optimization of Deep Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[377]  arXiv:2107.06912 (replaced) [pdf, other]
Title: From Show to Tell: A Survey on Deep Learning-based Image Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[378]  arXiv:2107.07769 (replaced) [pdf]
Title: Architecture of Automated Crypto-Finance Agent
Comments: 9 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Multiagent Systems (cs.MA)
[379]  arXiv:2107.09153 (replaced) [pdf, other]
Title: User Association in Dense mmWave Networks as Restless Bandits
Comments: 10 pages, 6 figures
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
[380]  arXiv:2107.10407 (replaced) [pdf, other]
Title: Designing a Location Trace Anonymization Contest
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB)
[381]  arXiv:2107.12045 (replaced) [pdf, other]
Title: How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review
Comments: 60 pages (92 pages with references and complements), submitted to a journal (Automated Software Engineering). Changes: Emphasizing the difference between the traditional software engineering and ML approaches. Adding Related Works, Threats to Validity and Complementary Materials. Adding a table listing paper references for each section/subsection
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
[382]  arXiv:2107.12719 (replaced) [pdf, other]
Title: The CORSMAL benchmark for the prediction of the properties of containers
Comments: 13 pages, 6 tables, 7 figures, Pre-print submitted to IEEE Access
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[383]  arXiv:2107.13317 (replaced) [pdf, other]
Title: C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds
Comments: 10 pages, 5 figures, IEEE IC2E 2021. arXiv admin note: text overlap with arXiv:2011.07965
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[384]  arXiv:2107.14604 (replaced) [pdf, other]
Title: Three-Dimensional Data-Driven Magnetostatic Field Computation using Real-World Measurement Data
Comments: 10 pages, 8 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
[385]  arXiv:2108.00236 (replaced) [pdf, ps, other]
Title: Debiasing Samples from Online Learning Using Bootstrap
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
[386]  arXiv:2108.00927 (replaced) [pdf, other]
Title: Cloud Native Privacy Engineering through DevPrivOps
Authors: Elias Grünewald
Comments: preprint version (2021-12-01), accepted for the Post-Proceedings at the 16th IFIP Summer School on Privacy and Identity Management 2021
Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY)
[387]  arXiv:2108.01819 (replaced) [pdf, other]
Title: Transfer Learning for Pose Estimation of Illustrated Characters
Comments: Published at WACV 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[388]  arXiv:2108.06911 (replaced) [pdf, ps, other]
Title: Optimal Actor-Critic Policy with Optimized Training Datasets
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[389]  arXiv:2108.11761 (replaced) [pdf, other]
Title: A Framework for Learning Ante-hoc Explainable Models via Concepts
Comments: 16 pages, 15 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[390]  arXiv:2108.13342 (replaced) [pdf, other]
Title: DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[391]  arXiv:2109.01093 (replaced) [src]
Title: What Users Want? WARHOL: A Generative Model for Recommendation
Comments: conflict with other editor
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[392]  arXiv:2109.01120 (replaced) [pdf, other]
Title: Automatic Diagnosis of Schizophrenia in EEG Signals Using CNN-LSTM Models
Journal-ref: Front. Neuroinform. 15:777977 (2021)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[393]  arXiv:2109.03856 (replaced) [pdf, other]
Title: Local Augmentation for Graph Neural Networks
Comments: 16 pages, 5 figures
Subjects: Machine Learning (cs.LG)
[394]  arXiv:2109.04405 (replaced) [pdf, other]
Title: An Accelerated Proximal Gradient-based Model Predictive Control Algorithm
Authors: Jia Wang, Ying Yang
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[395]  arXiv:2109.05975 (replaced) [pdf, other]
Title: Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry
Comments: Video at this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[396]  arXiv:2109.07644 (replaced) [pdf, other]
Title: OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication
Comments: Submitted to ICRA2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[397]  arXiv:2109.09076 (replaced) [pdf, other]
Title: Towards Representation Learning for Atmospheric Dynamics
Journal-ref: NEURIPS 2021 workshop on Tackling Climate Change with Machine Learning (poster)
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
[398]  arXiv:2109.09451 (replaced) [pdf, other]
Title: Money grows on (proof-)trees: the formal FA1.2 ledger standard
Subjects: Logic in Computer Science (cs.LO)
[399]  arXiv:2109.09484 (replaced) [pdf, other]
Title: On Circuit-based Hybrid Quantum Neural Networks for Remote Sensing Imagery Classification
Comments: Submitted to the JSTARS special issue on "Quantum resources for Earth Observation" for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Quantum Physics (quant-ph)
[400]  arXiv:2109.10569 (replaced) [pdf, other]
Title: The Curse Revisited: When are Distances Informative for the Ground Truth in Noisy High-Dimensional Data?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[401]  arXiv:2109.10619 (replaced) [pdf, other]
Title: Eliciting Thinking Hierarchy without Prior
Subjects: Computer Science and Game Theory (cs.GT)
[402]  arXiv:2109.11338 (replaced) [pdf, other]
Title: Orthogonal Graph Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[403]  arXiv:2109.12219 (replaced) [pdf]
Title: Influence of Mobility Restrictions on Transmission of COVID-19 in the state of Maryland -- the USA
Authors: Nandini Raghuraman (1), Kartik Kaushik (1) ((1) Department of Epidemiology and Public Health, University of Maryland School of Medicine)
Subjects: Applications (stat.AP); Machine Learning (cs.LG)
[404]  arXiv:2109.12227 (replaced) [pdf, other]
Title: Bringing Generalization to Deep Multi-view Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[405]  arXiv:2110.00528 (replaced) [pdf, other]
Title: Do Self-Supervised and Supervised Methods Learn Similar Visual Representations?
Comments: Accepted to 2nd Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2021), Sydney, Australia. Fixed typos, added acknowledgements. 5 pages + 2 pages of appendices, 5 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[406]  arXiv:2110.01343 (replaced) [pdf, other]
Title: Taming singular stochastic differential equations: A numerical method
Comments: 63 pages
Subjects: Probability (math.PR); Numerical Analysis (math.NA)
[407]  arXiv:2110.05717 (replaced) [pdf, other]
Title: Relation-aware Video Reading Comprehension for Temporal Language Grounding
Comments: Accepted by EMNLP-21
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[408]  arXiv:2110.06537 (replaced) [pdf, other]
Title: Well-classified Examples are Underestimated in Classification with Deep Neural Networks
Comments: Accepted by AAAI 2022; 16 pages, 11 figures, 13 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[409]  arXiv:2110.06634 (replaced) [pdf, other]
Title: End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network
Comments: 12 pages, 13 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[410]  arXiv:2110.07037 (replaced) [pdf, ps, other]
Title: Solving multiscale steady radiative transfer equation using neural networks with uniform stability
Subjects: Numerical Analysis (math.NA)
[411]  arXiv:2110.07699 (replaced) [pdf, other]
Title: Safe Autonomous Racing via Approximate Reachability on Ego-vision
Comments: 17 pages, 15 figures, 3 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
[412]  arXiv:2110.08721 (replaced) [pdf, other]
Title: CAE-Transformer: Transformer-based Model to Predict Invasiveness of Lung Adenocarcinoma Subsolid Nodules from Non-thin Section 3D CT Scans
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[413]  arXiv:2110.08851 (replaced) [pdf, other]
Title: Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning
Comments: second revision
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[414]  arXiv:2110.09193 (replaced) [pdf, other]
Title: Topologically Regularized Data Embeddings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[415]  arXiv:2110.10249 (replaced) [pdf, other]
Title: Neural Stochastic Partial Differential Equations
Subjects: Machine Learning (cs.LG)
[416]  arXiv:2110.10780 (replaced) [pdf]
Title: An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)
Comments: update on contents and metadata
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[417]  arXiv:2110.11571 (replaced) [pdf, other]
Title: Anti-Backdoor Learning: Training Clean Models on Poisoned Data
Comments: Accepted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[418]  arXiv:2110.11657 (replaced) [pdf, other]
Title: Projective Manifold Gradient Layer for Deep Rotation Regression
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[419]  arXiv:2110.12667 (replaced) [pdf, other]
Title: Mixture-of-Variational-Experts for Continual Learning
Comments: 9 pages, 4 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[420]  arXiv:2110.13151 (replaced) [pdf, other]
Title: Self-supervised similarity search for large scientific datasets
Comments: 5 pages, 2 figures. The similarity search web app can be found at this https URL. Accepted to the Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021). arXiv admin note: text overlap with arXiv:2110.00023
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Computer Vision and Pattern Recognition (cs.CV)
[421]  arXiv:2110.13179 (replaced) [pdf, other]
Title: Probabilistic Hierarchical Forecasting with Deep Poisson Mixtures
Comments: Probabilistic Hierarchical Forecasting, Neural Networks, Poisson Mixtures, Preprint submitted to IJF
Journal-ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[422]  arXiv:2111.00273 (replaced) [pdf, other]
Title: Cross-Modality Fusion Transformer for Multispectral Object Detection
Comments: 6 figures, 4 tables, under consideration at Pattern Recognition Letters
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[423]  arXiv:2111.00765 (replaced) [pdf, other]
Title: Validate on Sim, Detect on Real -- Model Selection for Domain Randomization
Comments: Updated results section. Project website: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[424]  arXiv:2111.01853 (replaced) [pdf, other]
Title: Recursive Bayesian Networks: Generalising and Unifying Probabilistic Context-Free Grammars and Dynamic Bayesian Networks
Comments: To be published in: Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021); Code: this https URL; Comments: corrected typo in outside probabilities: {\alpha}(y) --> {\alpha}(x)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[425]  arXiv:2111.02174 (replaced) [pdf, other]
Title: Unsupervised detection and open-set classification of fast-ramped flexibility activation events
Comments: Submitted to Applied Energy. Revised by the authors
Subjects: Systems and Control (eess.SY); Machine Learning (stat.ML)
[426]  arXiv:2111.02363 (replaced) [pdf, other]
Title: Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[427]  arXiv:2111.02666 (replaced) [pdf, other]
Title: Sensory attenuation develops as a result of sensorimotor experience
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[428]  arXiv:2111.03221 (replaced) [pdf, ps, other]
Title: Breaking the $n^k$ Barrier for Minimum $k$-cut on Simple Graphs
Authors: Zhiyang He, Jason Li
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
[429]  arXiv:2111.03941 (replaced) [pdf, other]
Title: Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods
Comments: Accepted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[430]  arXiv:2111.04357 (replaced) [pdf, other]
Title: Can semi-supervised learning reduce the amount of manual labelling required for effective radio galaxy morphology classification?
Comments: Accepted in: Fourth Workshop on Machine Learning and the Physical Sciences (35th Conference on Neural Information Processing Systems; NeurIPS2021); final version
Subjects: Astrophysics of Galaxies (astro-ph.GA); Machine Learning (cs.LG)
[431]  arXiv:2111.06721 (replaced) [pdf, other]
Title: Causal Multi-Agent Reinforcement Learning: Review and Open Problems
Comments: Accepted at Cooperative AI Workshop, NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[432]  arXiv:2111.08096 (replaced) [pdf, ps, other]
Title: VisualEnv: visual Gym environments with Blender
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[433]  arXiv:2111.08795 (replaced) [pdf, other]
Title: A Projection Operator-based Newton Method for the Trajectory Optimization of Closed Quantum Systems
Comments: 10 pages
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
[434]  arXiv:2111.09512 (replaced) [pdf, other]
Title: ILUT Smoothers for Hybrid C-AMG with Scaled Triangular Factors
Comments: v2 updated citation information
Subjects: Numerical Analysis (math.NA); Mathematical Software (cs.MS)
[435]  arXiv:2111.09838 (replaced) [pdf, other]
Title: On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications
Comments: 7 pages; Accepted at the Bayesian Deep Learning Workshop, NeurIPS 2021
Subjects: Machine Learning (cs.LG)
[436]  arXiv:2111.10430 (replaced) [pdf, ps, other]
Title: Some Error Analysis for the Quantum Phase Estimation Algorithms
Authors: Xiantao Li
Subjects: Quantum Physics (quant-ph); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
[437]  arXiv:2111.10780 (replaced) [pdf]
Title: FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection
Comments: 10 pages, 6 tables, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[438]  arXiv:2111.11294 (replaced) [pdf, other]
Title: Scaling Law for Recommendation Models: Towards General-purpose User Representations
Comments: 11 pages, 6 figures, 5 tables
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[439]  arXiv:2111.11305 (replaced) [pdf, other]
Title: Universal Efficient Variable-rate Neural Image Compression
Comments: 5 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
[440]  arXiv:2111.12124 (replaced) [pdf, ps, other]
Title: Towards Learning Universal Audio Representations
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[441]  arXiv:2111.12143 (replaced) [pdf, other]
Title: Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm
Comments: 28 pages, 8 figures
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); High Energy Physics - Theory (hep-th); Machine Learning (stat.ML)
[442]  arXiv:2111.12389 (replaced) [pdf, other]
Title: Track Boosting and Synthetic Data Aided Drone Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[443]  arXiv:2111.12525 (replaced) [pdf, other]
Title: Causality-inspired Single-source Domain Generalization for Medical Image Segmentation
Comments: Preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[444]  arXiv:2111.12888 (replaced) [pdf, other]
Title: Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[445]  arXiv:2111.12978 (replaced) [pdf, ps, other]
Title: Observing Interventions: A logic for thinking about experiments
Comments: This is the extended version of a paper that will appear in a special issue of the Journal of Logic and Computation dedicated to the 3rd DaLí Workshop on Dynamic Logic: New Trends and Applications. Different from the journal version, here the reader can find the full technical appendix
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[446]  arXiv:2111.13579 (replaced) [pdf, other]
Title: VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
Comments: Technical report; 14 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[447]  arXiv:2111.13670 (replaced) [pdf, other]
Title: Non-Convex Recovery from Phaseless Low-Resolution Blind Deconvolution Measurements using Noisy Masked Patterns
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[448]  arXiv:2111.13781 (replaced) [pdf, other]
Title: Common Sense Knowledge Learning for Open Vocabulary Neural Reasoning: A First View into Chronic Disease Literature
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[449]  arXiv:2111.14080 (replaced) [pdf, other]
Title: Empirical Conditional Mean: A New Method of Predicting Throughput in Uplink Data Network
Authors: Weijia Zheng
Comments: 5 pages, 7 figures
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[450]  arXiv:2111.14382 (replaced) [pdf, other]
Title: VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451]  arXiv:2111.14448 (replaced) [pdf, other]
Title: AVA-AVD: Audio-visual Speaker Diarization in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[452]  arXiv:2111.14451 (replaced) [pdf, other]
Title: HDR-NeRF: High Dynamic Range Neural Radiance Fields
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[453]  arXiv:2111.14592 (replaced) [pdf, other]
Title: GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection
Comments: 7 pages, 5 figures
Subjects: Computation and Language (cs.CL)
[454]  arXiv:2111.14831 (replaced) [pdf]
Title: MIST-net: Multi-domain Integrative Swin Transformer network for Sparse-View CT Reconstruction
Comments: 24 pages, 10 figures, 57 references
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[455]  arXiv:2111.14955 (replaced) [pdf, ps, other]
Title: Privacy-Preserving Serverless Edge Learning with Decentralized Small Data
Comments: Submitted for publication in the IEEE Network
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[456]  arXiv:2111.14973 (replaced) [pdf, other]
Title: MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[457]  arXiv:2111.15090 (replaced) [pdf, other]
Title: The Geometric Occam's Razor Implicit in Deep Learning
Comments: Accepted as a NeurIPS 2021 workshop paper (OPT2021)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[458]  arXiv:2111.15179 (replaced) [pdf, other]
Title: A Highly Effective Low-Rank Compression of Deep Neural Networks with Modified Beam-Search and Modified Stable Rank
Comments: 8 pages, 8 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[459]  arXiv:2111.15246 (replaced) [pdf, other]
Title: Hallucinated Neural Radiance Fields in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[460]  arXiv:2111.15288 (replaced) [pdf, other]
Title: Revisiting Temporal Alignment for Video Restoration
Comments: 15 pages, 17 figures, 10 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461]  arXiv:2111.15365 (replaced) [pdf, other]
Title: Expert Aggregation for Financial Forecasting
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Econometrics (econ.EM); Portfolio Management (q-fin.PM); Risk Management (q-fin.RM)
[462]  arXiv:2111.15445 (replaced) [pdf, ps, other]
Title: The Effect of Iterativity on Adversarial Opinion Forming
Comments: Title edited
Subjects: Artificial Intelligence (cs.AI); Probability (math.PR)
[463]  arXiv:2111.15476 (replaced) [pdf, other]
Title: A Scheme of Channel Prediction Based on Artificial Neural Network
Subjects: Information Theory (cs.IT)
[464]  arXiv:2111.15588 (replaced) [pdf, other]
Title: Pureformer: Do We Even Need Attention?
Subjects: Computation and Language (cs.CL)
[465]  arXiv:2111.15611 (replaced) [pdf, other]
Title: The Power of Communication in a Distributed Multi-Agent System
Comments: Cooperative AI Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[466]  arXiv:2111.15626 (replaced) [pdf, other]
Title: Variational Autoencoders for Studying the Manifold of Precoding Matrices with High Spectral Efficiency
Authors: Evgeny Bobrov (1 and 2), Alexander Markov (3), Dmitry Vetrov (3) ((1) Moscow Research Center, Huawei Technologies, Russia, (2) M. V. Lomonosov Moscow State University, Russia, (3) National Research University Higher School of Economics, Russia)
Comments: 4 pages, 1 figure
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
[467]  arXiv:2111.15640 (replaced) [pdf, other]
Title: Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
Comments: Please visit our project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[468]  arXiv:2111.15656 (replaced) [pdf, other]
Title: Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
