Artificial Intelligence

New submissions

[ total of 58 entries: 1-58 ]

New submissions for Fri, 30 Jul 21

[1]  arXiv:2107.13619 [pdf, other]
Title: A Deep Graph Reinforcement Learning Model for Improving User Experience in Live Video Streaming
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper we present a deep graph reinforcement learning model to predict and improve the user experience during a live video streaming event, orchestrated by an agent/tracker. We first formulate the user experience prediction problem as a classification task, accounting for the fact that most viewers at the beginning of an event have poor quality of experience due to low-bandwidth connections and limited interactions with the tracker. In our model we consider different factors that influence the quality of user experience and train the proposed model on diverse state-action transitions when viewers interact with the tracker. In addition, given that past events have various user experience characteristics, we follow a gradient boosting strategy to compute a global model that learns from different events. Our experiments with three real-world datasets of live video streaming events demonstrate the superiority of the proposed model over several baseline strategies. Moreover, as the majority of the viewers at the beginning of an event have a poor experience, we show that our model can significantly increase the number of viewers with high-quality experience by at least 75% over the first streaming minutes. Our evaluation datasets and implementation are publicly available at https://publicresearch.z13.web.core.windows.net

[2]  arXiv:2107.13641 [pdf, ps, other]
Title: Learned upper bounds for the Time-Dependent Travelling Salesman Problem
Comments: arXiv admin note: text overlap with arXiv:2009.07588
Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

Given a graph whose arc traversal times vary over time, the Time-Dependent Travelling Salesman Problem consists of finding a Hamiltonian tour of least total duration covering the vertices of the graph. The main goal of this work is to define tight upper bounds for this problem by reusing the information gained when solving instances with similar features. This is customary in distribution management, where vehicle routes have to be generated over and over again with similar input data. To this end, we devise an upper bounding technique based on the solution of a classical (and simpler) time-independent Asymmetric Travelling Salesman Problem, where the constant arc costs are suitably defined by the combined use of a Linear Program and a mix of unsupervised and supervised Machine Learning techniques. The effectiveness of this approach has been assessed through a computational campaign on the real travel time functions of two European cities: Paris and London. The overall average gap between our heuristic and the best-known solutions is about 0.001%. For 31 instances, new best solutions have been obtained.
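
A time-independent ATSP can yield an upper bound because any Hamiltonian tour, simulated under the true time-dependent travel times, is feasible, so its duration bounds the optimum from above. A minimal Python sketch of that idea follows. The travel-time function `tau(i, j, t)` is hypothetical, and instead of the paper's LP- and ML-derived arc costs the sketch simply uses the travel times at departure time 0 as constant costs. The ATSP is solved by brute force, so it only suits toy instances.

```python
from itertools import permutations
from typing import Callable, List, Sequence

# tau(i, j, t): travel time of arc (i, j) when leaving vertex i at time t (hypothetical input).
TravelTime = Callable[[int, int, float], float]


def tour_duration(tour: Sequence[int], tau: TravelTime, start_time: float = 0.0) -> float:
    """Simulate a closed tour under time-dependent travel times and return its duration."""
    t = start_time
    closed = list(tour) + [tour[0]]
    for i, j in zip(closed, closed[1:]):
        t += tau(i, j, t)
    return t - start_time


def solve_atsp(cost: List[List[float]]) -> List[int]:
    """Brute-force ATSP with constant arc costs; tours start at vertex 0 (toy sizes only)."""
    n = len(cost)
    best_cost, best_tour = float("inf"), [0]
    for perm in permutations(range(1, n)):
        tour = [0, *perm]
        closed = tour + [0]
        c = sum(cost[i][j] for i, j in zip(closed, closed[1:]))
        if c < best_cost:
            best_cost, best_tour = c, tour
    return best_tour


def td_upper_bound(n: int, tau: TravelTime) -> float:
    """Upper bound = true time-dependent duration of the tour optimal for the constant costs."""
    # Assumption: constant costs are the travel times at t = 0 (not the paper's learned costs).
    cost = [[tau(i, j, 0.0) if i != j else 0.0 for j in range(n)] for i in range(n)]
    return tour_duration(solve_atsp(cost), tau)
```

With `tau = lambda i, j, t: float(abs(i - j))` (no time dependence) on four vertices, the bound equals the optimal tour length 6.0; with congestion-dependent travel times it is always greater than or equal to the time-dependent optimum.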

[3]  arXiv:2107.13646 [pdf, other]
Title: Evaluating Relaxations of Logic for Neural Networks: A Comprehensive Study
Comments: IJCAI 2021 paper (Extended Version)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Symbolic knowledge can provide crucial inductive bias for training neural models, especially in low data regimes. A successful strategy for incorporating such knowledge involves relaxing logical statements into sub-differentiable losses for optimization. In this paper, we study the question of how best to relax logical expressions that represent labeled examples and knowledge about a problem; we focus on sub-differentiable t-norm relaxations of logic. We present theoretical and empirical criteria for characterizing which relaxation would perform best in various scenarios. In our theoretical study driven by the goal of preserving tautologies, the Lukasiewicz t-norm performs best. However, in our empirical analysis on the text chunking and digit recognition tasks, the product t-norm achieves best predictive performance. We analyze this apparent discrepancy, and conclude with a list of best practices for defining loss functions via logic.
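
For readers unfamiliar with t-norm relaxations, the three classical t-norms (product, Łukasiewicz, Gödel) and their De Morgan duals fit in a few lines of Python. This sketch is illustrative only: `conjunction_loss`, the loss 1 - T(x1, ..., xk), is one simple choice of relaxed loss and not necessarily the one studied in the paper.

```python
from functools import reduce
from typing import Callable, Sequence

TNorm = Callable[[float, float], float]


def t_product(a: float, b: float) -> float:
    return a * b


def t_lukasiewicz(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)


def t_godel(a: float, b: float) -> float:
    return min(a, b)


def s_norm(t: TNorm) -> TNorm:
    """Dual t-conorm (relaxed OR) via De Morgan with the standard negation n(a) = 1 - a."""
    return lambda a, b: 1.0 - t(1.0 - a, 1.0 - b)


def conjunction_loss(truths: Sequence[float], t: TNorm) -> float:
    """1 - T(x1, ..., xk): zero exactly when the relaxed conjunction is fully true."""
    return 1.0 - reduce(t, truths, 1.0)  # 1.0 is the identity element of every t-norm
```

Under the Łukasiewicz pair the tautology a ∨ ¬a, i.e. `s_norm(t_lukasiewicz)(a, 1 - a)`, evaluates to 1 for every a (up to floating-point rounding), whereas under the product pair it evaluates to 1 - a(1 - a), which is below 1 for 0 < a < 1. This is one concrete sense in which a relaxation can fail to preserve tautologies.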

[4]  arXiv:2107.13668 [pdf, other]
Title: Learning User-Interpretable Descriptions of Black-Box AI System Capabilities
Comments: ICAPS 2021 Workshop on Knowledge Engineering for Planning and Scheduling
Subjects: Artificial Intelligence (cs.AI)

Several approaches have been developed to answer specific questions that a user may have about an AI system that can plan and act. However, the problems of identifying which questions to ask and of computing a user-interpretable symbolic description of the overall capabilities of the system have remained largely unaddressed. This paper presents an approach for addressing these problems by learning user-interpretable symbolic descriptions of the limits and capabilities of a black-box AI system using low-level simulators. It uses a hierarchical active querying paradigm to generate questions and to learn a user-interpretable model of the AI system based on its responses. In contrast to prior work, we consider settings where imprecision of the user's conceptual vocabulary precludes a direct expression of the agent's capabilities. Furthermore, our approach does not require assumptions about the internal design of the target AI system or about the methods that it may use to compute or learn task solutions. Empirical evaluation on several game-based simulator domains shows that this approach can efficiently learn symbolic models of AI systems that use a deterministic black-box policy in fully observable scenarios.

[5]  arXiv:2107.13669 [pdf, other]
Title: Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
Comments: Accepted at ICMI 2021
Subjects: Artificial Intelligence (cs.AI)

Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data. This research area's major concern lies in developing an effective fusion scheme that can extract and integrate key information from various modalities. However, one issue that may have prevented previous work from achieving better results is the lack of proper modeling for the dynamics of the competition between the independence and relevance among modalities, which could deteriorate fusion outcomes by causing the collapse of modality-specific feature space or introducing extra noise. To mitigate this, we propose the Bi-Bimodal Fusion Network (BBFN), a novel end-to-end network that performs fusion (relevance increment) and separation (difference increment) on pairwise modality representations. The two parts are trained simultaneously such that the competition between them is simulated. The model takes two bimodal pairs as input due to the known information imbalance among modalities. In addition, we leverage a gated control mechanism in the Transformer architecture to further improve the final output. Experimental results on three datasets (CMU-MOSI, CMU-MOSEI, and UR-FUNNY) verify that our model significantly outperforms the SOTA. The implementation of this work is available at https://github.com/declare-lab/BBFN.

[6]  arXiv:2107.13684 [pdf]
Title: An Online Question Answering System based on Sub-graph Searching
Authors: Shuangyong Song
Comments: 4 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI)

Knowledge graphs (KGs) have been widely used for question answering (QA) applications, especially entity-based QA. However, searching for answers in an entire large-scale knowledge graph is very time-consuming, making it hard to meet the speed requirements of real online QA systems. In this paper, we design a sub-graph searching mechanism to solve this problem by creating a sub-graph index, so that each answer generation step is restricted to the sub-graph level. We deploy this mechanism in a real online QA chat system; it brings a clear improvement in question coverage by answering entity-based questions well, and it runs at very high speed, which ensures a good user experience for online QA.
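
The idea of restricting answer generation to an indexed sub-graph can be illustrated with a small index that maps each entity to the triples it participates in. This is a hypothetical sketch: the class `SubgraphIndex`, the use of the 1-hop neighbourhood as the sub-graph, and the exact-match lookup are assumptions, and entity linking from the question text is omitted.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


class SubgraphIndex:
    """Hypothetical index: each entity maps to the triples it appears in (its 1-hop sub-graph)."""

    def __init__(self, triples: List[Triple]) -> None:
        self._index: DefaultDict[str, List[Triple]] = defaultdict(list)
        for head, rel, tail in triples:
            self._index[head].append((head, rel, tail))
            if tail != head:
                self._index[tail].append((head, rel, tail))

    def subgraph(self, entity: str) -> List[Triple]:
        # .get avoids creating empty entries for unknown entities.
        return self._index.get(entity, [])

    def answer(self, entity: str, relation: str) -> List[str]:
        """Answer (entity, relation, ?) by scanning only the entity's sub-graph, not the whole KG."""
        return [t for h, r, t in self.subgraph(entity) if h == entity and r == relation]
```

Each lookup touches only the triples stored under one entity, so its cost depends on that entity's degree rather than on the size of the whole graph.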

[7]  arXiv:2107.13704 [pdf]
Title: A Theory of Consciousness from a Theoretical Computer Science Perspective 2: Insights from the Conscious Turing Machine
Comments: arXiv admin note: text overlap with arXiv:2011.09850
Subjects: Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

The quest to understand consciousness, once the purview of philosophers and theologians, is now actively pursued by scientists of many stripes. We examine consciousness from the perspective of theoretical computer science (TCS), a branch of mathematics concerned with understanding the underlying principles of computation and complexity, including the implications and surprising consequences of resource limitations. In the spirit of Alan Turing's simple yet powerful definition of a computer, the Turing Machine (TM), and the perspective of computational complexity theory, we formalize a modified version of the Global Workspace Theory (GWT) of consciousness originated by cognitive neuroscientist Bernard Baars and further developed by him, Stanislas Dehaene, Jean-Pierre Changeux and others. We are not looking for a complex model of the brain nor of cognition, but for a simple computational model of (the admittedly complex concept of) consciousness. We do this by defining the Conscious Turing Machine (CTM), also called a conscious AI, and then we define consciousness and related notions in the CTM. While these are only mathematical (TCS) definitions, we suggest why the CTM has the feeling of consciousness. The TCS perspective provides a simple formal framework to employ tools from computational complexity theory and machine learning to help us understand consciousness and related concepts. Previously we explored high-level explanations for the feelings of pain and pleasure in the CTM. Here we consider three examples related to vision (blindsight, inattentional blindness, and change blindness), followed by discussions of dreams, free will, and altered states of consciousness.

[8]  arXiv:2107.13734 [pdf, ps, other]
Title: An Ethical Framework for Guiding the Development of Affectively-Aware Artificial Intelligence
Authors: Desmond C. Ong
Comments: Accepted at IEEE Affective Computing and Intelligent Interaction 2021
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The recent rapid advancements in artificial intelligence research and deployment have sparked more discussion about the potential ramifications of socially- and emotionally-intelligent AI. The question is not if research can produce such affectively-aware AI, but when it will. What will it mean for society when machines -- and the corporations and governments they serve -- can "read" people's minds and emotions? What should developers and operators of such AI do, and what should they not do? The goal of this article is to pre-empt some of the potential implications of these developments, and propose a set of guidelines for evaluating the (moral and) ethical consequences of affectively-aware AI, in order to guide researchers, industry professionals, and policy-makers. We propose a multi-stakeholder analysis framework that separates the ethical responsibilities of AI Developers vis-à-vis the entities that deploy such AI -- which we term Operators. Our analysis produces two pillars that clarify the responsibilities of each of these stakeholders: Provable Beneficence, which rests on proving the effectiveness of the AI, and Responsible Stewardship, which governs responsible collection, use, and storage of data and the decisions made from such data. We end with recommendations for researchers, developers, operators, as well as regulators and law-makers.

[9]  arXiv:2107.13977 [pdf, other]
Title: Underwater Acoustic Networks for Security Risk Assessment in Public Drinking Water Reservoirs
Subjects: Artificial Intelligence (cs.AI)

We have built a novel system for the surveillance of drinking water reservoirs using underwater sensor networks. We implement an innovative AI-based approach to detect, classify and localize underwater events. In this paper, we describe the technology and cognitive AI architecture of the system based on one of the sensor networks, the hydrophone network. We discuss the challenges of installing and using the hydrophone network in a water reservoir where traffic, visitors, and variable water conditions create a complex, varying environment. Our AI solution uses an autoencoder for unsupervised learning of latent encodings for classification and anomaly detection, and time delay estimates for sound localization. Finally, we present the results of experiments carried out in a laboratory pool and the water reservoir and discuss the system's potential.
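
Time delay estimation between two hydrophones is commonly done by locating the peak of their cross-correlation; the delay multiplied by the speed of sound gives a path-length difference that constrains the source position. The sketch below uses plain (unweighted) cross-correlation over integer sample lags and an assumed speed of sound of about 1480 m/s in fresh water; the paper's actual estimator is not specified in the abstract.

```python
from typing import Sequence


def estimate_delay(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag d (in samples) maximising sum_n x[n] * y[n + d]; d > 0 means y lags behind x."""
    best_lag, best_score = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + d] for n in range(len(x)) if 0 <= n + d < len(y))
        if score > best_score:
            best_lag, best_score = d, score
    return best_lag


def range_difference(delay_samples: int, sample_rate: float, speed_of_sound: float = 1480.0) -> float:
    """Path-length difference in metres; 1480 m/s is an assumed value for sound in fresh water."""
    return delay_samples / sample_rate * speed_of_sound
```

With several hydrophones, each pairwise range difference defines a hyperbolic locus of possible source positions, and intersecting several such loci localizes the source.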

[10]  arXiv:2107.14000 [pdf, other]
Title: Resisting Out-of-Distribution Data Problem in Perturbation of XAI
Subjects: Artificial Intelligence (cs.AI)

With the rapid development of eXplainable Artificial Intelligence (XAI), perturbation-based XAI algorithms have become quite popular due to their effectiveness and ease of implementation. The vast majority of perturbation-based XAI techniques face the challenge of Out-of-Distribution (OoD) data -- an artifact of randomly perturbed data becoming inconsistent with the original dataset. OoD data leads to the over-confidence problem in model predictions, making the existing XAI approaches unreliable. To the best of our knowledge, the OoD data problem in perturbation-based XAI algorithms has not been adequately addressed in the literature. In this work, we address this OoD data problem by designing an additional module that quantifies the affinity between the perturbed data and the original dataset distribution, which is integrated into the aggregation process. Our solution is shown to be compatible with the most popular perturbation-based XAI algorithms, such as RISE, OCCLUSION, and LIME. Experiments confirm that our methods demonstrate a significant improvement in general cases using both computational and cognitive metrics. Especially in the case of degradation, our proposed approach demonstrates outstanding performance compared to baselines. Besides, our solution also resolves a fundamental problem with the faithfulness indicator, a commonly used evaluation metric of XAI algorithms that appears to be sensitive to the OoD issue.
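
One way to picture an affinity module integrated into the aggregation step is to weight each perturbed sample's contribution by how in-distribution it looks. In the sketch below, the `affinity` callable is a hypothetical stand-in for the paper's module, and the weighting scheme is our own simplification rather than the authors' formulation; with a constant affinity it reduces to a roughly RISE-like average of model scores over the masks that keep each feature.

```python
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def affinity_weighted_saliency(
    x: Sequence[float],
    model: Model,
    affinity: Callable[[List[float]], float],
    masks: Sequence[Sequence[int]],
    baseline: float = 0.0,
) -> List[float]:
    """Saliency of feature i = affinity-weighted mean model score over the masks that keep i.

    affinity(perturbed) is a hypothetical in-distribution score in [0, 1].
    """
    num = [0.0] * len(x)
    den = [0.0] * len(x)
    for mask in masks:
        perturbed = [xi if keep else baseline for xi, keep in zip(x, mask)]
        w = affinity(perturbed)  # down-weights perturbations that look out-of-distribution
        score = model(perturbed)
        for i, keep in enumerate(mask):
            if keep:
                num[i] += w * score
                den[i] += w
    return [n / d if d > 0 else 0.0 for n, d in zip(num, den)]
```

For example, with `x = [1.0, 1.0]`, `masks = [[1, 0], [1, 1]]`, `model = sum` and an affinity that assigns weight 0 to the perturbation `[1.0, 0.0]`, the saliency of the first feature becomes 2.0 instead of the unweighted 1.5.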

[11]  arXiv:2107.14199 [pdf, other]
Title: RSO: A Novel Reinforced Swarm Optimization Algorithm for Feature Selection
Subjects: Artificial Intelligence (cs.AI)

Swarm optimization algorithms are widely used for feature selection before data mining and machine learning applications. Metaheuristic nature-inspired feature selection approaches are used for single-objective optimization tasks, though their major problem is frequent premature convergence, which limits their contribution to data mining. In this paper, we propose a novel feature selection algorithm named Reinforced Swarm Optimization (RSO) that addresses some of the existing problems in feature selection. This algorithm embeds the widely used Bee Swarm Optimization (BSO) algorithm along with Reinforcement Learning (RL) to maximize the reward of a superior search agent and punish the inferior ones. This hybrid optimization algorithm is more adaptive and robust, with a good balance between exploitation and exploration of the search space. The proposed method is evaluated on 25 widely known UCI datasets containing a mix of balanced and imbalanced data. The obtained results are compared with several other popular and recent feature selection algorithms with similar classifier configurations. The experimental outcome shows that our proposed model outperforms BSO in 22 out of 25 instances (88%). Moreover, experimental results also show that RSO performs the best among all the methods compared in this paper in 19 out of 25 cases (76%), establishing the superiority of our proposed method.

Cross-lists for Fri, 30 Jul 21

[12]  arXiv:2107.13586 (cross-list from cs.CL) [pdf, other]
Title: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Comments: Website: this http URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x' that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website this http URL including a constantly-updated survey and paper list.
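
The prompting pipeline described in this abstract (input x, prompt x' with unfilled slots, filled string x̂, output y) can be sketched in a few lines. The template, the slot markers `[X]` and `[Z]`, the `answer_map` from filled answers to labels, and the scoring function `lm_score` are all hypothetical placeholders; a real system would score the filled strings with a pre-trained language model.

```python
from typing import Callable, Dict

# Hypothetical template: [X] marks the input slot, [Z] the answer slot.
TEMPLATE = "[X] Overall, it was a [Z] movie."


def prompt_fn(x: str, template: str = TEMPLATE) -> str:
    """Turn the input x into a prompt x' that still contains the unfilled [Z] slot."""
    return template.replace("[X]", x)


def predict(
    x: str,
    lm_score: Callable[[str], float],
    answer_map: Dict[str, str],
    template: str = TEMPLATE,
) -> str:
    """Fill [Z] with each candidate answer z, keep the filled string x-hat that the
    (hypothetical) LM scores highest, and map its answer to the output label y."""
    x_prime = prompt_fn(x, template)
    best_z = max(answer_map, key=lambda z: lm_score(x_prime.replace("[Z]", z)))
    return answer_map[best_z]
```

Switching to a new task only requires a new template and answer map, which is why no labeled training data is strictly needed in the zero-shot setting.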

[13]  arXiv:2107.13587 (cross-list from cs.CV) [pdf, other]
Title: Fast and Scalable Image Search For Histology
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Tissues and Organs (q-bio.TO)

The expanding adoption of digital pathology has enabled the curation of large repositories of histology whole slide images (WSIs), which contain a wealth of information. Similar pathology image search offers the opportunity to comb through large historical repositories of gigapixel WSIs to identify cases with similar morphological features and can be particularly useful for diagnosing rare diseases, identifying similar cases for predicting prognosis, treatment outcomes, and potential clinical trial success. A critical challenge in developing a WSI search and retrieval system is scalability, which is uniquely challenging given the need to search a growing number of slides that each can consist of billions of pixels and are several gigabytes in size. Such systems are typically slow, and retrieval speed often scales with the size of the repository they search through, making their clinical adoption tedious and infeasible for repositories that are constantly growing. Here we present Fast Image Search for Histopathology (FISH), a histology image search pipeline that is infinitely scalable and achieves constant search speed that is independent of the image database size while being interpretable and without requiring detailed annotations. FISH uses self-supervised deep learning to encode meaningful representations from WSIs and a van Emde Boas tree for fast search, followed by an uncertainty-based ranking algorithm to retrieve similar WSIs. We evaluated FISH on multiple tasks and datasets with over 22,000 patient cases spanning 56 disease subtypes. We additionally demonstrate that FISH can be used to assist with the diagnosis of rare cancer types where sufficient cases may not be available to train traditional supervised deep models. FISH is available as an easy-to-use, open-source software package (https://github.com/mahmoodlab/FISH).
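
FISH uses a van Emde Boas tree for fast search. For reference, the textbook structure supports membership, insertion and successor queries on integer keys from a universe of size u in O(log log u) time, independent of how many keys are stored. The sketch below implements only that textbook data structure (with lazily created clusters); how FISH converts slide representations into integer keys and ranks the retrieved candidates is not shown and is not assumed.

```python
from typing import Dict, Optional


class VEBTree:
    """Van Emde Boas tree over integer keys in [0, u), with u a power of two >= 2."""

    def __init__(self, u: int) -> None:
        if u < 2 or u & (u - 1):
            raise ValueError("universe size must be a power of two >= 2")
        self.u = u
        self.min: Optional[int] = None  # the minimum is stored here only, never in a cluster
        self.max: Optional[int] = None
        self.summary: Optional["VEBTree"] = None  # records which clusters are non-empty
        self.clusters: Dict[int, "VEBTree"] = {}  # created lazily
        if u > 2:
            k = u.bit_length() - 1  # u == 2**k
            self.lo_bits = k // 2
            self.lo_size = 1 << self.lo_bits  # keys per cluster
            self.hi_size = 1 << (k - self.lo_bits)  # number of clusters

    def _high(self, x: int) -> int:
        return x >> self.lo_bits

    def _low(self, x: int) -> int:
        return x & (self.lo_size - 1)

    def _index(self, h: int, l: int) -> int:
        return (h << self.lo_bits) | l

    def member(self, x: int) -> bool:
        if x == self.min or x == self.max:
            return True
        if self.u == 2:
            return False
        cluster = self.clusters.get(self._high(x))
        return cluster is not None and cluster.member(self._low(x))

    def insert(self, x: int) -> None:
        if not self.member(x):  # the recursive insert assumes x is absent
            self._insert(x)

    def _insert(self, x: int) -> None:
        if self.min is None:
            self.min = self.max = x
            return
        if x < self.min:
            x, self.min = self.min, x  # x becomes the new min; push the old min down
        if self.u > 2:
            h, l = self._high(x), self._low(x)
            cluster = self.clusters.get(h)
            if cluster is None:
                cluster = self.clusters[h] = VEBTree(self.lo_size)
            if cluster.min is None:
                if self.summary is None:
                    self.summary = VEBTree(self.hi_size)
                self.summary._insert(h)
                cluster.min = cluster.max = l  # O(1) insert into an empty cluster
            else:
                cluster._insert(l)
        if x > self.max:
            self.max = x

    def successor(self, x: int) -> Optional[int]:
        """Smallest stored key strictly greater than x, or None."""
        if self.u == 2:
            return 1 if x == 0 and self.max == 1 else None
        if self.min is not None and x < self.min:
            return self.min
        h, l = self._high(x), self._low(x)
        cluster = self.clusters.get(h)
        if cluster is not None and cluster.max is not None and l < cluster.max:
            return self._index(h, cluster.successor(l))
        succ_h = self.summary.successor(h) if self.summary is not None else None
        if succ_h is None:
            return None
        return self._index(succ_h, self.clusters[succ_h].min)
```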

[14]  arXiv:2107.13625 (cross-list from cs.LG) [pdf, other]
Title: Generalizing Fairness: Discovery and Mitigation of Unknown Sensitive Attributes
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

When deploying artificial intelligence (AI) in the real world, being able to trust the operation of the AI by characterizing how it performs is an ever-present and important topic. An important and still largely unexplored task in this characterization is determining major factors within the real world that affect the AI's behavior, such as weather conditions or lighting, and either a) being able to give justification for why it may have failed or b) eliminating the influence the factor has. Determining these sensitive factors heavily relies on collected data that is diverse enough to cover numerous combinations of these factors, which becomes more onerous when there are many potential sensitive factors or when operating in complex environments. This paper investigates methods that discover and separate out individual semantic sensitive factors from a given dataset to conduct this characterization, as well as methods that mitigate these factors' sensitivity. We also broaden fairness remediation, which normally addresses only socially relevant factors, to cover the desensitization of AI with regard to all possible aspects of variation in the domain. The proposed methods that discover these major factors reduce the potentially onerous demands of collecting a sufficiently diverse dataset. In experiments using the road sign (GTSRB) and facial imagery (CelebA) datasets, we show the promise of using this scheme to perform this characterization and remediation and demonstrate that our approach outperforms state-of-the-art approaches.

[15]  arXiv:2107.13640 (cross-list from cs.CR) [pdf, ps, other]
Title: Secure Bayesian Federated Analytics for Privacy-Preserving Trend Detection
Comments: 10 pages, 1 figure
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Federated analytics has many applications in edge computing; its use can lead to better decision making for service provision, product development, and user experience. We propose a Bayesian approach to trend detection in which the probability of a keyword being trendy, given a dataset, is computed via Bayes' Theorem; the probability of a dataset, given that a keyword is trendy, is computed through secure aggregation of such conditional probabilities over local datasets of users. We propose a protocol, named SAFE, for Bayesian federated analytics that offers sufficient privacy for production-grade use cases and reduces the computational burden of users and an aggregator. We illustrate this approach with a trend detection experiment and discuss how this approach could be extended further to make it production-ready.
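
The Bayesian step can be sketched under one extra assumption that the abstract does not state: users' local datasets are conditionally independent given the keyword's trendiness, so log P(D | trendy) is a sum of local log-likelihoods that additive secret sharing can aggregate without revealing any single user's value. This is a generic illustration, not the SAFE protocol; `PRIME`, `SCALE` and the fixed-point encoding are assumptions.

```python
import math
import secrets
from typing import List, Sequence

PRIME = 2**61 - 1  # share modulus (assumption); must exceed twice any |aggregate|
SCALE = 10**6  # fixed-point scale for real-valued log-likelihoods (assumption)


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split an integer into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(values: Sequence[int]) -> int:
    """Each user shares its value; party j only ever sees column j of the share matrix."""
    n = len(values)
    share_matrix = [make_shares(v, n) for v in values]
    partial_sums = [sum(row[j] for row in share_matrix) % PRIME for j in range(n)]
    total = sum(partial_sums) % PRIME
    return total - PRIME if total > PRIME // 2 else total  # map back to a signed integer


def posterior_trendy(prior: float, loglik_trendy: Sequence[float], loglik_not: Sequence[float]) -> float:
    """P(trendy | D) via Bayes' theorem in log-odds form."""
    lt = secure_sum([round(v * SCALE) for v in loglik_trendy]) / SCALE  # log P(D | trendy)
    ln = secure_sum([round(v * SCALE) for v in loglik_not]) / SCALE  # log P(D | not trendy)
    log_odds = math.log(prior / (1.0 - prior)) + lt - ln
    if log_odds >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

Because every share except the last is uniformly random, any single party's column reveals nothing about an individual user's value; only the final sum is recovered.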

[16]  arXiv:2107.13720 (cross-list from cs.CV) [pdf, other]
Title: Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection
Comments: Accepted for publication in the 29th ACM International Conference on Multimedia (ACMMM '21)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

Detecting abnormal activities in real-world surveillance videos is an important yet challenging task as the prior knowledge about video anomalies is usually limited or unavailable. Although many approaches have been developed to resolve this problem, few of them can capture the normal spatio-temporal patterns effectively and efficiently. Moreover, existing works seldom explicitly consider the local consistency at frame level and global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components, i.e., a convolutional encoder to capture the spatial information of the input video clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to integrate spatio-temporal features and predict the future frame. Next, a dual discriminator based adversarial training procedure, which jointly considers an image discriminator that can maintain the local consistency at frame level and a video discriminator that can enforce the global coherence of temporal dynamics, is employed to enhance the future frame prediction. Finally, the prediction error is used to identify abnormal video frames. Thorough empirical studies on three public video anomaly detection datasets, i.e., UCSD Ped2, CUHK Avenue, and ShanghaiTech Campus, demonstrate the effectiveness of the proposed adversarial spatio-temporal modeling framework.
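
Future-frame-prediction work often turns prediction error into per-frame anomaly decisions by computing the PSNR between predicted and actual frames and min-max normalizing it over each video into a regularity score. The sketch below follows that common recipe as an assumption; the paper's exact scoring and the threshold of 0.5 are not taken from the abstract.

```python
import math
from typing import List, Sequence


def psnr(pred: Sequence[float], actual: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a predicted and an actual frame (flattened pixels)."""
    mse = sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)
    mse = max(mse, 1e-10)  # avoid division by zero for a perfect prediction
    return 10.0 * math.log10(max_val ** 2 / mse)


def regularity_scores(psnrs: Sequence[float]) -> List[float]:
    """Min-max normalise PSNRs over a video: 1.0 = most regular frame, 0.0 = least regular."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [1.0] * len(psnrs)
    return [(p - lo) / (hi - lo) for p in psnrs]


def flag_anomalies(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Frames whose regularity falls below the (assumed) threshold are flagged as abnormal."""
    return [s < threshold for s in scores]
```

A frame the generator predicts poorly has a low PSNR, hence a low regularity score, which is what marks it as a candidate anomaly.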

[17]  arXiv:2107.13731 (cross-list from cs.CV) [pdf, other]
Title: UIBert: Learning Generic Multimodal Representations for UI Understanding
Comments: 8 pages, IJCAI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

To improve the accessibility of smart devices and to simplify their usage, building models which understand user interfaces (UIs) and assist users in completing their tasks is critical. However, UI-specific characteristics pose unique challenges, such as how to effectively leverage multimodal UI features that involve image, text, and structural metadata and how to achieve good performance when high-quality labeled data is unavailable. To address such challenges we introduce UIBert, a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data to learn generic feature representations for a UI and its components. Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other. We propose five pretraining tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI. We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.

[18]  arXiv:2107.13738 (cross-list from cs.HC) [pdf, other]
Title: Design-Driven Requirements for Computationally Co-Creative Game AI Design Tools
Comments: 12 pages, 1 figure. Accepted for publication in Foundations of Digital Games (FDG) 2021
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Game AI designers must manage complex interactions between the AI character, the game world, and the player, while achieving their design visions. Computational co-creativity tools can aid them, but first, AI and HCI researchers must gather requirements and determine design heuristics to build effective co-creative tools. In this work, we present a participatory design study that categorizes and analyzes game AI designers' workflows, goals, and expectations for such tools. We evince deep connections between game AI design and the design of co-creative tools, and present implications for future co-creativity tool research and development.

[19]  arXiv:2107.13742 (cross-list from cs.CV) [pdf, other]
Title: Profile to Frontal Face Recognition in the Wild Using Coupled Conditional GAN
Comments: arXiv admin note: substantial text overlap with arXiv:2005.02166
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In recent years, with the advent of deep learning, face recognition has achieved exceptional success. However, many of these deep face recognition models perform much better in handling frontal faces compared to profile faces. The major reason for poor performance in handling profile faces is that it is inherently difficult to learn pose-invariant deep representations that are useful for profile face recognition. In this paper, we hypothesize that the profile face domain possesses a latent connection with the frontal face domain in a latent feature subspace. We look to exploit this latent connection by projecting the profile faces and frontal faces into a common latent subspace and performing verification or retrieval in the latent domain. We leverage a coupled conditional generative adversarial network (cpGAN) structure to find the hidden relationship between the profile and frontal images in a latent common embedding subspace. Specifically, the cpGAN framework consists of two conditional GAN-based sub-networks, one dedicated to the frontal domain and the other dedicated to the profile domain. Each sub-network seeks a projection that maximizes the pair-wise correlation between the two feature domains in a common embedding feature subspace. The efficacy of our approach compared with the state-of-the-art is demonstrated using the CFP, CMU Multi-PIE, IJB-A, and IJB-C datasets. Additionally, we have also implemented a coupled convolutional neural network (cpCNN) and an adversarial discriminative domain adaptation network (ADDA) for profile to frontal face recognition. We have evaluated the performance of cpCNN and ADDA and compared them with the proposed cpGAN. Finally, we have also evaluated our cpGAN for reconstruction of frontal faces from input profile faces contained in the VGGFace2 dataset.

[20]  arXiv:2107.13782 (cross-list from cs.LG) [pdf]
Title: Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
Comments: This is under review with a scientific journal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Multimodal deep learning systems that employ multiple modalities like text, image, audio, video, etc., are showing better performance than single-modality (i.e., unimodal) systems. Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning. In the current state of multimodal machine learning, the assumption is that all modalities are present, aligned, and noiseless during training and testing. However, in real-world tasks, one or more modalities are typically missing, noisy, lacking annotated data, have unreliable labels, or are scarce during training, testing, or both. This challenge is addressed by a learning paradigm called multimodal co-learning. The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive models. As co-learning is an emerging area, there are no dedicated reviews explicitly focusing on all challenges addressed by co-learning. To that end, in this work, we provide a comprehensive survey on the emerging area of multimodal co-learning that has not been explored in its entirety yet. We review implementations that overcome one or more co-learning challenges without explicitly considering them as co-learning challenges. We present a comprehensive taxonomy of multimodal co-learning based on the challenges addressed by co-learning and associated implementations. The various techniques employed, including the latest ones, are reviewed along with some of the applications and datasets. Our final goal is to discuss challenges and perspectives along with the important ideas and directions for future work that we hope will be beneficial for the entire research community focusing on this exciting domain.

[21]  arXiv:2107.13807 (cross-list from cs.CV) [pdf, other]
Title: FREE: Feature Refinement for Generalized Zero-Shot Learning
Comments: ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts dedicated to overcoming the problems of the visual-semantic domain gap and seen-unseen bias. However, most existing methods directly use feature extraction models trained on ImageNet alone, ignoring the cross-dataset bias between ImageNet and GZSL benchmarks. Such a bias inevitably results in poor-quality visual features for GZSL tasks, which potentially limits the recognition performance on both seen and unseen classes. In this paper, we propose a simple yet effective GZSL method, termed feature refinement for generalized zero-shot learning (FREE), to tackle this problem. FREE employs a feature refinement (FR) module that incorporates semantic→visual mapping into a unified generative model to refine the visual features of seen and unseen class samples. Furthermore, we propose a self-adaptive margin center loss (SAMC-loss) that cooperates with a semantic cycle-consistency loss to guide FR to learn class- and semantically-relevant representations, and we concatenate the features in FR to extract the fully refined features. Extensive experiments on five benchmark datasets demonstrate the significant performance gain of FREE over its baseline and current state-of-the-art methods. Our code is available at https://github.com/shiming-chen/FREE .

[22]  arXiv:2107.13876 (cross-list from cs.IR) [pdf, other]
Title: Understanding the Effects of Adversarial Personalized Ranking Optimization Method on Recommendation Quality
Comments: 5 pages
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Recommender systems (RSs) employ user-item feedback, e.g., ratings, to match customers to personalized lists of products. Approaches to top-k recommendation mainly rely on Learning-To-Rank algorithms, and among them the most widely adopted is Bayesian Personalized Ranking (BPR), which is based on a pairwise optimization approach. Recently, BPR has been found vulnerable to adversarial perturbations of its model parameters. Adversarial Personalized Ranking (APR) mitigates this issue by robustifying BPR via an adversarial training procedure. APR's empirical accuracy improvements over BPR have led to its wide use in several recommender models. However, a key overlooked aspect has been the beyond-accuracy performance of APR, i.e., novelty, coverage, and amplification of popularity bias, considering that recent results suggest that BPR, the building block of APR, is sensitive to the intensification of biases and the reduction of recommendation novelty. In this work, we model the learning characteristics of the BPR and APR optimization frameworks to give mathematical evidence that, when the feedback data have a tailed distribution, APR amplifies the popularity bias more than BPR due to an unbalanced number of received positive updates from short-head items. Using matrix factorization (MF), we empirically validate the theoretical results by performing preliminary experiments on two public datasets to compare BPR-MF and APR-MF performance on accuracy and beyond-accuracy metrics. The experimental results consistently show the degradation of novelty and coverage measures and a worrying amplification of bias.
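For readers unfamiliar with the two objectives, a minimal sketch of the per-triple BPR-MF loss and an APR-style adversarial term follows, assuming a plain dot-product matrix factorization model. The helper names, the per-vector L2 normalization of the perturbation, and the default values of `eps` and `lam` are illustrative assumptions, not the authors' code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softplus(z):
    # Numerically stable ln(1 + e^z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bpr_loss(p_u, q_i, q_j):
    # BPR loss for user u, observed item i, unobserved item j:
    # -ln sigmoid(x_uij) = softplus(-x_uij), with x_uij = <p_u, q_i> - <p_u, q_j>.
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    return softplus(-x_uij)


def bpr_grads(p_u, q_i, q_j):
    # Gradients of bpr_loss with respect to p_u, q_i and q_j.
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    c = -sigmoid(-x_uij)  # d loss / d x_uij
    g_p = [c * (a - b) for a, b in zip(q_i, q_j)]
    g_i = [c * a for a in p_u]
    g_j = [-c * a for a in p_u]
    return g_p, g_i, g_j


def adversarial_delta(grad, eps):
    # Perturbation of L2 norm eps along the gradient, i.e. the direction
    # that increases the loss fastest to first order.
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    return [eps * g / norm for g in grad]


def apr_loss(p_u, q_i, q_j, eps=0.5, lam=1.0):
    # APR-style objective: clean BPR loss plus lam times the BPR loss
    # evaluated under the adversarial parameter perturbation.
    g_p, g_i, g_j = bpr_grads(p_u, q_i, q_j)

    def shift(v, g):
        return [a + d for a, d in zip(v, adversarial_delta(g, eps))]

    adv = bpr_loss(shift(p_u, g_p), shift(q_i, g_i), shift(q_j, g_j))
    return bpr_loss(p_u, q_i, q_j) + lam * adv
```

Because every training triple receives an extra, gradient-aligned update through the adversarial term, frequently observed (short-head) items get more positive updates, which is the mechanism the abstract analyzes.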

[23]  arXiv:2107.13904 (cross-list from cs.CV) [pdf, other]
Title: Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes
Comments: 10 pages, 6 figures, accepted by ACM International Conference on Multimedia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Person re-identification (Re-ID) aims to match person images across non-overlapping camera views. The majority of Re-ID methods focus on small-scale surveillance systems in which each pedestrian is captured in different camera views of adjacent scenes. However, large-scale surveillance systems that cover larger areas need to track a pedestrian of interest across distant scenes (e.g., a criminal suspect escaping from one city to another). Since most pedestrians appear only in limited local areas, it is difficult to collect training data with cross-camera pairs of the same person. In this work, we study intra-camera supervised person re-identification across distant scenes (ICS-DS Re-ID), which uses cross-camera unpaired data with intra-camera identity labels for training. This setting is challenging because cross-camera paired data plays a crucial role in learning camera-invariant features in most existing Re-ID methods. To learn camera-invariant representations from cross-camera unpaired training data, we propose a cross-camera feature prediction method that mines cross-camera self-supervision information from camera-specific feature distributions by transforming fake cross-camera positive feature pairs and minimizing the distances between the fake pairs. Furthermore, we automatically localize and extract local-level features using a transformer. Joint learning of global-level and local-level features forms a global-local cross-camera feature prediction scheme for mining fine-grained cross-camera self-supervision information. Finally, cross-camera self-supervision and intra-camera supervision are aggregated in a single framework. Experiments are conducted in the ICS-DS setting on the Market-SCT, Duke-SCT, and MSMT17-SCT datasets. The evaluation results demonstrate the superiority of our method, which gains significant improvements of 15.4 Rank-1 and 22.3 mAP on Market-SCT compared to the second-best method.

[24]  arXiv:2107.13944 (cross-list from cs.LG) [pdf]
Title: Lyapunov-based uncertainty-aware safe reinforcement learning
Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks. However, in many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety (e.g., avoiding collisions in autonomous driving). While RL problems are commonly formalized as Markov decision processes (MDPs), safety constraints are incorporated via constrained Markov decision processes (CMDPs). Although recent advances in safe RL have enabled learning safe policies in CMDPs, these safety requirements should be satisfied during both training and deployment. Furthermore, it has been shown that in memory-based and partially observable environments, these methods fail to maintain safety over unseen out-of-distribution observations. To address these limitations, we propose a Lyapunov-based uncertainty-aware safe RL model. The model adopts a Lyapunov function that converts trajectory-based constraints into a set of local linear constraints. Furthermore, to ensure the safety of the agent in highly uncertain environments, we develop an uncertainty quantification method that identifies risk-averse actions by estimating the probability of constraint violations. Moreover, a Transformer model is integrated to give the agent memory, allowing it to process long time horizons of information via the self-attention mechanism. The proposed model is evaluated in grid-world navigation tasks where safety is defined as avoiding static and dynamic obstacles in fully and partially observable environments. The results of these experiments show a significant improvement in the performance of the agent, both in achieving optimality and in satisfying safety constraints.
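The abstract does not say how the violation probability is estimated. As a hedged illustration only, the sketch below assumes an ensemble (or sampler) that returns several predicted constraint costs per action, estimates the violation probability as the fraction of samples above a cost budget, and picks the highest-value action among those whose estimated risk is acceptable. The names `budget` and `max_risk` are hypothetical parameters, and the sketch does not include the Lyapunov constraints themselves.

```python
def violation_probability(cost_samples, budget):
    # Fraction of sampled constraint costs that exceed the budget.
    return sum(1 for c in cost_samples if c > budget) / len(cost_samples)


def risk_averse_action(q_values, cost_samples, budget, max_risk):
    # q_values[a]: estimated return of action a.
    # cost_samples[a]: sampled constraint costs for action a (e.g. one per ensemble member).
    actions = range(len(q_values))
    risk = {a: violation_probability(cost_samples[a], budget) for a in actions}
    safe = [a for a in actions if risk[a] <= max_risk]
    if not safe:
        # No action is acceptably safe: fall back to the least risky one.
        return min(actions, key=lambda a: risk[a])
    # Among acceptably safe actions, maximize the estimated return.
    return max(safe, key=lambda a: q_values[a])
```

Separating the risk filter from the value maximization keeps the safety decision explicit: an action with a high estimated return is never chosen if too many samples predict a violation.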

[25]  arXiv:2107.13955 (cross-list from cs.CL) [pdf, other]
Title: Demystifying Neural Language Models' Insensitivity to Word-Order
Comments: 11 pages, 13 figures + appendix
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recent research analyzing the sensitivity of natural language understanding models to word-order perturbations has shown that state-of-the-art models in several language tasks may have a unique way of understanding text that can seldom be explained with conventional syntax and semantics. In this paper, we investigate the insensitivity of natural language models to word order by quantifying perturbations and analysing their effect on neural models' performance on language understanding tasks in the GLUE benchmark. Towards that end, we propose two metrics, the Direct Neighbour Displacement (DND) and the Index Displacement Count (IDC), that score the local and global ordering of tokens in the perturbed texts, and we observe that perturbation functions found in prior literature affect only the global ordering while the local ordering remains relatively unperturbed. We propose perturbations at the granularity of sub-words and characters to study the correlation between DND, IDC, and the performance of neural language models on natural language tasks. We find that neural language models (pretrained and non-pretrained Transformers, LSTMs, and convolutional architectures) require local ordering more than global ordering of tokens. The proposed metrics and the suite of perturbations allow a systematic way to study the (in)sensitivity of neural language understanding models to varying degrees of perturbation.
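The abstract does not give the exact DND and IDC formulas. The sketch below uses simple stand-in definitions assumed here for illustration only: a perturbation is a permutation `perm` where position k of the perturbed text holds original token `perm[k]`; the IDC-like score is the mean absolute index displacement, and the DND-like score is the fraction of adjacent pairs whose original right-hand neighbour was lost. Consult the paper for the actual definitions.

```python
def index_displacement(perm):
    # Global-order score (assumed definition): mean |original index - new index|.
    n = len(perm)
    return sum(abs(orig - k) for k, orig in enumerate(perm)) / n


def neighbour_displacement(perm):
    # Local-order score (assumed definition): fraction of adjacent pairs in the
    # perturbed text that were not adjacent, in the same order, originally.
    n = len(perm)
    if n < 2:
        return 0.0
    broken = sum(1 for k in range(n - 1) if perm[k + 1] != perm[k] + 1)
    return broken / (n - 1)


# Swapping two halves moves every token far (high global displacement)
# but keeps most neighbours intact (low local displacement).
swap_halves = [2, 3, 0, 1]
reverse = [3, 2, 1, 0]
print(index_displacement(swap_halves), neighbour_displacement(swap_halves))
print(index_displacement(reverse), neighbour_displacement(reverse))
```

The two example permutations have the same global score but very different local scores, which mirrors the abstract's observation that many prior perturbation functions disturb global order while leaving local order largely intact.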

[26]  arXiv:2107.13973 (cross-list from cs.CV) [pdf, other]
Title: Self-Supervised Learning for Fine-Grained Image Classification
Comments: 16 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Fine-grained image classification involves identifying different subcategories of a class that possess very subtle discriminatory features. Fine-grained datasets usually provide bounding box annotations along with class labels to aid the classification process. However, building large-scale datasets with such annotations is a mammoth task. Moreover, this extensive annotation is time-consuming and often requires expertise, which is a huge bottleneck in building large datasets. Self-supervised learning (SSL), on the other hand, exploits freely available data to generate supervisory signals that act as labels. Features learnt by performing pretext tasks on huge amounts of unlabelled data prove very helpful for multiple downstream tasks.
Our idea is to leverage self-supervision so that the model learns useful representations of fine-grained image classes. We experimented with three kinds of models: jigsaw solving as a pretext task, adversarial learning (SRGAN), and contrastive learning (SimCLR). The learned features are used for downstream tasks such as fine-grained image classification. Our code is available at this http URL
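Of the three approaches, contrastive learning is the easiest to show compactly. Below is a minimal pure-Python sketch of the NT-Xent loss used by SimCLR, assuming `z1[k]` and `z2[k]` are the projected embeddings of two augmented views of image k and a temperature `tau` of 0.5. It illustrates the standard loss, not the authors' training code.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nt_xent(z1, z2, tau=0.5):
    # Normalized temperature-scaled cross-entropy over 2N views: each view's
    # positive is the other view of the same image; the remaining 2N - 2
    # views in the batch act as negatives.
    z = z1 + z2  # list concatenation: views 0..N-1, then N..2N-1
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner
        denom = sum(math.exp(cosine(z[i], z[k]) / tau) for k in range(2 * n) if k != i)
        total += -math.log(math.exp(cosine(z[i], z[j]) / tau) / denom)
    return total / (2 * n)
```

The loss is low when the two views of each image map to similar embeddings and dissimilar ones from other images; no labels are needed, which is what makes the learned features usable for the downstream fine-grained task.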

[27]  arXiv:2107.13994 (cross-list from cs.CV) [pdf, other]
Title: Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation
Comments: In Proceedings of the 29th ACM International Conference on Multimedia (MM '21)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Most existing 3D human pose estimation approaches focus on predicting the 3D positional relationships between the root joint and other human joints (local motion) rather than the overall trajectory of the human body (global motion). Despite the great progress achieved by these approaches, they are not robust to global motion and lack the ability to accurately predict local motion with a small movement range. To alleviate these two problems, we propose a relative information encoding method that yields positionally and temporally enhanced representations. First, we encode positional information by using relative coordinates of 2D poses to enhance the consistency between the input and output distributions. The same posture at different absolute 2D positions can then be mapped to a common representation, which helps resist the interference of global motion with the prediction results. Second, we encode temporal information by establishing a connection between the current pose and other poses of the same person within a period of time. More attention is paid to the movement changes before and after the current pose, resulting in better prediction performance on local motion with a small movement range. Ablation studies validate the effectiveness of the proposed relative information encoding method. In addition, we introduce a multi-stage optimization method into the whole framework to further exploit the positionally and temporally enhanced representations. Our method outperforms state-of-the-art methods on two public datasets. Code is available at https://github.com/paTRICK-swk/Pose3D-RIE.
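A minimal sketch of the two relative encodings described above, assuming each 2D pose is a list of (x, y) joint coordinates with the root joint at index 0. Subtracting the root makes the encoding translation-invariant, and subtracting the current frame relates every frame in a window to the pose being predicted. The root index and the exact form of the temporal encoding are assumptions; the released encoder may differ.

```python
def positional_relative(pose, root=0):
    # Express every joint relative to the root joint, so the same posture at
    # different absolute image positions maps to the same representation.
    rx, ry = pose[root]
    return [(x - rx, y - ry) for x, y in pose]


def temporal_relative(window, t):
    # Express every frame in a temporal window relative to the current
    # frame t, emphasising movement changes before and after it.
    current = window[t]
    return [[(x - x0, y - y0) for (x, y), (x0, y0) in zip(frame, current)]
            for frame in window]


pose = [(1.0, 1.0), (2.0, 3.0)]
shifted = [(6.0, -1.0), (7.0, 1.0)]  # same posture, translated by (5, -2)
print(positional_relative(pose) == positional_relative(shifted))
```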

[28]  arXiv:2107.13998 (cross-list from cs.CY) [pdf, other]
Title: "Excavating AI" Re-excavated: Debunking a Fallacious Account of the JAFFE Dataset
Authors: Michael J. Lyons
Comments: 20 pages, 4 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Twenty-five years ago, my colleagues Miyuki Kamachi and Jiro Gyoba and I designed and photographed JAFFE, a set of facial expression images intended for use in a study of face perception. In 2019, without seeking permission or informing us, Kate Crawford and Trevor Paglen exhibited JAFFE in two widely publicized art shows. In addition, they published a nonfactual account of the images in the essay "Excavating AI: The Politics of Images in Machine Learning Training Sets." The present article recounts the creation of the JAFFE dataset and unravels each of Crawford and Paglen's fallacious statements. I also discuss JAFFE more broadly in connection with research on facial expression, affective computing, and human-computer interaction.

[29]  arXiv:2107.14028 (cross-list from cs.SD) [pdf, other]
Title: Estimating Respiratory Rate From Breath Audio Obtained Through Wearable Microphones
Comments: International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Respiratory rate (RR) is a clinical metric used to assess overall health and physical fitness. An individual's RR can change from their baseline due to chronic illness symptoms (e.g., asthma, congestive heart failure), acute illness (e.g., breathlessness due to infection), and over the course of the day due to physical exhaustion during heightened exertion. Remote estimation of RR can offer a cost-effective method to track disease progression and cardio-respiratory fitness over time. This work investigates a model-driven approach to estimating RR from short audio segments obtained after physical exertion in healthy adults. Data were collected from 21 individuals using microphone-enabled, near-field headphones before, during, and after strenuous exercise. RR was manually annotated by counting perceived inhalations and exhalations. A multi-task Long Short-Term Memory (LSTM) network with convolutional layers was implemented to process mel-filterbank energies, estimate RR in varying background noise conditions, and predict heavy breathing, indicated by an RR of more than 25 breaths per minute. The multi-task model performs both classification and regression and leverages a mixture of loss functions. RR could be estimated with a concordance correlation coefficient (CCC) of 0.76 and a mean squared error (MSE) of 0.2, demonstrating that audio can be a viable signal for approximating RR.
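For reference, Lin's concordance correlation coefficient reported above can be computed as below; it penalizes both weak correlation and systematic offset between predicted and annotated RR. The sketch uses population (1/n) moments, a common convention, and the 25 breaths-per-minute heavy-breathing threshold from the abstract. It is not the authors' evaluation script.

```python
def concordance_cc(y_true, y_pred):
    # Lin's CCC = 2 cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2).
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    var_t = sum((a - mt) ** 2 for a in y_true) / n
    var_p = sum((b - mp) ** 2 for b in y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mt - mp) ** 2)


def is_heavy_breathing(rr_bpm):
    # Heavy breathing is indicated by an RR of more than 25 breaths per minute.
    return rr_bpm > 25
```

Unlike Pearson correlation, a prediction that is perfectly correlated but shifted by a constant still scores below 1: for targets [1, 2, 3] and predictions [2, 3, 4], CCC is 4/7.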

[30]  arXiv:2107.14037 (cross-list from cs.LG) [pdf, other]
Title: Machine Learning and Deep Learning Methods for Building Intelligent Systems in Medicine and Drug Discovery: A Comprehensive Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

With the advancements in computer technology, intelligent systems that understand complex relationships in data to make predictions and classifications are developing rapidly. Artificial intelligence-based frameworks are rapidly revolutionizing the healthcare industry. These intelligent systems are built with robust machine learning and deep learning models for the early diagnosis of diseases and demonstrate a promising supplementary diagnostic method for frontline clinical doctors and surgeons. Machine learning and deep learning-based systems can streamline and simplify the steps involved in diagnosing diseases from clinical and image-based data, thus providing significant clinician support and workflow optimization. They mimic human cognition and are even capable of diagnosing diseases that cannot be diagnosed with human intelligence. This paper surveys machine learning and deep learning applications across 16 medical specialties (Dental medicine, Haematology, Surgery, Cardiology, Pulmonology, Orthopedics, Radiology, Oncology, General medicine, Psychiatry, Endocrinology, Neurology, Dermatology, Hepatology, Nephrology, and Ophthalmology) and in Drug discovery. Along with the survey, we discuss the advancements of medical practices with these systems and the impact of these systems on medical professionals.

[31]  arXiv:2107.14042 (cross-list from cs.CY) [pdf]
Title: The brain is a computer is a brain: neuroscience's internal debate and the social significance of the Computational Metaphor
Authors: Alexis T. Baria (1), Keith Cross (2) ((1) Society of Spoken Art, New York, USA, (2) University of Hawaiʻi at Manoa, Honolulu, USA)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

The Computational Metaphor, comparing the brain to the computer and vice versa, is the most prominent metaphor in neuroscience and artificial intelligence (AI). Its appropriateness is highly debated in both fields, particularly with regard to whether it is useful for the advancement of science and technology. Considerably less attention, however, has been devoted to how the Computational Metaphor is used outside of the lab, and particularly how it may shape society's interactions with AI. As such, recently publicized concerns over AI's role in perpetuating racism, genderism, and ableism suggest that the term "artificial intelligence" is misplaced, and that a new lexicon is needed to describe these computational systems. Thus, there is an essential question about the Computational Metaphor that is rarely asked by neuroscientists: whom does it help and whom does it harm? This essay invites the neuroscience community to consider the social implications of the field's most controversial metaphor.

[32]  arXiv:2107.14044 (cross-list from cs.CY) [pdf, other]
Title: Ethical AI for Social Good
Journal-ref: International Conference on Human-Computer Interaction, 2021
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

The concept of AI for Social Good (AI4SG) is gaining momentum in both information societies and the AI community. With the advancement of AI-based solutions, AI can address societal issues effectively. To date, however, there is only a rudimentary grasp of what makes AI socially beneficial in principle, what constitutes AI4SG in reality, and what policies and regulations are needed to ensure it. This paper fills this gap by addressing the ethical aspects that are critical for future AI4SG efforts. Some of these characteristics are new to AI, while others have gained importance due to its usage.

[33]  arXiv:2107.14052 (cross-list from cs.CY) [pdf]
Title: The Role of Social Movements, Coalitions, and Workers in Resisting Harmful Artificial Intelligence and Contributing to the Development of Responsible AI
Comments: 184 pages
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); General Economics (econ.GN)

There is mounting public concern over the influence that AI-based systems have in our society. Coalitions in all sectors are acting worldwide to resist harmful applications of AI. From indigenous people addressing the lack of reliable data, to smart city stakeholders, to students protesting the academic relationships with sex trafficker and MIT donor Jeffrey Epstein, the questionable ethics and values of those heavily investing in and profiting from AI are under global scrutiny. There are biased, wrongful, and disturbing assumptions embedded in AI algorithms that could get locked in without intervention. Our best human judgment is needed to contain AI's harmful impact. Perhaps one of the greatest contributions of AI will be to make us ultimately understand how important human wisdom truly is in life on earth.

[34]  arXiv:2107.14053 (cross-list from cs.LG) [pdf, other]
Title: Few-Shot and Continual Learning with Attentive Independent Mechanisms
Comments: 20 pages, 44 figures, accepted by International Conference on Computer Vision 2021 (ICCV 2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks (DNNs) are known to perform well when deployed to test distributions that share high similarity with the training distribution. Sequentially feeding DNNs new data that were unseen in the training distribution poses two major challenges: fast adaptation to new tasks and catastrophic forgetting of old tasks. Such difficulties paved the way for ongoing research on few-shot learning and continual learning. To tackle these problems, we introduce Attentive Independent Mechanisms (AIM). We incorporate the idea of learning using fast and slow weights in conjunction with the decoupling of the feature extraction and higher-order conceptual learning of a DNN. AIM is designed for higher-order conceptual learning, modeled by a mixture of experts that compete to learn independent concepts to solve a new task. AIM is a modular component that can be inserted into existing deep learning frameworks. We demonstrate its capability for few-shot learning by adding it to SIB and training on MiniImageNet and CIFAR-FS, showing significant improvement. AIM is also applied to ANML and OML trained on Omniglot, CIFAR-100, and MiniImageNet to demonstrate its capability in continual learning. Code is publicly available at https://github.com/huang50213/AIM-Fewshot-Continual.

[35]  arXiv:2107.14061 (cross-list from cs.CV) [pdf]
Title: The Need and Status of Sea Turtle Conservation and Survey of Associated Computer Vision Advances
Comments: Currently under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

For hundreds of millions of years, sea turtles and their ancestors have swum in the vast expanses of the ocean. They have undergone a number of evolutionary changes, leading to speciation and sub-speciation. However, in the past few decades, some of the most notable forces driving genetic variance and population decline have been global warming and anthropogenic impacts, ranging from large-scale poaching and collecting turtle eggs for food to dumping trash, including plastic waste, into the ocean. These forces have severe detrimental effects on the sea turtle population, driving it toward extinction. This research focuses on the forces causing the decline in the sea turtle population and the necessity of global conservation efforts, along with their successes and failures, followed by an in-depth analysis of modern advances in the detection and recognition of sea turtles using Machine Learning and Computer Vision systems to aid conservation efforts.

[36]  arXiv:2107.14062 (cross-list from cs.LG) [pdf, other]
Title: Structure and Performance of Fully Connected Neural Networks: Emerging Complex Network Properties
Comments: 18 pages, 7 figures, and 2 tables. Submitted to a peer-review journal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph)

Understanding the behavior of Artificial Neural Networks has recently become one of the main topics in the field, as black-box approaches have become common since the spread of deep learning. Such high-dimensional models may manifest instabilities and unusual properties that resemble complex systems. Therefore, we propose Complex Network (CN) techniques to analyze the structure and performance of fully connected neural networks. For that, we build a dataset with 4 thousand models and their respective CN properties. They are employed in a supervised classification setup considering four vision benchmarks. Each neural network is approached as a weighted and undirected graph of neurons and synapses, and centrality measures are computed after training. Results show that these measures are highly related to the network classification performance. We also propose the concept of Bag-Of-Neurons (BoN), a CN-based approach for finding topological signatures linking similar neurons. Results suggest that six neuronal types emerge in such networks, independently of the target domain, and are distributed differently according to classification accuracy. We also tackle specific CN properties related to performance, such as higher subgraph centrality in lower-performing models. Our findings suggest that CN properties play a critical role in the performance of fully connected neural networks, with topological patterns emerging independently across a wide range of models.
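To make the graph view concrete, the sketch below treats a fully connected network as a weighted undirected graph and computes node strength (the sum of absolute incident edge weights), one of the simplest centrality measures. Using |w| as the edge weight and strength as the centrality are assumptions for illustration; the paper also studies other measures such as subgraph centrality.

```python
def neuron_strengths(weights):
    # weights[l][i][j]: weight of the synapse from neuron i in layer l
    # to neuron j in layer l + 1. Returns one strength list per layer.
    sizes = [len(weights[0])] + [len(W[0]) for W in weights]
    strength = [[0.0] * s for s in sizes]
    for l, W in enumerate(weights):
        for i, row in enumerate(W):
            for j, w in enumerate(row):
                # Undirected edge: its absolute weight counts for both endpoints.
                strength[l][i] += abs(w)
                strength[l + 1][j] += abs(w)
    return strength


# A tiny 2-2-1 network.
print(neuron_strengths([[[1.0, -2.0], [0.5, 0.0]], [[3.0], [-1.0]]]))
```

Collecting such per-neuron values for every trained model yields the kind of structural features that can then be related to classification accuracy.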

[37]  arXiv:2107.14070 (cross-list from cs.CV) [pdf]
Title: Machine Learning Advances aiding Recognition and Classification of Indian Monuments and Landmarks
Comments: Currently under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Tourism in India plays a quintessential role in the country's economy, with an estimated 9.2% GDP share for the year 2018. With a yearly growth rate of 6.2%, the industry holds huge potential to become the primary driver of the economy, as observed in Middle Eastern nations such as the United Arab Emirates. The historical and cultural diversity exhibited throughout the nation's geography is a unique spectacle for people around the world and therefore attracts tens of millions of tourists every year. Traditionally, tour guides or academic professionals who study these heritage monuments were responsible for providing visitors with information about their architectural and historical significance. Unfortunately, this system has several caveats at large scale, such as the unavailability of sufficient trained people, lack of accurate information, and failure to convey the richness of details in an attractive format. Recently, machine learning approaches based on monument pictures have been shown to be useful for rudimentary analysis of heritage sites. This paper surveys the research undertaken in this direction, which would eventually provide insights for building an automated decision system that could be used to modernize the tourism experience in India for visitors.

[38]  arXiv:2107.14077 (cross-list from cs.CY) [pdf, other]
Title: A Fair and Ethical Healthcare Artificial Intelligence System for Monitoring Driver Behavior and Preventing Road Accidents
Comments: 12 pages, 2 figures, accepted to Future Technologies Conference (FTC 2021)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

This paper presents a new approach to preventing transportation accidents and monitoring driver behavior using a healthcare AI system that incorporates fairness and ethics. Dangerous medical cases and unusual driver behavior are detected. A fairness algorithm is applied to improve decision-making, to address ethical issues such as privacy, and to consider challenges that arise in the wild for AI in healthcare and driving. A healthcare professional is alerted about any unusual activity, and the driver's location is provided when necessary so that the healthcare professional can immediately help the unstable driver. Therefore, the healthcare AI system built into the vehicle, which interacts with the ER system, allows accidents to be predicted and thus prevented, and lives may be saved.

[39]  arXiv:2107.14093 (cross-list from cs.CY) [pdf, other]
Title: A Decision Model for Decentralized Autonomous Organization Platform Selection: Three Industry Case Studies
Authors: Elena Baninemeh (1), Siamak Farshidi (2), Slinger Jansen (1) ((1) Department of Information and Computer Science at Utrecht University, Utrecht, the Netherlands, (2) Informatics Institute at University of Amsterdam, Amsterdam, the Netherlands)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Decentralized autonomous organizations, as a new form of online governance, are collections of smart contracts deployed on a blockchain platform that intercede groups of people. A growing number of Decentralized Autonomous Organization Platforms, such as Aragon and Colony, have been introduced in the market to facilitate the development process of such organizations. Selecting the best-fitting platform is challenging for the organizations, as a significant number of decision criteria, such as popularity, developer availability, governance issues, and consistent documentation of such platforms, should be considered. Additionally, decision-makers at the organizations are not experts in every domain, so they must continuously acquire volatile knowledge regarding such platforms and keep themselves updated. Accordingly, a decision model is required to analyze the decision criteria using systematic identification and evaluation of potential alternative solutions for a development project. We have developed a theoretical framework to assist software engineers with a set of Multi-Criteria Decision-Making problems in software production. This study presents a decision model as a Multi-Criteria Decision-Making problem for the decentralized autonomous organization platform selection problem. We conducted three industry case studies in the context of three decentralized autonomous organizations to evaluate the effectiveness and efficiency of the decision model in assisting decision-makers.
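The abstract does not spell out the scoring method inside the decision model. As a hedged illustration of the general Multi-Criteria Decision-Making idea, the sketch below ranks alternatives with a simple weighted-sum model; the criteria keys, weights, platform names, and scores are all hypothetical placeholders, not data from the case studies.

```python
def weighted_sum_ranking(scores, weights):
    # scores: {alternative: {criterion: normalized score in [0, 1]}}
    # weights: {criterion: non-negative importance weight}
    total = sum(weights.values())

    def utility(alt):
        # Weighted average of the alternative's criterion scores.
        return sum(w * scores[alt][c] for c, w in weights.items()) / total

    return sorted(scores, key=utility, reverse=True)


# Hypothetical example: the organization weighs popularity twice as much.
weights = {"popularity": 2.0, "developer_availability": 1.0, "documentation": 1.0}
scores = {
    "platform_a": {"popularity": 0.9, "developer_availability": 0.2, "documentation": 0.5},
    "platform_b": {"popularity": 0.4, "developer_availability": 0.9, "documentation": 0.9},
}
print(weighted_sum_ranking(scores, weights))
```

A weighted sum is only one MCDM technique; it makes the trade-offs explicit (here the more popular platform still loses on the combined score), but the real decision model may use a different aggregation.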

[40]  arXiv:2107.14203 (cross-list from stat.ML) [pdf, other]
Title: Did the Model Change? Efficiently Assessing Machine Learning API Shifts
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)

Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it is often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the performance shifts from 2020 to 2021 of popular ML APIs from Google, Microsoft, Amazon, and others on a variety of datasets. We identified significant model shifts in 12 out of 36 cases we investigated. Interestingly, we found several datasets where the API's predictions became significantly worse over time. This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant. Monitoring confusion matrix shifts using standard random sampling can require a large number of samples, which is expensive as each API call costs a fee. We propose a principled adaptive sampling algorithm, MASA, to efficiently estimate confusion matrix shifts. MASA can accurately estimate the confusion matrix shifts in commercial ML APIs using up to 90% fewer samples compared to random sampling. This work establishes ML API shifts as an important problem to study and provides a cost-effective approach to monitor such shifts.
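The quantity being estimated can be illustrated with the plain random-sampling baseline mentioned above: label a sample, query the API at two points in time, and subtract the two empirical confusion matrices. The sketch normalizes each matrix to a joint distribution over (true, predicted) labels, which is an assumption, and it does not implement MASA's adaptive sampling.

```python
import math


def confusion_matrix(labels, preds, k):
    # Empirical joint distribution over (true label, predicted label).
    counts = [[0] * k for _ in range(k)]
    for y, p in zip(labels, preds):
        counts[y][p] += 1
    n = len(labels)
    return [[c / n for c in row] for row in counts]


def confusion_shift(labels, preds_old, preds_new, k):
    # Entry-wise change between two snapshots of the same API on the same data.
    old = confusion_matrix(labels, preds_old, k)
    new = confusion_matrix(labels, preds_new, k)
    return [[b - a for a, b in zip(row_o, row_n)] for row_o, row_n in zip(old, new)]


def frobenius(matrix):
    # Size of the shift as a single number.
    return math.sqrt(sum(x * x for row in matrix for x in row))


labels = [0, 0, 1, 1]
shift = confusion_shift(labels, [0, 0, 1, 1], [0, 1, 1, 1], k=2)
print(shift, frobenius(shift))
```

Each entry is estimated from every labeled sample regardless of class, so rare but important cells need many paid API calls; allocating samples more cleverly is the gap an adaptive scheme such as MASA targets.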

[41]  arXiv:2107.14226 (cross-list from cs.LG) [pdf, other]
Title: Learning more skills through optimistic exploration
Comments: Steven Hansen and DJ Strouse contributed equally to this work
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Unsupervised skill learning objectives (Gregor et al., 2016, Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories, and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is for the agent to explore and master the environment by encouraging each skill (latent) to reliably reach different states. However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications, leading to low intrinsic reward for the agent and effective penalization of the sort of exploration needed to actually maximize the objective. To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves training an ensemble of discriminators and rewarding the policy for their disagreement. Our objective directly estimates the epistemic uncertainty that comes from the discriminator not having seen enough training examples, thus providing an intrinsic reward more tailored to the true objective compared to pseudocount-based methods (Burda et al., 2019). We call this exploration bonus discriminator disagreement intrinsic reward, or DISDAIN. We demonstrate empirically that DISDAIN improves skill learning both in a tabular grid world (Four Rooms) and the 57 games of the Atari Suite (from pixels). Thus, we encourage researchers to treat pessimism with DISDAIN.
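The abstract describes rewarding the policy for disagreement among an ensemble of discriminators as an information-gain estimate of epistemic uncertainty. One standard way to score such disagreement is the mutual information between ensemble member and predicted skill: the entropy of the ensemble-mean skill distribution minus the mean entropy of the individual members. The sketch below assumes that form; it is an illustration, not necessarily the paper's exact DISDAIN formulation, and the function names are hypothetical. Each member is represented by its predicted probability vector over skills for a single state.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; terms with zero probability contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def disagreement_bonus(ensemble_probs: Sequence[Sequence[float]]) -> float:
    """Entropy of the mean prediction minus the mean entropy of the members.

    Zero when all discriminators agree; large when members are individually
    confident but predict different skills (high epistemic uncertainty).
    """
    n = len(ensemble_probs)
    k = len(ensemble_probs[0])
    mean = [sum(p[j] for p in ensemble_probs) / n for j in range(k)]
    return entropy(mean) - sum(entropy(p) for p in ensemble_probs) / n


if __name__ == "__main__":
    print(disagreement_bonus([[0.7, 0.3], [0.7, 0.3]]))  # members agree: bonus is 0
    print(disagreement_bonus([[1.0, 0.0], [0.0, 1.0]]))  # confident disagreement: log 2
```

By concavity of entropy the bonus is never negative, so it can be added to the discriminator-based intrinsic reward without penalizing states where the ensemble agrees.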

[42]  arXiv:2107.14229 (cross-list from cs.CV) [pdf, other]
Title: Guided Disentanglement in Generative Networks
Comments: Journal submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Image-to-image translation (i2i) networks suffer from entanglement effects in the presence of physics-related phenomena in the target domain (such as occlusions, fog, etc.), which lowers translation quality and variability. In this paper, we present a comprehensive method for disentangling physics-based traits in the translation, guiding the learning process with neural or physical models. For the latter, we integrate adversarial estimation and genetic algorithms to correctly achieve disentanglement. The results show that our approach dramatically increases performance in many challenging scenarios for image translation.

Replacements for Fri, 30 Jul 21

[43]  arXiv:2009.11485 (replaced) [src]
Title: CogniFNN: A Fuzzy Neural Network Framework for Cognitive Word Embedding Evaluation
Comments: The method and results need to be further investigated
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[44]  arXiv:2101.04640 (replaced) [pdf, other]
Title: Dimensions of Commonsense Knowledge
Journal-ref: Knowledge-Based Systems 2021
Subjects: Artificial Intelligence (cs.AI)
[45]  arXiv:2010.11270 (replaced) [pdf, other]
Title: Learning second order coupled differential equations that are subject to non-conservative forces
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[46]  arXiv:2012.13349 (replaced) [pdf, other]
Title: Solving Mixed Integer Programs Using Neural Networks
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[47]  arXiv:2012.14173 (replaced) [pdf, other]
Title: Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques
Comments: 20 pages, 3 figures, 4 tables
Journal-ref: Neural Comput & Applic (2021)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[48]  arXiv:2012.15843 (replaced) [pdf, other]
Title: A Tale of Two Efficient and Informative Negative Sampling Distributions
Comments: Published at ICML 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)
[49]  arXiv:2101.00591 (replaced) [pdf, other]
Title: Progressive Correspondence Pruning by Consensus Learning
Comments: Accepted by ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[50]  arXiv:2102.12855 (replaced) [pdf, other]
Title: Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic
Comments: arXiv admin note: text overlap with arXiv:2010.06797
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
[51]  arXiv:2104.14222 (replaced) [pdf, other]
Title: Privacy-Preserving Portrait Matting
Comments: Accepted to ACM Multimedia 2021, code and dataset available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[52]  arXiv:2105.10719 (replaced) [pdf, other]
Title: Learning Baseline Values for Shapley Values
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[53]  arXiv:2105.11283 (replaced) [pdf, other]
Title: Coarse-to-Fine for Sim-to-Real: Sub-Millimetre Precision Across Wide Task Spaces
Comments: To be published at IROS 2021. 8 pages, 6 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[54]  arXiv:2106.04148 (replaced) [pdf, other]
Title: RECOWNs: Probabilistic Circuits for Trustworthy Time Series Forecasting
Comments: Accepted for the 4th Workshop on Tractable Probabilistic Modeling (TPM 2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[55]  arXiv:2106.07832 (replaced) [pdf, other]
Title: Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[56]  arXiv:2107.00821 (replaced) [pdf, other]
Title: An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors
Comments: Technical Report
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[57]  arXiv:2107.08176 (replaced) [pdf, other]
Title: Automatic Fairness Testing of Neural Classifiers through Adversarial Sampling
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[58]  arXiv:2107.11817 (replaced) [pdf, other]
Title: Go Wider Instead of Deeper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)