We gratefully acknowledge support from
the Simons Foundation and member institutions.

Artificial Intelligence

New submissions

[ total of 102 entries: 1-102 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Wed, 8 Feb 23

[1]  arXiv:2302.03180 [pdf, ps, other]
Title: Who wants what and how: a Mapping Function for Explainable Artificial Intelligence
Authors: Maryam Hashemi
Subjects: Artificial Intelligence (cs.AI)

The increasing complexity of AI systems has led to the growth of the field of explainable AI (XAI), which aims to provide explanations and justifications for the outputs of AI algorithms. These methods mainly focus on feature importance and identifying changes that can be made to achieve a desired outcome. Researchers have identified desired properties for XAI methods, such as plausibility, sparsity, causality, low run-time, etc. The objective of this study is to conduct a review of existing XAI research and present a classification of XAI methods. The study also aims to connect XAI users with the appropriate method and relate desired properties to current XAI approaches. The outcome of this study will be a clear strategy that outlines how to choose the right XAI method for a particular goal and user and provide a personalized explanation for users.

[2]  arXiv:2302.03189 [pdf, ps, other]
Title: Emergent Causality & the Foundation of Consciousness
Subjects: Artificial Intelligence (cs.AI)

To make accurate inferences in an interactive setting, an agent must not confuse passive observation of events with having participated in causing those events. The do operator formalises interventions so that we may reason about their effect. Yet there exist at least two pareto optimal mathematical formalisms of general intelligence in an interactive setting which, presupposing no explicit representation of intervention, make maximally accurate inferences. We examine one such formalism. We show that in the absence of an operator, an intervention can still be represented by a variable. Furthermore, the need to explicitly represent interventions in advance arises only because we presuppose abstractions. The aforementioned formalism avoids this and so, initial conditions permitting, representations of relevant causal interventions will emerge through induction. These emergent abstractions function as representations of one`s self and of any other object, inasmuch as the interventions of those objects impact the satisfaction of goals. We argue (with reference to theory of mind) that this explains how one might reason about one`s own identity and intent, those of others, of one's own as perceived by others and so on. In a narrow sense this describes what it is to be aware, and is a mechanistic explanation of aspects of consciousness.

[3]  arXiv:2302.03352 [pdf, ps, other]
Title: Towards Understanding the Effects of Evolving the MCTS UCT Selection Policy
Comments: 8 pages, double column, 6 figures, 1 table, conference. arXiv admin note: substantial text overlap with arXiv:2208.13589, arXiv:2112.09697
Subjects: Artificial Intelligence (cs.AI)

Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for optimal decisions. The success of MCTS depends heavily on how the MCTS statistical tree is built and the selection policy plays a fundamental role in this. A particular selection policy that works particularly well, widely adopted in MCTS, is the Upper Confidence Bounds for Trees, referred to as UCT. Other more sophisticated bounds have been proposed by the community with the goal to improve MCTS performance on particular problems. Thus, it is evident that while the MCTS UCT behaves generally well, some variants might behave better. As a result of this, multiple works have been proposed to evolve a selection policy to be used in MCTS. Although all these works are inspiring, none of them have carried out an in-depth analysis shedding light under what circumstances an evolved alternative of MCTS UCT might be beneficial in MCTS due to focusing on a single type of problem. In sharp contrast to this, in this work we use five functions of different nature, going from a unimodal function, covering multimodal functions to deceptive functions. We demonstrate how the evolution of the MCTS UCT might be beneficial in multimodal and deceptive scenarios, whereas the MCTS UCT is robust in unimodal scenarios and competitive in the rest of the scenarios used in this study.

[4]  arXiv:2302.03384 [pdf, ps, other]
Title: Act for Your Duties but Maintain Your Rights
Journal-ref: International Conference on Principles of Knowledge Representation and Reasoning (KR), 2022
Subjects: Artificial Intelligence (cs.AI)

Most of the synthesis literature has focused on studying how to synthesize a strategy to fulfill a task. This task is a duty for the agent. In this paper, we argue that intelligent agents should also be equipped with rights, that is, tasks that the agent itself can choose to fulfill (e.g., the right of recharging the battery). The agent should be able to maintain these rights while acting for its duties. We study this issue in the context of LTLf synthesis: we give duties and rights in terms of LTLf specifications, and synthesize a suitable strategy to achieve the duties that can be modified on-the-fly to achieve also the rights, if the agent chooses to do so. We show that handling rights does not make synthesis substantially more difficult, although it requires a more sophisticated solution concept than standard LTLf synthesis. We also extend our results to the case in which further duties and rights are given to the agent while already executing.

[5]  arXiv:2302.03429 [pdf, other]
Title: Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Recent advances in multi-agent reinforcement learning (MARL) allow agents to coordinate their behaviors in complex environments. However, common MARL algorithms still suffer from scalability and sparse reward issues. One promising approach to resolving them is automatic curriculum learning (ACL). ACL involves a student (curriculum learner) training on tasks of increasing difficulty controlled by a teacher (curriculum generator). Despite its success, ACL's applicability is limited by (1) the lack of a general student framework for dealing with the varying number of agents across tasks and the sparse reward problem, and (2) the non-stationarity of the teacher's task due to ever-changing student strategies. As a remedy for ACL, we introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination. Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents. In addition, we model the teacher as a contextual bandit conditioned by student policies, enabling a team of agents to change its size while still retaining previously acquired skills. We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound. Empirical results show that our method improves the performance, scalability and sample efficiency in several MARL environments.

[6]  arXiv:2302.03578 [pdf, other]
Title: Towards a Deeper Understanding of Concept Bottleneck Models Through End-to-End Explanation
Comments: Accepted into the AAAI-23 workshop Representation Learning for Responsible Human-Centric AI (R2HCAI) as a 4 page paper. This version also includes an additional 47 pages for the appendix and contains additional figures and tables
Subjects: Artificial Intelligence (cs.AI)

Concept Bottleneck Models (CBMs) first map raw input(s) to a vector of human-defined concepts, before using this vector to predict a final classification. We might therefore expect CBMs capable of predicting concepts based on distinct regions of an input. In doing so, this would support human interpretation when generating explanations of the model's outputs to visualise input features corresponding to concepts. The contribution of this paper is threefold: Firstly, we expand on existing literature by looking at relevance both from the input to the concept vector, confirming that relevance is distributed among the input features, and from the concept vector to the final classification where, for the most part, the final classification is made using concepts predicted as present. Secondly, we report a quantitative evaluation to measure the distance between the maximum input feature relevance and the ground truth location; we perform this with the techniques, Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG) and a baseline gradient approach, finding LRP has a lower average distance than IG. Thirdly, we propose using the proportion of relevance as a measurement for explaining concept importance.

[7]  arXiv:2302.03625 [pdf]
Title: An Expert System to Diagnose Spinal Disorders
Subjects: Artificial Intelligence (cs.AI)

Objective: Until now, traditional invasive approaches have been the only means being leveraged to diagnose spinal disorders. Traditional manual diagnostics require a high workload, and diagnostic errors are likely to occur due to the prolonged work of physicians. In this research, we develop an expert system based on a hybrid inference algorithm and comprehensive integrated knowledge for assisting the experts in the fast and high-quality diagnosis of spinal disorders.
Methods: First, for each spinal anomaly, the accurate and integrated knowledge was acquired from related experts and resources. Second, based on probability distributions and dependencies between symptoms of each anomaly, a unique numerical value known as certainty effect value was assigned to each symptom. Third, a new hybrid inference algorithm was designed to obtain excellent performance, which was an incorporation of the Backward Chaining Inference and Theory of Uncertainty.
Results: The proposed expert system was evaluated in two different phases, real-world samples, and medical records evaluation. Evaluations show that in terms of real-world samples analysis, the system achieved excellent accuracy. Application of the system on the sample with anomalies revealed the degree of severity of disorders and the risk of development of abnormalities in unhealthy and healthy patients. In the case of medical records analysis, our expert system proved to have promising performance, which was very close to those of experts.
Conclusion: Evaluations suggest that the proposed expert system provides promising performance, helping specialists to validate the accuracy and integrity of their diagnosis. It can also serve as an intelligent educational software for medical students to gain familiarity with spinal disorder diagnosis process, and related symptoms.

Cross-lists for Wed, 8 Feb 23

[8]  arXiv:2302.01876 (cross-list from cs.AR) [pdf, other]
Title: PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
Comments: Accepted by 2023 IEEE International Symposium on Circuits and Systems
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Posit has been a promising alternative to the IEEE-754 floating point format for deep learning applications due to its better trade-off between dynamic range and accuracy. However, hardware implementation of posit arithmetic requires further exploration, especially for the dot-product operations dominated in deep neural networks (DNNs). It has been implemented by either the combination of multipliers and an adder tree or cascaded fused multiply-add units, leading to poor computational efficiency and excessive hardware overhead. To address this issue, we propose an open-source posit dot-product unit, namely PDPU, that facilitates resource-efficient and high-throughput dot-product hardware implementation. PDPU not only features the fused and mixed-precision architecture that eliminates redundant latency and hardware resources, but also has a fine-grained 6-stage pipeline, improving computational efficiency. A configurable PDPU generator is further developed to meet the diverse needs of various DNNs for computational accuracy. Experimental results evaluated under the 28nm CMOS process show that PDPU reduces area, latency, and power by up to 43%, 64%, and 70%, respectively, compared to the existing implementations. Hence, PDPU has great potential as the computing core of posit-based accelerators for deep learning applications.

[9]  arXiv:2302.03033 (cross-list from eess.IV) [pdf, other]
Title: Exemplars and Counterexemplars Explanations for Image Classifiers, Targeting Skin Lesion Labeling
Comments: arXiv admin note: text overlap with arXiv:2111.11863
Journal-ref: 2021 IEEE Symposium on Computers and Communications (ISCC)
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Explainable AI consists in developing mechanisms allowing for an interaction between decision systems and humans by making the decisions of the formers understandable. This is particularly important in sensitive contexts like in the medical domain. We propose a use case study, for skin lesion diagnosis, illustrating how it is possible to provide the practitioner with explanations on the decisions of a state of the art deep neural network classifier trained to characterize skin lesions from examples. Our framework consists of a trained classifier onto which an explanation module operates. The latter is able to offer the practitioner exemplars and counterexemplars for the classification diagnosis thus allowing the physician to interact with the automatic diagnosis system. The exemplars are generated via an adversarial autoencoder. We illustrate the behavior of the system on representative examples.

[10]  arXiv:2302.03036 (cross-list from cs.CL) [pdf]
Title: Witscript 2: A System for Generating Improvised Jokes Without Wordplay
Authors: Joe Toplyn
Comments: 5 pages. Published in the Proceedings of the 13th International Conference on Computational Creativity (ICCC 2022), pages 54-58. arXiv admin note: text overlap with arXiv:2301.02695. substantial text overlap with arXiv:2302.02008
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

A previous paper presented Witscript, a system for generating conversational jokes that rely on wordplay. This paper extends that work by presenting Witscript 2, which uses a large language model to generate conversational jokes that rely on common sense instead of wordplay. Like Witscript, Witscript 2 is based on joke-writing algorithms created by an expert comedy writer. Human evaluators judged Witscript 2's responses to input sentences to be jokes 46% of the time, compared to 70% of the time for human-written responses. This is evidence that Witscript 2 represents another step toward giving a chatbot a humanlike sense of humor.

[11]  arXiv:2302.03037 (cross-list from cs.HC) [pdf, other]
Title: LiteVR: Interpretable and Lightweight Cybersickness Detection using Explainable AI
Comments: Accepted for publication in IEEE VR 2023 conference. arXiv admin note: substantial text overlap with arXiv:2302.01985
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cybersickness is a common ailment associated with virtual reality (VR) user experiences. Several automated methods exist based on machine learning (ML) and deep learning (DL) to detect cybersickness. However, most of these cybersickness detection methods are perceived as computationally intensive and black-box methods. Thus, those techniques are neither trustworthy nor practical for deploying on standalone energy-constrained VR head-mounted devices (HMDs). In this work, we present an explainable artificial intelligence (XAI)-based framework, LiteVR, for cybersickness detection, explaining the model's outcome and reducing the feature dimensions and overall computational costs. First, we develop three cybersickness DL models based on long-term short-term memory (LSTM), gated recurrent unit (GRU), and multilayer perceptron (MLP). Then, we employed a post-hoc explanation, such as SHapley Additive Explanations (SHAP), to explain the results and extract the most dominant features of cybersickness. Finally, we retrain the DL models with the reduced number of features. Our results show that eye-tracking features are the most dominant for cybersickness detection. Furthermore, based on the XAI-based feature ranking and dimensionality reduction, we significantly reduce the model's size by up to 4.3x, training time by up to 5.6x, and its inference time by up to 3.8x, with higher cybersickness detection accuracy and low regression error (i.e., on Fast Motion Scale (FMS)). Our proposed lite LSTM model obtained an accuracy of 94% in classifying cybersickness and regressing (i.e., FMS 1-10) with a Root Mean Square Error (RMSE) of 0.30, which outperforms the state-of-the-art. Our proposed LiteVR framework can help researchers and practitioners analyze, detect, and deploy their DL-based cybersickness detection models in standalone VR HMDs.

[12]  arXiv:2302.03038 (cross-list from q-bio.GN) [pdf, other]
Title: Single Cells Are Spatial Tokens: Transformers for Spatial Transcriptomic Data Imputation
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Spatially resolved transcriptomics brings exciting breakthroughs to single-cell analysis by providing physical locations along with gene expression. However, as a cost of the extremely high spatial resolution, the cellular level spatial transcriptomic data suffer significantly from missing values. While a standard solution is to perform imputation on the missing values, most existing methods either overlook spatial information or only incorporate localized spatial context without the ability to capture long-range spatial information. Using multi-head self-attention mechanisms and positional encoding, transformer models can readily grasp the relationship between tokens and encode location information. In this paper, by treating single cells as spatial tokens, we study how to leverage transformers to facilitate spatial tanscriptomics imputation. In particular, investigate the following two key questions: (1) $\textit{how to encode spatial information of cells in transformers}$, and (2) $\textit{ how to train a transformer for transcriptomic imputation}$. By answering these two questions, we present a transformer-based imputation framework, SpaFormer, for cellular-level spatial transcriptomic data. Extensive experiments demonstrate that SpaFormer outperforms existing state-of-the-art imputation algorithms on three large-scale datasets.

[13]  arXiv:2302.03064 (cross-list from cs.CV) [pdf, other]
Title: Investigating Pulse-Echo Sound Speed Estimation in Breast Ultrasound with Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Ultrasound is an adjunct tool to mammography that can quickly and safely aid physicians with diagnosing breast abnormalities. Clinical ultrasound often assumes a constant sound speed to form B-mode images for diagnosis. However, the various types of breast tissue, such as glandular, fat, and lesions, differ in sound speed. These differences can degrade the image reconstruction process. Alternatively, sound speed can be a powerful tool for identifying disease. To this end, we propose a deep-learning approach for sound speed estimation from in-phase and quadrature ultrasound signals. First, we develop a large-scale simulated ultrasound dataset that generates quasi-realistic breast tissue by modeling breast gland, skin, and lesions with varying echogenicity and sound speed. We developed a fully convolutional neural network architecture trained on a simulated dataset to produce an estimated sound speed map from inputting three complex-value in-phase and quadrature ultrasound images formed from plane-wave transmissions at separate angles. Furthermore, thermal noise augmentation is used during model optimization to enhance generalizability to real ultrasound data. We evaluate the model on simulated, phantom, and in-vivo breast ultrasound data, demonstrating its ability to accurately estimate sound speeds consistent with previously reported values in the literature. Our simulated dataset and model will be publicly available to provide a step towards accurate and generalizable sound speed estimation for pulse-echo ultrasound imaging.

[14]  arXiv:2302.03067 (cross-list from cs.LG) [pdf, other]
Title: Memory-Based Meta-Learning on Non-Stationary Distributions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each segment.

[15]  arXiv:2302.03068 (cross-list from cs.LG) [pdf, other]
Title: Evaluating Self-Supervised Learning via Risk Decomposition
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Self-supervised learning (SSL) pipelines differ in many design choices such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This does not provide much insight into why or when a model is better, now how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supervised approximation-estimation decomposition by considering errors arising from the representation learning step. Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization. We provide efficient estimators for each component and use them to analyze the effect of 30 design choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives valuable insights for designing and using SSL models. For example, it highlights the main sources of error and shows how to improve SSL in specific settings (full- vs few-shot) by trading off error components. All results and pretrained models are at https://github.com/YannDubs/SSL-Risk-Decomposition.

[16]  arXiv:2302.03086 (cross-list from cs.LG) [pdf, other]
Title: DITTO: Offline Imitation Learning with World Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose DITTO, an offline imitation learning algorithm which uses world models and on-policy reinforcement learning to addresses the problem of covariate shift, without access to an oracle or any additional online interactions. We discuss how world models enable offline, on-policy imitation learning, and propose a simple intrinsic reward defined in the world model latent space that induces imitation learning by reinforcement learning. Theoretically, we show that our formulation induces a divergence bound between expert and learner, in turn bounding the difference in reward. We test our method on difficult Atari environments from pixels alone, and achieve state-of-the-art performance in the offline setting.

[17]  arXiv:2302.03087 (cross-list from cs.GT) [pdf, ps, other]
Title: Dividing Good and Better Items Among Agents with Submodular Valuations
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

We study the problem of fairly allocating a set of indivisible goods among agents with bivalued submodular valuations -- each good provides a marginal gain of either $a$ or $b$ ($a < b$) and goods have decreasing marginal gains. This is a natural generalization of two well-studied valuation classes -- bivalued additive valuations and binary submodular valuations. We present a simple sequential algorithmic framework, based on the recently introduced Yankee Swap mechanism, that can be adapted to compute a variety of solution concepts, including leximin, max Nash welfare (MNW) and $p$-mean welfare maximizing allocations when $a$ divides $b$. This result is complemented by an existing result on the computational intractability of leximin and MNW allocations when $a$ does not divide $b$.
We further examine leximin and MNW allocations with respect to two well-known properties -- envy freeness and the maximin share guarantee. On envy freeness, we show that neither the leximin nor the MNW allocation is guaranteed to be envy free up to one good (EF1). This is surprising since for the simpler classes of bivalued additive valuations and binary submodular valuations, MNW allocations are known to be envy free up to any good (EFX). On the maximin share guarantee, we show that MNW and leximin allocations guarantee each agent $\frac14$ and $\frac{a}{b+3a}$ of their maximin share respectively when $a$ divides $b$. This fraction improves to $\frac13$ and $\frac{a}{b+2a}$ respectively when agents have bivalued additive valuations.

[18]  arXiv:2302.03122 (cross-list from cs.LG) [pdf, other]
Title: State-wise Safe Reinforcement Learning: A Survey
Comments: 9 pages, 1 figure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in another word, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is necessary and essential to many challenging tasks such as autonomous driving, robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we will discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.

[19]  arXiv:2302.03133 (cross-list from cs.LG) [pdf, other]
Title: Domain Adaptation for Time Series Under Feature and Label Shifts
Comments: 24 pages (13 pages main paper + 11 pages supplementary materials)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The transfer of models trained on labeled datasets in a source domain to unlabeled target domains is made possible by unsupervised domain adaptation (UDA). However, when dealing with complex time series models, the transferability becomes challenging due to the dynamic temporal structure that varies between domains, resulting in feature shifts and gaps in the time and frequency representations. Furthermore, tasks in the source and target domains can have vastly different label distributions, making it difficult for UDA to mitigate label shifts and recognize labels that only exist in the target domain. We present RAINCOAT, the first model for both closed-set and universal DA on complex time series. RAINCOAT addresses feature and label shifts by considering both temporal and frequency features, aligning them across domains, and correcting for misalignments to facilitate the detection of private labels. Additionally,RAINCOAT improves transferability by identifying label shifts in target domains. Our experiments with 5 datasets and 13 state-of-the-art UDA methods demonstrate that RAINCOAT can achieve an improvement in performance of up to 16.33%, and can effectively handle both closed-set and universal adaptation.

[20]  arXiv:2302.03156 (cross-list from cs.CV) [pdf, other]
Title: Novel Building Detection and Location Intelligence Collection in Aerial Satellite Imagery
Comments: 9 pages(5 main pages, 4 auxiliary pages)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Building structures detection and information about these buildings in aerial images is an important solution for city planning and management, land use analysis. It can be the center piece to answer important questions such as planning evacuation routes in case of an earthquake, flood management, etc. These applications rely on being able to accurately retrieve up-to-date information. Being able to accurately detect buildings in a bounding box centered on a specific latitude-longitude value can help greatly. The key challenge is to be able to detect buildings which can be commercial, industrial, hut settlements, or skyscrapers. Once we are able to detect such buildings, our goal will be to cluster and categorize similar types of buildings together.

[21]  arXiv:2302.03173 (cross-list from physics.ao-ph) [pdf, other]
Title: Learning bias corrections for climate models using deep neural operators
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)

Numerical simulation for climate modeling resolving all important scales is a computationally taxing process. Therefore, to circumvent this issue a low resolution simulation is performed, which is subsequently corrected for bias using reanalyzed data (ERA5), known as nudging correction. The existing implementation for nudging correction uses a relaxation based method for the algebraic difference between low resolution and ERA5 data. In this study, we replace the bias correction process with a surrogate model based on the Deep Operator Network (DeepONet). DeepONet (Deep Operator Neural Network) learns the mapping from the state before nudging (a functional) to the nudging tendency (another functional). The nudging tendency is a very high dimensional data albeit having many low energy modes. Therefore, the DeepoNet is combined with a convolution based auto-encoder-decoder (AED) architecture in order to learn the nudging tendency in a lower dimensional latent space efficiently. The accuracy of the DeepONet model is tested against the nudging tendency obtained from the E3SMv2 (Energy Exascale Earth System Model) and shows good agreement. The overarching goal of this work is to deploy the DeepONet model in an online setting and replace the nudging module in the E3SM loop for better efficiency and accuracy.

[22]  arXiv:2302.03221 (cross-list from cs.IR) [pdf, ps, other]
Title: Towards Lightweight Cross-domain Sequential Recommendation via External Attention-enhanced Graph Convolution Network
Comments: 16 pages, 4 figures, conference paper, accepted by DASFAA 2023
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cross-domain Sequential Recommendation (CSR) is an emerging yet challenging task that depicts the evolution of behavior patterns for overlapped users by modeling their interactions from multiple domains. Existing studies on CSR mainly focus on using composite or in-depth structures that achieve significant improvement in accuracy but bring a huge burden to the model training. Moreover, to learn the user-specific sequence representations, existing works usually adopt the global relevance weighting strategy (e.g., self-attention mechanism), which has quadratic computational complexity. In this work, we introduce a lightweight external attention-enhanced GCN-based framework to solve the above challenges, namely LEA-GCN. Specifically, by only keeping the neighborhood aggregation component and using the Single-Layer Aggregating Protocol (SLAP), our lightweight GCN encoder performs more efficiently to capture the collaborative filtering signals of the items from both domains. To further alleviate the framework structure and aggregate the user-specific sequential pattern, we devise a novel dual-channel External Attention (EA) component, which calculates the correlation among all items via a lightweight linear structure. Extensive experiments are conducted on two real-world datasets, demonstrating that LEA-GCN requires a smaller volume and less training time without affecting the accuracy compared with several state-of-the-art methods.

[23]  arXiv:2302.03225 (cross-list from cs.LG) [pdf, other]
Title: Efficient XAI Techniques: A Taxonomic Survey
Comments: 14 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Recently, there has been a growing demand for the deployment of Explainable Artificial Intelligence (XAI) algorithms in real-world applications. However, traditional XAI methods typically suffer from a high computational complexity problem, which discourages the deployment of real-time systems to meet the time-demanding requirements of real-world scenarios. Although many approaches have been proposed to improve the efficiency of XAI methods, a comprehensive understanding of the achievements and challenges is still needed. To this end, in this paper we provide a review of efficient XAI. Specifically, we categorize existing techniques of XAI acceleration into efficient non-amortized and efficient amortized methods. The efficient non-amortized methods focus on data-centric or model-centric acceleration upon each individual instance. In contrast, amortized methods focus on learning a unified distribution of model explanations, following the predictive, generative, or reinforcement frameworks, to rapidly derive multiple model explanations. We also analyze the limitations of an efficient XAI pipeline from the perspectives of the training phase, the deployment phase, and the use scenarios. Finally, we summarize the challenges of deploying XAI acceleration methods to real-world scenarios, overcoming the trade-off between faithfulness and efficiency, and the selection of different acceleration methods.

[24]  arXiv:2302.03235 (cross-list from cs.NE) [pdf, other]
Title: Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs
Comments: Published as a conference paper at ICLR 2023
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Rapidly learning from ongoing experiences and remembering past events with a flexible memory system are two core capacities of biological intelligence. While the underlying neural mechanisms are not fully understood, various evidence supports that synaptic plasticity plays a critical role in memory formation and fast learning. Inspired by these results, we equip Recurrent Neural Networks (RNNs) with plasticity rules to enable them to adapt their parameters according to ongoing experiences. In addition to the traditional local Hebbian plasticity, we propose a global, gradient-based plasticity rule, which allows the model to evolve towards its self-determined target. Our models show promising results on sequential and associative memory tasks, illustrating their ability to robustly form and retain memories. In the meantime, these models can cope with many challenging few-shot learning problems. Comparing different plasticity rules under the same framework shows that Hebbian plasticity is well-suited for several memory and associative learning tasks; however, it is outperformed by gradient-based plasticity on few-shot regression tasks which require the model to infer the underlying mapping. Code is available at https://github.com/yuvenduan/PlasticRNNs.

[25]  arXiv:2302.03241 (cross-list from cs.CL) [pdf, other]
Title: Continual Learning of Language Models
Comments: ICLR 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Language models (LMs) have been instrumental for the rapid advance of natural language processing. This paper studies continual learning of LMs, in particular, continual domain-adaptive pre-training (or continual DAP-training). Existing research has shown that further pre-training an LM using a domain corpus to adapt the LM to the domain can improve the end-task performance in the domain. This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora to adapt the LM to these domains to improve their end-task performances. The key novelty of our method is a soft-masking mechanism that directly controls the update to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, it contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) and the knowledge from the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting, but also achieves knowledge transfer to improve end-task performances. Empirical evaluation demonstrates the effectiveness of the proposed method.

[26]  arXiv:2302.03246 (cross-list from cs.LG) [pdf, other]
Title: CDANs: Temporal Causal Discovery from Autocorrelated and Non-Stationary Time Series Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)

This study presents a novel constraint-based causal discovery approach for autocorrelated and non-stationary time series data (CDANs). Our proposed method addresses several limitations of existing causal discovery methods for autocorrelated and non-stationary time series data, such as high dimensionality, the inability to identify lagged causal relationships, and the overlook of changing modules. Our approach identifies both lagged and instantaneous/contemporaneous causal relationships along with changing modules that vary over time. The method optimizes the conditioning sets in a constraint-based search by considering lagged parents instead of conditioning on the entire past that addresses high dimensionality. The changing modules are detected by considering both contemporaneous and lagged parents. The approach first detects the lagged adjacencies, then identifies the changing modules and contemporaneous adjacencies, and finally determines the causal direction. We extensively evaluated the proposed method using synthetic datasets and a real-world clinical dataset and compared its performance with several baseline approaches. The results demonstrate the effectiveness of the proposed method in detecting causal relationships and changing modules in autocorrelated and non-stationary time series data.

[27]  arXiv:2302.03269 (cross-list from cs.CL) [pdf, other]
Title: PLACES: Prompting Language Models for Social Conversation Synthesis
Comments: In EACL 2023. 25 pages, 4 figures, 26 tables. Link to code forthcoming
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Collecting high quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting. We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations. This includes various dimensions of conversation quality with human evaluation directly on the synthesized conversations, and interactive human evaluation of chatbots fine-tuned on the synthetically generated dataset. We additionally demonstrate that this prompting approach is generalizable to multi-party conversations, providing potential to create new synthetic data for multi-party tasks. Our synthetic multi-party conversations were rated more favorably across all measured dimensions compared to conversation excerpts sampled from a human-collected multi-party dataset.

[28]  arXiv:2302.03281 (cross-list from cs.LG) [pdf, other]
Title: Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Modern representation learning methods may fail to adapt quickly under non-stationarity since they suffer from the problem of catastrophic forgetting and decaying plasticity. Such problems prevent learners from fast adaptation to changes since they result in increasing numbers of saturated features and forgetting useful features when presented with new experiences. Hence, these methods are rendered ineffective for continual learning. This paper proposes Utility-based Perturbed Gradient Descent (UPGD), an online representation-learning algorithm well-suited for continual learning agents with no knowledge about task boundaries. UPGD protects useful weights or features from forgetting and perturbs less useful ones based on their utilities. Our empirical results show that UPGD alleviates catastrophic forgetting and decaying plasticity, enabling modern representation learning methods to work in the continual learning setting.

[29]  arXiv:2302.03288 (cross-list from cs.RO) [pdf, other]
Title: Object-Centric Scene Representations using Active Inference
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment. In this paper, we propose a novel approach for scene understanding, leveraging a hierarchical object-centric generative model that enables an agent to infer object category and pose in an allocentric reference frame using active inference, a neuro-inspired framework for action and perception. For evaluating the behavior of an active vision agent, we also propose a new benchmark where, given a target viewpoint of a particular object, the agent needs to find the best matching viewpoint given a workspace with randomly positioned objects in 3D. We demonstrate that our active inference agent is able to balance epistemic foraging and goal-driven behavior, and outperforms both supervised and reinforcement learning baselines by a large margin.

[30]  arXiv:2302.03298 (cross-list from cs.CV) [pdf, other]
Title: Boosting Zero-shot Classification with Synthetic Data Diversity via Stable Diffusion
Comments: (7 pages, 3 figures, 2 tables, preprint)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent research has shown it is possible to perform zero-shot classification tasks by training a classifier with synthetic data generated by a diffusion model. However, the performance of this approach is still inferior to that of recent vision-language models. It has been suggested that the reason for this is a domain gap between the synthetic and real data. In our work, we show that this domain gap is not the main issue, and that diversity in the synthetic dataset is more important. We propose a \textit{bag of tricks} to improve diversity and are able to achieve performance on par with one of the vision-language models, CLIP. More importantly, this insight allows us to endow zero-shot classification capabilities on any classification model.

[31]  arXiv:2302.03350 (cross-list from cs.SE) [pdf, other]
Title: To Be Forgotten or To Be Fair: Unveiling Fairness Implications of Machine Unlearning Methods
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

The right to be forgotten (RTBF) is motivated by the desire of people not to be perpetually disadvantaged by their past deeds. For this, data deletion needs to be deep and permanent, and should be removed from machine learning models. Researchers have proposed machine unlearning algorithms which aim to erase specific data from trained models more efficiently. However, these methods modify how data is fed into the model and how training is done, which may subsequently compromise AI ethics from the fairness perspective. To help software engineers make responsible decisions when adopting these unlearning methods, we present the first study on machine unlearning methods to reveal their fairness implications. We designed and conducted experiments on two typical machine unlearning methods (SISA and AmnesiacML) along with a retraining method (ORTR) as baseline using three fairness datasets under three different deletion strategies. Experimental results show that under non-uniform data deletion, SISA leads to better fairness compared with ORTR and AmnesiacML, while initial training and uniform data deletion do not necessarily affect the fairness of all three methods. These findings have exposed an important research problem in software engineering, and can help practitioners better understand the potential trade-offs on fairness when considering solutions for RTBF.

[32]  arXiv:2302.03364 (cross-list from cs.LG) [pdf, other]
Title: Population-size-Aware Policy Optimization for Mean-Field Games
Comments: Accepted to International Conference on Learning Representations (ICLR 2023)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

In this work, we attempt to bridge the two fields of finite-agent and infinite-agent games, by studying how the optimal policies of agents evolve with the number of agents (population size) in mean-field games, an agent-centric perspective in contrast to the existing works focusing typically on the convergence of the empirical distribution of the population. To this end, the premise is to obtain the optimal policies of a set of finite-agent games with different population sizes. However, either deriving the closed-form solution for each game is theoretically intractable, training a distinct policy for each game is computationally intensive, or directly applying the policy trained in a game to other games is sub-optimal. We address these challenges through the Population-size-Aware Policy Optimization (PAPO). Our contributions are three-fold. First, to efficiently generate efficient policies for games with different population sizes, we propose PAPO, which unifies two natural options (augmentation and hypernetwork) and achieves significantly better performance. PAPO consists of three components: i) the population-size encoding which transforms the original value of population size to an equivalent encoding to avoid training collapse, ii) a hypernetwork to generate a distinct policy for each game conditioned on the population size, and iii) the population size as an additional input to the generated policy. Next, we construct a multi-task-based training procedure to efficiently train the neural networks of PAPO by sampling data from multiple games with different population sizes. Finally, extensive experiments on multiple environments show the significant superiority of PAPO over baselines, and the analysis of the evolution of the generated policies further deepens our understanding of the two fields of finite-agent and infinite-agent games.

[33]  arXiv:2302.03391 (cross-list from stat.ML) [pdf, other]
Title: Sparse GEMINI for Joint Discriminative Clustering and Feature Selection
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

Feature selection in clustering is a hard task which involves simultaneously the discovery of relevant clusters as well as relevant variables with respect to these clusters. While feature selection algorithms are often model-based through optimised model selection or strong assumptions on $p(\pmb{x})$, we introduce a discriminative clustering model trying to maximise a geometry-aware generalisation of the mutual information called GEMINI with a simple $\ell_1$ penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and is easily scalable to high-dimensional data and large amounts of samples while only designing a clustering model $p_\theta(y|\pmb{x})$. We demonstrate the performances of Sparse GEMINI on synthetic datasets as well as large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.

[34]  arXiv:2302.03438 (cross-list from cs.LG) [pdf, ps, other]
Title: Uncoupled Learning of Differential Stackelberg Equilibria with Commitments
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

A natural solution concept for many multiagent settings is the Stackelberg equilibrium, under which a ``leader'' agent selects a strategy that maximizes its own payoff assuming the ``follower'' chooses their best response to this strategy. Recent work has presented asymmetric learning updates that can be shown to converge to the \textit{differential} Stackelberg equilibria of two-player differentiable games. These updates are ``coupled'' in the sense that the leader requires some information about the follower's payoff function. Such coupled learning rules cannot be applied to \textit{ad hoc} interactive learning settings, and can be computationally impractical even in centralized training settings where the follower's payoffs are known. In this work, we present an ``uncoupled'' learning process under which each player's learning update only depends on their observations of the other's behavior. We prove that this process converges to a local Stackelberg equilibrium under similar conditions as previous coupled methods. We conclude with a discussion of the potential applications of our approach to human--AI cooperation and multi-agent reinforcement learning.

[35]  arXiv:2302.03460 (cross-list from cs.CY) [pdf, ps, other]
Title: Mind the Gap! Bridging Explainable Artificial Intelligence and Human Understanding with Luhmann's Functional Theory of Communication
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Over the past decade explainable artificial intelligence has evolved from a predominantly technical discipline into a field that is deeply intertwined with social sciences. Insights such as human preference for contrastive -- more precisely, counterfactual -- explanations have played a major role in this transition, inspiring and guiding the research in computer science. Other observations, while equally important, have received much less attention. The desire of human explainees to communicate with artificial intelligence explainers through a dialogue-like interaction has been mostly neglected by the community. This poses many challenges for the effectiveness and widespread adoption of such technologies as delivering a single explanation optimised according to some predefined objectives may fail to engender understanding in its recipients and satisfy their unique needs given the diversity of human knowledge and intention. Using insights elaborated by Niklas Luhmann and, more recently, Elena Esposito we apply social systems theory to highlight challenges in explainable artificial intelligence and offer a path forward, striving to reinvigorate the technical research in this direction. This paper aims to demonstrate the potential of systems theoretical approaches to communication in understanding problems and limitations of explainable artificial intelligence.

[36]  arXiv:2302.03472 (cross-list from cs.IR) [pdf, other]
Title: On the Theories Behind Hard Negative Sampling for Recommendation
Comments: Accepted by WWW'2023
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Negative sampling has been heavily used to train recommender models on large-scale data, wherein sampling hard examples usually not only accelerates the convergence but also improves the model accuracy. Nevertheless, the reasons for the effectiveness of Hard Negative Sampling (HNS) have not been revealed yet. In this work, we fill the research gap by conducting thorough theoretical analyses on HNS. Firstly, we prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC). Concretely, the BPR equipped with Dynamic Negative Sampling (DNS) is an exact estimator, while with softmax-based sampling is a soft estimator. Secondly, we prove that OPAUC has a stronger connection with Top-K evaluation metrics than AUC and verify it with simulation experiments. These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time. On these bases, we offer two insightful guidelines for effective usage of HNS: 1) the sampling hardness should be controllable, e.g., via pre-defined hyper-parameters, to adapt to different Top-K metrics and datasets; 2) the smaller the $K$ we emphasize in Top-K evaluation metrics, the harder the negative samples we should draw. Extensive experiments on three real-world benchmarks verify the two guidelines.

[37]  arXiv:2302.03487 (cross-list from cs.IR) [pdf, other]
Title: PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework in E-commerce
Comments: 9 pages, 3 figures
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Re-ranking draws increased attention on both academics and industries, which rearranges the ranking list by modeling the mutual influence among items to better meet users' demands. Many existing re-ranking methods directly take the initial ranking list as input, and generate the optimal permutation through a well-designed context-wise model, which brings the evaluation-before-reranking problem. Meanwhile, evaluating all candidate permutations brings unacceptable computational costs in practice. Thus, to better balance efficiency and effectiveness, online systems usually use a two-stage architecture which uses some heuristic methods such as beam-search to generate a suitable amount of candidate permutations firstly, which are then fed into the evaluation model to get the optimal permutation. However, existing methods in both stages can be improved through the following aspects. As for generation stage, heuristic methods only use point-wise prediction scores and lack an effective judgment. As for evaluation stage, most existing context-wise evaluation models only consider the item context and lack more fine-grained feature context modeling. This paper presents a novel end-to-end re-ranking framework named PIER to tackle the above challenges which still follows the two-stage architecture and contains two mainly modules named FPSM and OCPM. We apply SimHash in FPSM to select top-K candidates from the full permutation based on user's permutation-level interest in an efficient way. Then we design a novel omnidirectional attention mechanism in OCPM to capture the context information in the permutation. Finally, we jointly train these two modules end-to-end by introducing a comparative learning loss. Offline experiment results demonstrate that PIER outperforms baseline models on both public and industrial datasets, and we have successfully deployed PIER on Meituan food delivery platform.

[38]  arXiv:2302.03488 (cross-list from cs.CL) [pdf, other]
Title: APAM: Adaptive Pre-training and Adaptive Meta Learning in Language Model for Noisy Labels and Long-tailed Learning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Practical natural language processing (NLP) tasks are commonly long-tailed with noisy labels. Those problems challenge the generalization and robustness of complex models such as Deep Neural Networks (DNNs). Some commonly used resampling techniques, such as oversampling or undersampling, could easily lead to overfitting. It is growing popular to learn the data weights leveraging a small amount of metadata. Besides, recent studies have shown the advantages of self-supervised pre-training, particularly to the under-represented data. In this work, we propose a general framework to handle the problem of both long-tail and noisy labels. The model is adapted to the domain of problems in a contrastive learning manner. The re-weighting module is a feed-forward network that learns explicit weighting functions and adapts weights according to metadata. The framework further adapts weights of terms in the loss function through a combination of the polynomial expansion of cross-entropy loss and focal loss. Our extensive experiments show that the proposed framework consistently outperforms baseline methods. Lastly, our sensitive analysis emphasizes the capability of the proposed framework to handle the long-tailed problem and mitigate the negative impact of noisy labels.

[39]  arXiv:2302.03490 (cross-list from cs.CL) [pdf, other]
Title: Natural Language Processing for Policymaking
Comments: Handbook of Computational Social Science for Policy (2023), Chapter 7 (pages 141-162). Open Access on Springer: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Language is the medium for many political activities, from campaigns to news reports. Natural language processing (NLP) uses computational tools to parse text into key information that is needed for policymaking. In this chapter, we introduce common methods of NLP, including text classification, topic modeling, event extraction, and text scaling. We then overview how these methods can be used for policymaking through four major applications including data collection for evidence-based policymaking, interpretation of political decisions, policy communication, and investigation of policy effects. Finally, we highlight some potential limitations and ethical concerns when using NLP for policymaking.
This text is from Chapter 7 (pages 141-162) of the Handbook of Computational Social Science for Policy (2023). Open Access on Springer: https://doi.org/10.1007/978-3-031-16624-2

[40]  arXiv:2302.03494 (cross-list from cs.CL) [pdf, other]
Title: A Categorical Archive of ChatGPT Failures
Authors: Ali Borji
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models have been demonstrated to be valuable in different fields. ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation by comprehending context and generating appropriate responses. It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries, with fluent and comprehensive answers surpassing prior public chatbots in both security and usefulness. However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study. Ten categories of failures, including reasoning, factual errors, math, coding, and bias, are presented and discussed. The risks, limitations, and societal implications of ChatGPT are also highlighted. The goal of this study is to assist researchers and developers in enhancing future language models and chatbots.

[41]  arXiv:2302.03495 (cross-list from cs.IR) [pdf, other]
Title: Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Systematic reviews are comprehensive reviews of the literature for a highly focused research question. These reviews are often treated as the highest form of evidence in evidence-based medicine, and are the key strategy to answer research questions in the medical field. To create a high-quality systematic review, complex Boolean queries are often constructed to retrieve studies for the review topic. However, it often takes a long time for systematic review researchers to construct a high quality systematic review Boolean query, and often the resulting queries are far from effective. Poor queries may lead to biased or invalid reviews, because they missed to retrieve key evidence, or to extensive increase in review costs, because they retrieved too many irrelevant studies. Recent advances in Transformer-based generative models have shown great potential to effectively follow instructions from users and generate answers based on the instructions being made. In this paper, we investigate the effectiveness of the latest of such models, ChatGPT, in generating effective Boolean queries for systematic review literature search. Through a number of extensive experiments on standard test collections for the task, we find that ChatGPT is capable of generating queries that lead to high search precision, although trading-off this for recall. Overall, our study demonstrates the potential of ChatGPT in generating effective Boolean queries for systematic review literature search. The ability of ChatGPT to follow complex instructions and generate queries with high precision makes it a valuable tool for researchers conducting systematic reviews, particularly for rapid reviews where time is a constraint and often trading-off higher precision for lower recall is acceptable.

[42]  arXiv:2302.03499 (cross-list from cs.CL) [pdf, other]
Title: Exploring Data Augmentation for Code Generation Tasks
Comments: Findings of EACL 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

Advances in natural language processing, such as transfer learning from pre-trained language models, have impacted how models are trained for programming language tasks too. Previous research primarily explored code pre-training and expanded it through multi-modality and multi-tasking, yet the data for downstream tasks remain modest in size. Focusing on data utilization for downstream tasks, we propose and adapt augmentation methods that yield consistent improvements in code translation and summarization by up to 6.9% and 7.5% respectively. Further analysis suggests that our methods work orthogonally and show benefits in output code style and numeric consistency. We also discuss test data imperfections.

[43]  arXiv:2302.03507 (cross-list from cs.CL) [pdf, other]
Title: Meta-Learning Siamese Network for Few-Shot Text Classification
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Few-shot learning has been used to tackle the problem of label scarcity in text classification, of which meta-learning based methods have shown to be effective, such as the prototypical networks (PROTO). Despite the success of PROTO, there still exist three main problems: (1) ignore the randomness of the sampled support sets when computing prototype vectors; (2) disregard the importance of labeled samples; (3) construct meta-tasks in a purely random manner. In this paper, we propose a Meta-Learning Siamese Network, namely, Meta-SN, to address these issues. Specifically, instead of computing prototype vectors from the sampled support sets, Meta-SN utilizes external knowledge (e.g. class names and descriptive texts) for class labels, which is encoded as the low-dimensional embeddings of prototype vectors. In addition, Meta-SN presents a novel sampling strategy for constructing meta-tasks, which gives higher sampling probabilities to hard-to-classify samples. Extensive experiments are conducted on six benchmark datasets to show the clear superiority of Meta-SN over other state-of-the-art models. For reproducibility, all the datasets and codes are provided at https://github.com/hccngu/Meta-SN.

[44]  arXiv:2302.03508 (cross-list from cs.CL) [pdf, other]
Title: Cluster-Level Contrastive Learning for Emotion Recognition in Conversations
Comments: Accepted by IEEE Transactions on Affective Computing
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

A key challenge for Emotion Recognition in Conversations (ERC) is to distinguish semantically similar emotions. Some works utilise Supervised Contrastive Learning (SCL) which uses categorical emotion labels as supervision signals and contrasts in high-dimensional semantic space. However, categorical labels fail to provide quantitative information between emotions. ERC is also not equally dependent on all embedded features in the semantic space, which makes the high-dimensional SCL inefficient. To address these issues, we propose a novel low-dimensional Supervised Cluster-level Contrastive Learning (SCCL) method, which first reduces the high-dimensional SCL space to a three-dimensional affect representation space Valence-Arousal-Dominance (VAD), then performs cluster-level contrastive learning to incorporate measurable emotion prototypes. To help modelling the dialogue and enriching the context, we leverage the pre-trained knowledge adapters to infuse linguistic and factual knowledge. Experiments show that our method achieves new state-of-the-art results with 69.81% on IEMOCAP, 65.7% on MELD, and 62.51% on DailyDialog datasets. The analysis also proves that the VAD space is not only suitable for ERC but also interpretable, with VAD prototypes enhancing its performance and stabilising the training of SCCL. In addition, the pre-trained knowledge adapters benefit the performance of the utterance encoder and SCCL. Our code is available at: https://github.com/SteveKGYang/SCCL

[45]  arXiv:2302.03517 (cross-list from cs.DC) [pdf, other]
Title: Optimization of Topology-Aware Job Allocation on a High-Performance Computing Cluster by Neural Simulated Annealing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

Jobs on high-performance computing (HPC) clusters can suffer significant performance degradation due to inter-job network interference. Topology-aware job allocation problem (TJAP) is such a problem that decides how to dedicate nodes to specific applications to mitigate inter-job network interference. In this paper, we study the window-based TJAP on a fat-tree network aiming at minimizing the cost of communication hop, a defined inter-job interference metric. The window-based approach for scheduling repeats periodically taking the jobs in the queue and solving an assignment problem that maps jobs to the available nodes. Two special allocation strategies are considered, i.e., static continuity assignment strategy (SCAS) and dynamic continuity assignment strategy (DCAS). For the SCAS, a 0-1 integer programming is developed. For the DCAS, an approach called neural simulated algorithm (NSA), which is an extension to simulated algorithm (SA) that learns a repair operator and employs them in a guided heuristic search, is proposed. The efficacy of NSA is demonstrated with a computational study against SA and SCIP. The results of numerical experiments indicate that both the model and algorithm proposed in this paper are effective.

[46]  arXiv:2302.03519 (cross-list from cs.LG) [pdf, other]
Title: Efficient Parametric Approximations of Neural Network Function Space Distance
Comments: 21 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.

[47]  arXiv:2302.03554 (cross-list from cs.CY) [pdf, other]
Title: Simulating the impact of cognitive biases on the mobility transition
Authors: Carole Adam
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Climate change is becoming more visible, and human adaptation is required urgently to prevent greater damage. One particular domain of adaptation concerns daily mobility (work commute), with a significant portion of these trips being done in individual cars. Yet, their impact on pollution, noise, or accidents is well-known. This paper explores various cognitive biases that can explain such lack of adaptation. Our approach is to design simple interactive simulators that users can play with in order to understand biases. The idea is that awareness of such cognitive biases is often a first step towards more rational decision making, even though things are not that simple. This paper reports on three simulators, each focused on a particular factor of resistance. Various scenarios are simulated to demonstrate their explanatory power. These simulators are already available to play online, with the goal to provide users with food for thought about how mobility could evolve in the future. Work is still ongoing to design a user survey to evaluate their impact.

[48]  arXiv:2302.03561 (cross-list from cs.LG) [pdf, other]
Title: Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Systems and Control (eess.SY); Machine Learning (stat.ML)

We study the problem of optimizing a recommender system for outcomes that occur over several weeks or months. We begin by drawing on reinforcement learning to formulate a comprehensive model of users' recurring relationships with a recommender system. Measurement, attribution, and coordination challenges complicate algorithm design. We describe careful modeling -- including a new representation of user state and key conditional independence assumptions -- which overcomes these challenges and leads to simple, testable recommender system prototypes. We apply our approach to a podcast recommender system that makes personalized recommendations to hundreds of millions of listeners. A/B tests demonstrate that purposefully optimizing for long-term outcomes leads to large performance gains over conventional approaches that optimize for short-term proxies.

[49]  arXiv:2302.03573 (cross-list from cs.RO) [pdf, other]
Title: Local Neural Descriptor Fields: Locally Conditioned Object Representations for Manipulation
Comments: ICRA 2023, Project Page: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

A robot operating in a household environment will see a wide range of unique and unfamiliar objects. While a system could train on many of these, it is infeasible to predict all the objects a robot will see. In this paper, we present a method to generalize object manipulation skills acquired from a limited number of demonstrations, to novel objects from unseen shape categories. Our approach, Local Neural Descriptor Fields (L-NDF), utilizes neural descriptors defined on the local geometry of the object to effectively transfer manipulation demonstrations to novel objects at test time. In doing so, we leverage the local geometry shared between objects to produce a more general manipulation framework. We illustrate the efficacy of our approach in manipulating novel objects in novel poses -- both in simulation and in the real world.

[50]  arXiv:2302.03586 (cross-list from cs.LG) [pdf, other]
Title: Adaptive Aggregation for Safety-Critical Control
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Safety has been recognized as the central obstacle to preventing the use of reinforcement learning (RL) for real-world applications. Different methods have been developed to deal with safety concerns in RL. However, learning reliable RL-based solutions usually require a large number of interactions with the environment. Likewise, how to improve the learning efficiency, specifically, how to utilize transfer learning for safe reinforcement learning, has not been well studied. In this work, we propose an adaptive aggregation framework for safety-critical control. Our method comprises two key techniques: 1) we learn to transfer the safety knowledge by aggregating the multiple source tasks and a target task through the attention network; 2) we separate the goal of improving task performance and reducing constraint violations by utilizing a safeguard. Experiment results demonstrate that our algorithm can achieve fewer safety violations while showing better data efficiency compared with several baselines.

[51]  arXiv:2302.03601 (cross-list from eess.SP) [pdf, other]
Title: Industrial computed tomography based intelligent non-destructive testing method for power capacitor
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Power capacitor device is a widely used reactive power compensation equipment in power transmission and distribution system which can easily have internal fault and therefore affects the safe operation of the power system. An intelligent non-destructive testing (I-NDT) method based on ICT is proposed to test the quality of power capacitors automatically in this study. The internal structure of power capacitors would be scanned by the ICT device and then defects could be recognized by the SSD algorithm. Moreover, the data data augmentation algorithm is used to extend the image set to improve the stability and accuracy of the trained SSD model.

[52]  arXiv:2302.03602 (cross-list from cond-mat.dis-nn) [pdf, other]
Title: Reply to: Modern graph neural networks do worse than classical greedy algorithms in solving combinatorial optimization problems like maximum independent set
Comments: Manuscript: 3 pages, 2 figures
Journal-ref: Nature Machine Intelligence 5, 32 (2023)
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC); Quantum Physics (quant-ph)

We provide a comprehensive reply to the comment written by Chiara Angelini and Federico Ricci-Tersenghi [arXiv:2206.13211] and argue that the comment singles out one particular non-representative example problem, entirely focusing on the maximum independent set (MIS) on sparse graphs, for which greedy algorithms are expected to perform well. Conversely, we highlight the broader algorithmic development underlying our original work, and (within our original framework) provide additional numerical results showing sizable improvements over our original results, thereby refuting the comment's performance statements. We also provide results showing run-time scaling superior to the results provided by Angelini and Ricci-Tersenghi. Furthermore, we show that the proposed set of random d-regular graphs does not provide a universal set of benchmark instances, nor do greedy heuristics provide a universal algorithmic baseline. Finally, we argue that the internal (parallel) anatomy of graph neural networks is very different from the (sequential) nature of greedy algorithms and emphasize that graph neural networks have demonstrated their potential for superior scalability compared to existing heuristics such as parallel tempering. We conclude by discussing the conceptual novelty of our work and outline some potential extensions.

[53]  arXiv:2302.03616 (cross-list from cs.LG) [pdf, other]
Title: Can gamification reduce the burden of self-reporting in mHealth applications? Feasibility study using machine learning from smartwatch data to estimate cognitive load
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

The effectiveness of digital treatments can be measured by requiring patients to self-report their mental and physical state through mobile applications. However, self-reporting can be overwhelming and may cause patients to disengage from the intervention. In order to address this issue, we conduct a feasibility study to explore the impact of gamification on the cognitive burden of self-reporting. Our approach involves the creation of a system to assess cognitive burden through the analysis of photoplethysmography (PPG) signals obtained from a smartwatch. The system is built by collecting PPG data during both cognitively demanding tasks and periods of rest. The obtained data is utilized to train a machine learning model to detect cognitive load (CL). Subsequently, we create two versions of health surveys: a gamified version and a traditional version. Our aim is to estimate the cognitive load experienced by participants while completing these surveys using their mobile devices. We find that CL detector performance can be enhanced via pre-training on stress detection tasks and requires capturing of a minimum 30 seconds of PPG signal to work adequately. For 10 out of 13 participants, a personalized cognitive load detector can achieve an F1 score above 0.7. We find no difference between the gamified and non-gamified mobile surveys in terms of time spent in the state of high cognitive load but participants prefer the gamified version. The average time spent on each question is 5.5 for gamified survey vs 6 seconds for the non-gamified version.

[54]  arXiv:2302.03629 (cross-list from cs.CV) [pdf, ps, other]
Title: Ethical Considerations for Collecting Human-Centric Image Datasets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG)

Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several prominent datasets. Recent works have tried to reverse this trend, for example, by proposing analytical frameworks for ethically evaluating datasets, the standardization of dataset documentation and curation practices, privacy preservation methodologies, as well as tools for surfacing and mitigating representational biases. Little attention, however, has been paid to the realities of operationalizing ethical data collection. To fill this gap, we present a set of key ethical considerations and practical recommendations for collecting more ethically-minded human-centric image data. Our research directly addresses issues of privacy and bias by contributing to the research community best practices for ethical data collection, covering purpose, privacy and consent, as well as diversity. We motivate each consideration by drawing on lessons from current practices, dataset withdrawals and audits, and analytical ethical frameworks. Our research is intended to augment recent scholarship, representing an important step toward more responsible data curation practices.

[55]  arXiv:2302.03660 (cross-list from cs.LG) [pdf, other]
Title: Riemannian Flow Matching on General Geometries
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose Riemannian Flow Matching (RFM), a simple yet powerful framework for training continuous normalizing flows on manifolds. Existing methods for generative modeling on manifolds either require expensive simulation, inherently cannot scale to high dimensions, or use approximations to limiting quantities that result in biased objectives. Riemannian Flow Matching bypasses these inconveniences and exhibits multiple benefits over prior approaches: It is completely simulation-free on simple geometries, it does not require divergence computation, and its target vector field is computed in closed form even on general geometries. The key ingredient behind RFM is the construction of a simple kernel function for defining per-sample vector fields, which subsumes existing Euclidean cases. Extending to general geometries, we rely on the use of spectral decompositions to efficiently compute kernel functions. Our method achieves state-of-the-art performance on real-world non-Euclidean datasets, and we showcase, for the first time, tractable training on general geometries, including on triangular meshes and maze-like manifolds with boundaries.

[56]  arXiv:2302.03665 (cross-list from cs.CV) [pdf, other]
Title: HumanMAC: Masked Motion Completion for Human Motion Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Human motion prediction is a classical problem in computer vision and computer graphics, which has a wide range of practical applications. Previous effects achieve great empirical performance based on an encoding-decoding fashion. The methods of this fashion work by first encoding previous motions to latent representations and then decoding the latent representations into predicted motions. However, in practice, they are still unsatisfactory due to several issues, including complicated loss constraints, cumbersome training processes, and scarce switch of different categories of motions in prediction. In this paper, to address the above issues, we jump out of the foregoing fashion and propose a novel framework from a new perspective. Specifically, our framework works in a denoising diffusion style. In the training stage, we learn a motion diffusion model that generates motions from random noise. In the inference stage, with a denoising procedure, we make motion prediction conditioning on observed motions to output more continuous and controllable predictions. The proposed framework enjoys promising algorithmic properties, which only needs one loss in optimization and is trained in an end-to-end manner. Additionally, it accomplishes the switch of different categories of motions effectively, which is significant in realistic tasks, \textit{e.g.}, the animation task. Comprehensive experiments on benchmarks confirm the superiority of the proposed framework. The project page is available at \url{https://lhchen.top/Human-MAC}.

[57]  arXiv:2302.03669 (cross-list from cs.LG) [pdf, other]
Title: Deep Reinforcement Learning for Traffic Light Control in Intelligent Transportation Systems
Comments: 17 pages
Journal-ref: IEEE Transactions on Network Science and Engineering, 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Smart traffic lights in intelligent transportation systems (ITSs) are envisioned to greatly increase traffic efficiency and reduce congestion. Deep reinforcement learning (DRL) is a promising approach to adaptively control traffic lights based on the real-time traffic situation in a road network. However, conventional methods may suffer from poor scalability. In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior ``greenwave" (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally a grid road network, which is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations has the capability to produce on its own a high-level intelligent behavior in a grid road network, namely, the ``greenwave" policy emerges. We also verify the ``greenwave" patterns in a $5 \times 10$ grid road network. Thirdly, the ``greenwave" patterns demonstrate that DRL algorithms produce favorable solutions since the ``greenwave" policy shown in experiment results is proved to be optimal in a specified traffic model (an avenue with multiple cross streets). The delivered policies both in a single road intersection and a grid road network demonstrate the scalability of DRL algorithms.

[58]  arXiv:2302.03673 (cross-list from cs.LG) [pdf, ps, other]
Title: Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation
Comments: 51 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ in Daskalakis et al. 2022, where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters.

[59]  arXiv:2302.03675 (cross-list from cs.CV) [pdf, other]
Title: Auditing Gender Presentation Differences in Text-to-Image Models
Comments: Preprint, 23 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools. Despite significantly affecting a wide range of downstream applications, the distributions of these generated images are still not fully understood, especially when it comes to the potential stereotypical attributes of different genders. In this work, we propose a paradigm (Gender Presentation Differences) that utilizes fine-grained self-presentation attributes to study how gender is presented differently in text-to-image models. By probing gender indicators in the input text (e.g., "a woman" or "a man"), we quantify the frequency differences of presentation-centric attributes (e.g., "a shirt" and "a dress") through human annotation and introduce a novel metric: GEP. Furthermore, we propose an automatic method to estimate such differences. The automatic GEP metric based on our approach yields a higher correlation with human annotations than that based on existing CLIP scores, consistently across three state-of-the-art text-to-image models. Finally, we demonstrate the generalization ability of our metrics in the context of gender stereotypes related to occupations.

[60]  arXiv:2302.03684 (cross-list from cs.LG) [pdf, other]
Title: Temporal Robustness against Data Poisoning
Comments: 11 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Data poisoning considers cases when an adversary maliciously inserts and removes training data to manipulate the behavior of machine learning algorithms. Traditional threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, existing defenses are essentially vulnerable in practice when poisoning more samples remains a feasible option for attackers. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning and derive two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. With these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal modeling of data poisoning.

[61]  arXiv:2302.03686 (cross-list from cs.LG) [pdf, other]
Title: Long Horizon Temperature Scaling
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Temperature scaling is a popular technique for tuning the sharpness of a model distribution. It is used extensively for sampling likely generations and calibrating model uncertainty, and even features as a controllable parameter to many large language models in deployment. However, autoregressive models rely on myopic temperature scaling that greedily optimizes the next token. To address this, we propose Long Horizon Temperature Scaling (LHTS), a novel approach for sampling from temperature-scaled joint distributions. LHTS is compatible with all likelihood-based models, and optimizes for the long-horizon likelihood of samples. We derive a temperature-dependent LHTS objective, and show that fine-tuning a model on a range of temperatures produces a single model capable of generation with a controllable long-horizon temperature parameter. We experiment with LHTS on image diffusion models and character/language autoregressive models, demonstrating advantages over myopic temperature scaling in likelihood and sample quality, and showing improvements in accuracy on a multiple choice analogy task by $10\%$.

Replacements for Wed, 8 Feb 23

[62]  arXiv:2006.06412 (replaced) [pdf, other]
Title: Modeling Human Driving Behavior through Generative Adversarial Imitation Learning
Comments: 14 pages, 8 figures. To be published in the IEEE Transactions on Intelligent Transportation Systems
Subjects: Artificial Intelligence (cs.AI)
[63]  arXiv:2103.02728 (replaced) [pdf]
Title: Morality, Machines and the Interpretation Problem: A Value-based, Wittgensteinian Approach to Building Moral Agents
Journal-ref: In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XXXIX. SGAI-AI 2022. Lecture Notes in Computer Science(), vol 13652. Springer, Cham (2022)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
[64]  arXiv:2109.03283 (replaced) [pdf]
Title: Have a break from making decisions, have a MARS: The Multi-valued Action Reasoning System
Authors: Cosmin Badea
Journal-ref: In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XXXIX. SGAI-AI 2022. Lecture Notes in Computer Science(), vol 13652. Springer, Cham
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Logic in Computer Science (cs.LO)
[65]  arXiv:2301.05345 (replaced) [pdf, other]
Title: GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer
Comments: This manuscript was accepted to AAAI 2023 Main Track
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[66]  arXiv:2301.12158 (replaced) [pdf, other]
Title: A System for Human-AI collaboration for Online Customer Support
Subjects: Artificial Intelligence (cs.AI)
[67]  arXiv:2302.02093 (replaced) [pdf]
Title: Knowledge-enhanced Neural Machine Reasoning: A Review
Comments: 8 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[68]  arXiv:2302.02187 (replaced) [pdf, other]
Title: Appropriate Reliance on AI Advice: Conceptualization and the Effect of Explanations
Comments: Accepted at ACM IUI 2023. arXiv admin note: text overlap with arXiv:2204.06916
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[69]  arXiv:2104.12868 (replaced) [pdf]
Title: Fuzzy Expert Systems for Prediction of ICU Admission in Patients with COVID-19
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[70]  arXiv:2111.01632 (replaced) [pdf, other]
Title: Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern Estimation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[71]  arXiv:2112.04698 (replaced) [pdf, other]
Title: Applications of Explainable AI for 6G: Technical Aspects, Use Cases, and Research Challenges
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
[72]  arXiv:2112.12591 (replaced) [pdf, other]
Title: Black-Box Testing of Deep Neural Networks Through Test Case Diversity
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[73]  arXiv:2202.12263 (replaced) [pdf, other]
Title: Causal Effect Identification in Cluster DAGs
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[74]  arXiv:2203.03047 (replaced) [pdf, other]
Title: Recent Advances in Neural Text Generation: A Task-Agnostic Survey
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[75]  arXiv:2203.09441 (replaced) [pdf, other]
Title: Learning of Structurally Unambiguous Probabilistic Grammars
Comments: arXiv admin note: substantial text overlap with arXiv:2011.07472
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
[76]  arXiv:2205.00415 (replaced) [pdf, other]
Title: Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
Comments: Accepted to EACL 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[77]  arXiv:2205.13450 (replaced) [pdf, ps, other]
Title: Variance-Aware Sparse Linear Bandits
Comments: Accepted to ICLR 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[78]  arXiv:2206.00979 (replaced) [pdf, other]
Title: Graph Kernels Based on Multi-scale Graph Embeddings
Comments: 15 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[79]  arXiv:2206.06243 (replaced) [pdf, other]
Title: Contrastive Learning for Unsupervised Domain Adaptation of Time Series
Comments: Published as a conference paper at ICLR 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[80]  arXiv:2206.10690 (replaced) [pdf, other]
Title: Learning Continuous Rotation Canonicalization with Radial Beam Sampling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[81]  arXiv:2207.10018 (replaced) [pdf, other]
Title: Mitigating Algorithmic Bias with Limited Annotations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[82]  arXiv:2208.14057 (replaced) [pdf, other]
Title: Symmetric Pruning in Quantum Neural Networks
Comments: Accepted to International Conference on Learning Representations (ICLR) 2023
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[83]  arXiv:2209.01207 (replaced) [pdf, other]
Title: Co-Imitation: Learning Design and Behaviour by Imitation
Comments: 14 pages, 11 figures, accepted for AAAI-23
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[84]  arXiv:2209.09315 (replaced) [pdf, other]
Title: Deep Linear Networks can Benignly Overfit when Shallow Ones Do
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
[85]  arXiv:2210.00660 (replaced) [pdf, other]
Title: A Non-monotonic Self-terminating Language Model
Comments: Published as a conference paper at ICLR 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[86]  arXiv:2210.03150 (replaced) [pdf, other]
Title: Towards Out-of-Distribution Adversarial Robustness
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[87]  arXiv:2210.05406 (replaced) [pdf, other]
Title: Code Librarian: A Software Package Recommendation System
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[88]  arXiv:2210.07054 (replaced) [pdf, other]
Title: Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation
Comments: Accepted at EACL 2023 (main conference)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[89]  arXiv:2210.11350 (replaced) [pdf, other]
Title: A Survey on Over-the-Air Computation
Comments: 32 pages, 6 figures; Comments are welcome! (IEEE Communications Surveys & Tutorials)
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[90]  arXiv:2210.15675 (replaced) [pdf, ps, other]
Title: Feature Necessity & Relevancy in ML Classifier Explanations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[91]  arXiv:2211.00075 (replaced) [pdf, ps, other]
Title: Keywords for Bias
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[92]  arXiv:2211.03758 (replaced) [pdf, other]
Title: Privacy Aware Experiments without Cookies
Comments: Technical report supplementing paper accepted to WSDM 23
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[93]  arXiv:2211.16653 (replaced) [pdf]
Title: CRU: A Novel Neural Architecture for Improving the Predictive Performance of Time-Series Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[94]  arXiv:2212.03597 (replaced) [pdf, other]
Title: DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Comments: Equal contribution by the first 3 authors. Code has been released as a part of this https URL Part of this paper is from our previous arxiv report (arXiv:2211.11586)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[95]  arXiv:2212.13425 (replaced) [pdf, other]
Title: GEDI: GEnerative and DIscriminative Training for Self-Supervised Learning
Comments: Fixed typos/cleaned the experimental section
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[96]  arXiv:2301.08414 (replaced) [pdf, other]
Title: FG-Depth: Flow-Guided Unsupervised Monocular Depth Estimation
Comments: Accepted by ICRA2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[97]  arXiv:2301.08855 (replaced) [pdf, other]
Title: ProKD: An Unsupervised Prototypical Knowledge Distillation Network for Zero-Resource Cross-Lingual Named Entity Recognition
Comments: AAAI 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[98]  arXiv:2301.10343 (replaced) [pdf, other]
Title: ClimaX: A foundation model for weather and climate
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[99]  arXiv:2302.01404 (replaced) [pdf, other]
Title: Provably Bounding Neural Network Preimages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[100]  arXiv:2302.01735 (replaced) [pdf, other]
Title: Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[101]  arXiv:2302.02337 (replaced) [pdf]
Title: Regulating ChatGPT and other Large Generative AI Models
Comments: under review
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[102]  arXiv:2302.02568 (replaced) [pdf, other]
Title: Less is More: Understanding Word-level Textual Adversarial Attack via n-gram Frequency Descend
Comments: 8 pages, 4 figures. In progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[ total of 102 entries: 1-102 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2302, contact, help  (Access key information)