Artificial Intelligence

New submissions

Submissions received from Mon 29 Apr 24 to Tue 30 Apr 24, announced Wed, 1 May 24

New submissions
Cross-lists
Replacements

[ total of 144 entries: 1-144 ]
[ showing up to 250 entries per page: fewer | more ]

New submissions for Wed, 1 May 24

[1] arXiv:2404.18982 [pdf, ps, other]: Title: Can ChatGPT Make Explanatory Inferences? Benchmarks for Abductive Reasoning

Authors: Paul Thagard

Subjects: Artificial Intelligence (cs.AI)

Explanatory inference is the creation and evaluation of hypotheses that provide explanations, and is sometimes known as abduction or abductive inference. Generative AI is a new set of artificial intelligence models based on novel algorithms for generating text, images, and sounds. This paper proposes a set of benchmarks for assessing the ability of AI programs to perform explanatory inference, and uses them to determine the extent to which ChatGPT, a leading generative AI model, is capable of making explanatory inferences. Tests on the benchmarks reveal that ChatGPT performs creative and evaluative inferences in many domains, although it is limited to verbal and visual modalities. Claims that ChatGPT and similar models are incapable of explanation, understanding, causal reasoning, meaning, and creativity are rebutted.
[2] arXiv:2404.19065 [pdf, other]: Title: HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

Authors: Gabriel Sarch, Sahil Somani, Raghav Kapoor, Michael J. Tarr, Katerina Fragkiadaki

Comments: Videos and code this https URL

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recent research on instructable agents has used memory-augmented Large Language Models (LLMs) as task planners, a technique that retrieves language-program examples relevant to the input instruction and uses them as in-context examples in the LLM prompt to improve the performance of the LLM in inferring the correct action and task plans. In this technical report, we extend the capabilities of HELPER, by expanding its memory with a wider array of examples and prompts, and by integrating additional APIs for asking questions. This simple expansion of HELPER into a shared memory enables the agent to work across the domains of executing plans from dialogue, natural language instruction following, active question asking, and commonsense room reorganization. We evaluate the agent on four diverse interactive visual-language embodied agent benchmarks: ALFRED, TEACh, DialFRED, and the Tidy Task. HELPER-X achieves few-shot, state-of-the-art performance across these benchmarks using a single agent, without requiring in-domain training, and remains competitive with agents that have undergone in-domain training.
[3] arXiv:2404.19146 [pdf, other]: Title: Automated Construction of Theme-specific Knowledge Graphs

Authors: Linyi Ding, Sizhe Zhou, Jinfeng Xiao, Jiawei Han

Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Despite widespread applications of knowledge graphs (KGs) in various tasks such as question answering and intelligent conversational systems, existing KGs face two major challenges: information granularity and deficiency in timeliness. These hinder considerably the retrieval and analysis of in-context, fine-grained, and up-to-date knowledge from KGs, particularly in highly specialized themes (e.g., specialized scientific research) and rapidly evolving contexts (e.g., breaking news or disaster tracking). To tackle such challenges, we propose a theme-specific knowledge graph (i.e., ThemeKG), a KG constructed from a theme-specific corpus, and design an unsupervised framework for ThemeKG construction (named TKGCon). The framework takes raw theme-specific corpus and generates a high-quality KG that includes salient entities and relations under the theme. Specifically, we start with an entity ontology of the theme from Wikipedia, based on which we then generate candidate relations by Large Language Models (LLMs) to construct a relation ontology. To parse the documents from the theme corpus, we first map the extracted entity pairs to the ontology and retrieve the candidate relations. Finally, we incorporate the context and ontology to consolidate the relations for entity pairs. We observe that directly prompting GPT-4 for theme-specific KG leads to inaccurate entities (such as "two main types" as one entity in the query result) and unclear (such as "is", "has") or wrong relations (such as "have due to", "to start"). In contrast, by constructing the theme-specific KG step by step, our model outperforms GPT-4 and could consistently identify accurate entities and relations. Experimental results also show that our framework excels in evaluations compared with various KG construction baselines.
[4] arXiv:2404.19234 [pdf, other]: Title: Multi-hop Question Answering over Knowledge Graphs using Large Language Models

Authors: Abir Chakraborty

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB)

Knowledge graphs (KGs) are large datasets with specific structures representing large knowledge bases (KB) where each node represents a key entity and relations amongst them are typed edges. Natural language queries formed to extract information from a KB entail starting from specific nodes and reasoning over multiple edges of the corresponding KG to arrive at the correct set of answer nodes. Traditional approaches of question answering on KG are based on (a) semantic parsing (SP), where a logical form (e.g., S-expression, SPARQL query, etc.) is generated using node and edge embeddings and then reasoning over these representations or tuning language models to generate the final answer directly, or (b) information-retrieval based that works by extracting entities and relations sequentially. In this work, we evaluate the capability of (LLMs) to answer questions over KG that involve multiple hops. We show that depending upon the size and nature of the KG we need different approaches to extract and feed the relevant information to an LLM since every LLM comes with a fixed context window. We evaluate our approach on six KGs with and without the availability of example-specific sub-graphs and show that both the IR and SP-based methods can be adopted by LLMs resulting in an extremely competitive performance.
[5] arXiv:2404.19256 [pdf, other]: Title: Bias Mitigation via Compensation: A Reinforcement Learning Perspective

Authors: Nandhini Swaminathan, David Danks

Comments: 8 pages, 5 diagrams

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

As AI increasingly integrates with human decision-making, we must carefully consider interactions between the two. In particular, current approaches focus on optimizing individual agent actions but often overlook the nuances of collective intelligence. Group dynamics might require that one agent (e.g., the AI system) compensate for biases and errors in another agent (e.g., the human), but this compensation should be carefully developed. We provide a theoretical framework for algorithmic compensation that synthesizes game theory and reinforcement learning principles to demonstrate the natural emergence of deceptive outcomes from the continuous learning dynamics of agents. We provide simulation results involving Markov Decision Processes (MDP) learning to interact. This work then underpins our ethical analysis of the conditions in which AI agents should adapt to biases and behaviors of other agents in dynamic and complex decision-making environments. Overall, our approach addresses the nuanced role of strategic deception of humans, challenging previous assumptions about its detrimental effects. We assert that compensation for others' biases can enhance coordination and ethical alignment: strategic deception, when ethically managed, can positively shape human-AI interactions.
[6] arXiv:2404.19303 [pdf, ps, other]: Title: Data Set Terminology of Artificial Intelligence in Medicine: A Historical Review and Recommendation

Authors: Shannon L. Walston, Hiroshi Seki, Hirotaka Takita, Yasuhito Mitsuyama, Shingo Sato, Akifumi Hagiwara, Rintaro Ito, Shouhei Hanaoka, Yukio Miki, Daiju Ueda

Comments: Totally 20 pages, 3 figures, 3 tables

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Medicine and artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. With such history comes a set of terminology that has a specific way in which it is applied. However, when two distinct fields with overlapping terminology start to collaborate, miscommunication and misunderstandings can occur. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical AI contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. Then the data sets used for AI evaluation are classified, namely random splitting, cross-validation, temporal, geographic, internal, and external sets. The accurate and standardized description of these data sets is crucial for demonstrating the robustness and generalizability of AI applications in medicine. This review clarifies existing literature to provide a comprehensive understanding of these classifications and their implications in AI evaluation. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion. Among these solutions are the use of standardized terminology such as 'training set,' 'validation (or tuning) set,' and 'test set,' and explicit definition of data set splitting terminologies in each medical AI research publication. This review aspires to enhance the precision of communication in medical AI, thereby fostering more effective and transparent research methodologies in this interdisciplinary field.
[7] arXiv:2404.19336 [pdf, ps, other]: Title: Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts

Authors: Yanggyu Lee, Suchae Jeong, Jihie Kim

Comments: 12 pages, 5 figures

Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

LLMs trained in the understanding of programming syntax are now providing effective assistance to developers and are being used in programming education such as in generation of coding problem examples or providing code explanations. A key aspect of programming education is understanding and dealing with error message. However, 'logical errors' in which the program operates against the programmer's intentions do not receive error messages from the compiler. In this study, building on existing research on programming errors, we first define the types of logical errors that can occur in programming in general. Based on the definition, we propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in the Chain-of-Thought and Tree-of-Thought prompts. The experimental results indicate that when such logical error descriptions in the prompt are used, the average classifition performance is about 21% higher than the ones without them. We also conducted an experiment for exploiting the relations among errors in generating a new logical error dataset using LLMs. As there is very limited dataset for logical errors such benchmark dataset can be very useful for various programming related applications. We expect that our work can assist novice programmers in identifying the causes of code errors and correct them more effectively.
[8] arXiv:2404.19370 [pdf, other]: Title: Numeric Reward Machines

Authors: Kristina Levina, Nikolaos Pappas, Athanasios Karapantelakis, Aneta Vulgarakis Feljan, Jendrik Seipp

Comments: ICAPS 2024; Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reward machines inform reinforcement learning agents about the reward structure of the environment and often drastically speed up the learning process. However, reward machines only accept Boolean features such as robot-reached-gold. Consequently, many inherently numeric tasks cannot profit from the guidance offered by reward machines. To address this gap, we aim to extend reward machines with numeric features such as distance-to-gold. For this, we present two types of reward machines: numeric-Boolean and numeric. In a numeric-Boolean reward machine, distance-to-gold is emulated by two Boolean features distance-to-gold-decreased and robot-reached-gold. In a numeric reward machine, distance-to-gold is used directly alongside the Boolean feature robot-reached-gold. We compare our new approaches to a baseline reward machine in the Craft domain, where the numeric feature is the agent-to-target distance. We use cross-product Q-learning, Q-learning with counter-factual experiences, and the options framework for learning. Our experimental results show that our new approaches significantly outperform the baseline approach. Extending reward machines with numeric features opens up new possibilities of using reward machines in inherently numeric tasks.
[9] arXiv:2404.19454 [pdf, other]: Title: Optimized neural forms for solving ordinary differential equations

Authors: Adam D. Kypriadis, Isaac E. Lagaris, Aristidis Likas, Konstantinos E. Parsopoulos

Subjects: Artificial Intelligence (cs.AI)

A critical issue in approximating solutions of ordinary differential equations using neural networks is the exact satisfaction of the boundary or initial conditions. For this purpose, neural forms have been introduced, i.e., functional expressions that depend on neural networks which, by design, satisfy the prescribed conditions exactly. Expanding upon prior progress, the present work contributes in three distinct aspects. First, it presents a novel formalism for crafting optimized neural forms. Second, it outlines a method for establishing an upper bound on the absolute deviation from the exact solution. Third, it introduces a technique for converting problems with Neumann or Robin conditions into equivalent problems with parametric Dirichlet conditions. The proposed optimized neural forms were numerically tested on a set of diverse problems, encompassing first-order and second-order ordinary differential equations, as well as first-order systems. Stiff and delay differential equations were also considered. The obtained solutions were compared against solutions obtained via Runge-Kutta methods and exact solutions wherever available. The reported results and analysis verify that in addition to the exact satisfaction of the boundary or initial conditions, optimized neural forms provide closed-form solutions of superior interpolation capability and controllable overall accuracy.
[10] arXiv:2404.19485 [pdf, other]: Title: IID Relaxation by Logical Expressivity: A Research Agenda for Fitting Logics to Neurosymbolic Requirements

Authors: Maarten C. Stol, Alessandra Mileo

Comments: 12 pages, 2 figures, submitted to NeSy 2024

Subjects: Artificial Intelligence (cs.AI)

Neurosymbolic background knowledge and the expressivity required of its logic can break Machine Learning assumptions about data Independence and Identical Distribution. In this position paper we propose to analyze IID relaxation in a hierarchy of logics that fit different use case requirements. We discuss the benefits of exploiting known data dependencies and distribution constraints for Neurosymbolic use cases and argue that the expressivity required for this knowledge has implications for the design of underlying ML routines. This opens a new research agenda with general questions about Neurosymbolic background knowledge and the expressivity required of its logic.
[11] arXiv:2404.19721 [pdf, ps, other]: Title: PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games

Authors: Steph Buongiorno, Lawrence Jake Klinkert, Tanishq Chawla, Zixin Zhuang, Corey Clark

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This research introduces Procedural Artificial Narrative using Generative AI (PANGeA), a structured approach for leveraging large language models (LLMs), guided by a game designer's high-level criteria, to generate narrative content for turn-based role-playing video games (RPGs). Distinct from prior applications of LLMs used for video game design, PANGeA innovates by not only generating game level data (which includes, but is not limited to, setting, key items, and non-playable characters (NPCs)), but by also fostering dynamic, free-form interactions between the player and the environment that align with the procedural game narrative. The NPCs generated by PANGeA are personality-biased and express traits from the Big 5 Personality Model in their generated responses. PANGeA addresses challenges behind ingesting free-form text input, which can prompt LLM responses beyond the scope of the game narrative. A novel validation system that uses the LLM's intelligence evaluates text input and aligns generated responses with the unfolding narrative. Making these interactions possible, PANGeA is supported by a server that hosts a custom memory system that supplies context for augmenting generated responses thus aligning them with the procedural narrative. For its broad application, the server has a REST interface enabling any game engine to integrate directly with PANGeA, as well as an LLM interface adaptable with local or private LLMs. PANGeA's ability to foster dynamic narrative generation by aligning responses with the procedural narrative is demonstrated through an empirical study and ablation test of two versions of a demo game. These are, a custom, browser-based GPT and a Unity demo. As the results show, PANGeA holds potential to assist game designers in using LLMs to generate narrative-consistent content even when provided varied and unpredictable, free-form text input.

Cross-lists for Wed, 1 May 24

[12] arXiv:2404.18940 (cross-list from cs.SI) [pdf, ps, other]: Title: Conceptual Mapping of Controversies

Authors: Claude Draude, Dominik Dürrschnabel, Johannes Hirth, Viktoria Horn, Jonathan Kropf, Jörn Lamla, Gerd Stumme, Markus Uhlmann

Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

With our work, we contribute towards a qualitative analysis of the discourse on controversies in online news media. For this, we employ Formal Concept Analysis and the economics of conventions to derive conceptual controversy maps. In our experiments, we analyze two maps from different news journals with methods from ordinal data science. We show how these methods can be used to assess the diversity, complexity and potential bias of controversies. In addition to that, we discuss how the diagrams of concept lattices can be used to navigate between news articles.
[13] arXiv:2404.18942 (cross-list from cs.CL) [pdf, other]: Title: GuideWalk -- Heterogeneous Data Fusion for Enhanced Learning -- A Multiclass Document Classification Case

Authors: Sarmad N. Mohammed, Semra Gündüç

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

One of the prime problems of computer science and machine learning is to extract information efficiently from large-scale, heterogeneous data. Text data, with its syntax, semantics, and even hidden information content, possesses an exceptional place among the data types in concern. The processing of the text data requires embedding, a method of translating the content of the text to numeric vectors. A correct embedding algorithm is the starting point for obtaining the full information content of the text data. In this work, a new embedding method based on the graph structure of the meaningful sentences is proposed. The design of the algorithm aims to construct an embedding vector that constitutes syntactic and semantic elements as well as the hidden content of the text data. The success of the proposed embedding method is tested in classification problems. Among the wide range of application areas, text classification is the best laboratory for embedding methods; the classification power of the method can be tested using dimensional reduction without any further processing. Furthermore, the method can be compared with different embedding algorithms and machine learning methods. The proposed method is tested with real-world data sets and eight well-known and successful embedding algorithms. The proposed embedding method shows significantly better classification for binary and multiclass datasets compared to well-known algorithms.
[14] arXiv:2404.18943 (cross-list from cs.HC) [pdf, ps, other]: Title: Using artificial intelligence methods for the studyed visual analyzer

Authors: A.I. Medvedeva, M.V.Kholod

Comments: in Rusian language

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

The paper describes how various techniques for applying artificial intelligence to the study of human eyes are utilized. The first dataset was collected using computerized perimetry to investigate the visualization of the human visual field and the diagnosis of glaucoma. A method to analyze the image using software tools is proposed. The second dataset was obtained, as part of the implementation of a Russian-Swiss experiment to collect and analyze eye movement data using the Tobii Pro Glasses 3 device on VR video. Eye movements and focus on the recorded route of a virtual journey through the canton of Vaud were investigated. Methods are being developed to investigate the dependencies of eye pupil movements using mathematical modelling. VR-video users can use these studies in medicine to assess the course and deterioration of glaucoma patients and to study the mechanisms of attention to tourist attractions.
[15] arXiv:2404.18947 (cross-list from cs.LG) [pdf, other]: Title: Multimodal Fusion on Low-quality Data: A Comprehensive Survey

Authors: Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored especially under low-quality data settings. This paper surveys the common challenges and recent advances of multimodal fusion in the wild and presents them in a comprehensive taxonomy. From a data-centric view, we identify four main challenges that are faced by multimodal fusion on low-quality data, namely (1) noisy multimodal data that are contaminated with heterogeneous noises, (2) incomplete multimodal data that some modalities are missing, (3) imbalanced multimodal data that the qualities or properties of different modalities are significantly different and (4) quality-varying multimodal data that the quality of each modality dynamically changes with respect to different samples. This new taxonomy will enable researchers to understand the state of the field and identify several potential directions. We also provide discussion for the open problems in this field together with interesting future research directions.
[16] arXiv:2404.18952 (cross-list from cs.CV) [pdf, other]: Title: CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention

Authors: Damith Chamalke Senadeera, Xiaoyun Yang, Dimitrios Kollias, Gregory Slabaugh

Comments: To be published in the proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods.
[17] arXiv:2404.18955 (cross-list from cs.NE) [pdf, other]: Title: GARA: A novel approach to Improve Genetic Algorithms' Accuracy and Efficiency by Utilizing Relationships among Genes

Authors: Zhaoning Shi, Meng Xiang, Zhaoyang Hai, Xiabi Liu, Yan Pei

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Genetic algorithms have played an important role in engineering optimization. Traditional GAs treat each gene separately. However, biophysical studies of gene regulatory networks revealed direct associations between different genes. It inspires us to propose an improvement to GA in this paper, Gene Regulatory Genetic Algorithm (GRGA), which, to our best knowledge, is the first time to utilize relationships among genes for improving GA's accuracy and efficiency. We design a directed multipartite graph encapsulating the solution space, called RGGR, where each node corresponds to a gene in the solution and the edge represents the relationship between adjacent nodes. The edge's weight reflects the relationship degree and is updated based on the idea that the edges' weights in a complete chain as candidate solution with acceptable or unacceptable performance should be strengthened or reduced, respectively. The obtained RGGR is then employed to determine appropriate loci of crossover and mutation operators, thereby directing the evolutionary process toward faster and better convergence. We analyze and validate our proposed GRGA approach in a single-objective multimodal optimization problem, and further test it on three types of applications, including feature selection, text summarization, and dimensionality reduction. Results illustrate that our GARA is effective and promising.
[18] arXiv:2404.18961 (cross-list from cs.LG) [pdf, other]: Title: Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras

Authors: Jun Yu, Yutong Dai, Xiaokang Liu, Jin Huang, Yishan Shen, Ke Zhang, Rong Zhou, Eashan Adhikarla, Wenxuan Ye, Yixin Liu, Zhaoming Kong, Kai Zhang, Yilong Yin, Vinod Namboodiri, Brian D. Davison, Jason H. Moore, Yong Chen

Comments: 60 figures, 116 pages, 500+ references

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

MTL is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to STL, MTL offers a suite of benefits that enhance both the training process and the inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the past twenty years, MTL has become widely recognized as a flexible and effective approach in various fields, including CV, NLP, recommendation systems, disease prognosis and diagnosis, and robotics. This survey provides a comprehensive overview of the evolution of MTL, encompassing the technical aspects of cutting-edge methods from traditional approaches to deep learning and the latest trend of pretrained foundation models. Our survey methodically categorizes MTL techniques into five key areas: regularization, relationship learning, feature propagation, optimization, and pre-training. This categorization not only chronologically outlines the development of MTL but also dives into various specialized strategies within each category. Furthermore, the survey reveals how the MTL evolves from handling a fixed set of tasks to embracing a more flexible approach free from task or modality constraints. It explores the concepts of task-promptable and -agnostic training, along with the capacity for ZSL, which unleashes the untapped potential of this historically coveted learning paradigm. Overall, we hope this survey provides the research community with a comprehensive overview of the advancements in MTL from its inception in 1997 to the present in 2023. We address present challenges and look ahead to future possibilities, shedding light on the opportunities and potential avenues for MTL research in a broad manner. This project is publicly available at https://github.com/junfish/Awesome-Multitask-Learning.
[19] arXiv:2404.18975 (cross-list from cs.LG) [pdf, ps, other]: Title: M3H: Multimodal Multitask Machine Learning for Healthcare

Authors: Dimitris Bertsimas, Yu Ma

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent breakthroughs in AI are poised to fundamentally enhance our study and understanding of healthcare. The development of an integrated many-to-many framework that leverages multiple data modality inputs for the analytical modeling of multiple medical tasks, is critical for a unified understanding of modern medicine. In this work, we introduce M3H, an explainable Multimodal Multitask Machine Learning for Healthcare framework that consolidates learning from diverse multimodal inputs across a broad spectrum of medical task categories and machine learning problem classes. The modular design of the framework ensures its generalizable data processing, task definition, and rapid model prototyping, applicable to both clinical and operational healthcare settings. We evaluate the M3H framework by validating models trained from four modalities (tabular, time-series, language, and vision) on 41 medical tasks across 4 machine learning problem classes. Our results demonstrate that M3H consistently produces multitask models that outperform canonical single-task models (by 1.1- 37.2%) across 37 disease diagnoses from 16 medical departments, three hospital operation forecasts, and one patient phenotyping task: spanning ML problem classes of supervised binary classification, multiclass classification, regression, and clustering. Additionally, the framework introduces a novel attention mechanism to balance self-exploitation (focus on learning source task), and cross-exploration (encourage learning from other tasks). Furthermore, M3H provides explainability insights on how joint learning of additional tasks impacts the learning of source task using a proposed TIM score, shedding light into the dynamics of task interdependencies. Its adaptable architecture facilitates the customization and integration, establishing it as a robust and scalable candidate solution for future AI-driven healthcare systems.
[20] arXiv:2404.18976 (cross-list from cs.LG) [pdf, other]: Title: Foundations of Multisensory Artificial Intelligence

Authors: Paul Pu Liang

Comments: CMU Machine Learning Department PhD Thesis

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, video, real-world sensors, wearable devices, and medical data holds great promise for impact in many scientific areas with practical benefits, such as in supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. By synthesizing a range of theoretical frameworks and application domains, this thesis aims to advance the machine learning foundations of multisensory AI. In the first part, we present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets, design principled approaches to learn these interactions, and analyze whether their model has succeeded in learning. In the second part, we study the design of practical multimodal foundation models that generalize over many modalities and tasks, which presents a step toward grounding large language models to real-world sensory modalities. We introduce MultiBench, a unified large-scale benchmark across a wide range of modalities, tasks, and research areas, followed by the cross-modal attention and multimodal transformer architectures that now underpin many of today's multimodal foundation models. Scaling these architectures on MultiBench enables the creation of general-purpose multisensory AI systems, and we discuss our collaborative efforts in applying these models for real-world impact in affective computing, mental health, cancer prognosis, and robotics. Finally, we conclude this thesis by discussing how future work can leverage these ideas toward more general, interactive, and safe multisensory AI.
[21] arXiv:2404.18978 (cross-list from cs.LG) [pdf, other]: Title: Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs

Authors: Bahar Radmehr, Adish Singla, Tanja Käser

Comments: Accepted as a full paper at EDM 2024: The 17th International Conference on Educational Data Mining, 14-17 of July 2024, Atlanta

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

There has been a growing interest in developing learner models to enhance learning and teaching experiences in educational environments. However, existing works have primarily focused on structured environments relying on meticulously crafted representations of tasks, thereby limiting the agent's ability to generalize skills across tasks. In this paper, we aim to enhance the generalization capabilities of agents in open-ended text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that utilize natural language for state and action representations to find the best interaction strategy, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid LLM-assisted RL agents that combine these two strategies to improve agents' performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual pharmacy environment designed for practicing diagnostic conversations. Our results show that RL-based agents excel in task completion but lack in asking quality diagnostic questions. In contrast, LLM-based agents perform better in asking diagnostic questions but fall short of completing the task. Finally, hybrid LLM-assisted RL agents enable us to overcome these limitations, highlighting the potential of combining RL and LLMs to develop high-performing agents for open-ended learning environments.
[22] arXiv:2404.18981 (cross-list from eess.IV) [pdf, other]: Title: Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis

Authors: Akash Awasthi, Safwan Ahmad, Bryant Le, Hien Van Nguyen

Comments: Accepted in ISBI 2024

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

In the realm of chest X-ray (CXR) image analysis, radiologists meticulously examine various regions, documenting their observations in reports. The prevalence of errors in CXR diagnoses, particularly among inexperienced radiologists and hospital residents, underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial for correcting mistakes by guiding radiologists to the accurate regions of interest, especially in the diagnosis of chest radiograph abnormalities. In response to this imperative, we propose a novel system designed to identify the primary intentions articulated by radiologists in their reports and the corresponding regions of interest in CXR images. This system seeks to elucidate the visual context underlying radiologists' textual findings, with the potential to rectify errors made by less experienced practitioners and direct them to precise regions of interest. Importantly, the proposed system can be instrumental in providing constructive feedback to inexperienced radiologists or junior residents in the hospital, bridging the gap in face-to-face communication. The system represents a valuable tool for enhancing diagnostic accuracy and fostering continuous learning within the medical community.
[23] arXiv:2404.19007 (cross-list from cs.CL) [pdf, other]: Title: How Did We Get Here? Summarizing Conversation Dynamics

Authors: Yilun Hua, Nicholas Chernogor, Yuzhe Gu, Seoyeon Julie Jeong, Miranda Luo, Cristian Danescu-Niculescu-Mizil

Comments: To appear in the Proceedings of NAACL 2024. Data available in ConvoKit this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Throughout a conversation, the way participants interact with each other is in constant flux: their tones may change, they may resort to different strategies to convey their points, or they might alter their interaction patterns. An understanding of these dynamics can complement that of the actual facts and opinions discussed, offering a more holistic view of the trajectory of the conversation: how it arrived at its current state and where it is likely heading.
In this work, we introduce the task of summarizing the dynamics of conversations, by constructing a dataset of human-written summaries, and exploring several automated baselines. We evaluate whether such summaries can capture the trajectory of conversations via an established downstream task: forecasting whether an ongoing conversation will eventually derail into toxic behavior. We show that they help both humans and automated systems with this forecasting task. Humans make predictions three times faster, and with greater confidence, when reading the summaries than when reading the transcripts. Furthermore, automated forecasting systems are more accurate when constructing, and then predicting based on, summaries of conversation dynamics, compared to directly predicting on the transcripts.
[24] arXiv:2404.19031 (cross-list from cs.CV) [pdf, other]: Title: Machine Unlearning for Document Classification

Authors: Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

Comments: Accepted to ICDAR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating ``the right to be forgotten" has recently been proposed, allowing users to request the removal of private information from computer systems and neural network models. A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data. In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area. Specifically, we consider a realistic scenario where a remote server houses a well-trained model and possesses only a small portion of training data. This setup is designed for efficient forgetting manipulation. This work represents a pioneering step towards the development of machine unlearning methods aimed at addressing privacy concerns in document analysis applications. Our code is publicly available at \url{https://github.com/leitro/MachineUnlearning-DocClassification}.
[25] arXiv:2404.19038 (cross-list from cs.CV) [pdf, other]: Title: Embedded Representation Learning Network for Animating Styled Video Portrait

Authors: Tianyong Wang, Xiangyu Liang, Wangguandong Zheng, Dan Niu, Haifeng Xia, Siyu Xia

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The talking head generation recently attracted considerable attention due to its widespread application prospects, especially for digital avatars and 3D animation design. Inspired by this practical demand, several works explored Neural Radiance Fields (NeRF) to synthesize the talking heads. However, these methods based on NeRF face two challenges: (1) Difficulty in generating style-controllable talking heads. (2) Displacement artifacts around the neck in rendered images. To overcome these two challenges, we propose a novel generative paradigm \textit{Embedded Representation Learning Network} (ERLNet) with two learning stages. First, the \textit{ audio-driven FLAME} (ADF) module is constructed to produce facial expression and head pose sequences synchronized with content audio and style video. Second, given the sequence deduced by the ADF, one novel \textit{dual-branch fusion NeRF} (DBF-NeRF) explores these contents to render the final images. Extensive empirical studies demonstrate that the collaboration of these two stages effectively facilitates our method to render a more realistic talking head than the existing algorithms.
[26] arXiv:2404.19048 (cross-list from cs.CL) [pdf, other]: Title: A Framework for Real-time Safeguarding the Text Generation of Large Language

Authors: Ximing Dong, Dayi Lin, Shaowei Wang, Ahmed E. Hassan

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have significantly advanced natural language processing (NLP) tasks but also pose ethical and societal risks due to their propensity to generate harmful content. To address this, various approaches have been developed to safeguard LLMs from producing unsafe content. However, existing methods have limitations, including the need for training specific control models and proactive intervention during text generation, that lead to quality degradation and increased computational overhead. To mitigate those limitations, we propose LLMSafeGuard, a lightweight framework to safeguard LLM text generation in real-time. LLMSafeGuard integrates an external validator into the beam search algorithm during decoding, rejecting candidates that violate safety constraints while allowing valid ones to proceed. We introduce a similarity based validation approach, simplifying constraint introduction and eliminating the need for control model training. Additionally, LLMSafeGuard employs a context-wise timing selection strategy, intervening LLMs only when necessary. We evaluate LLMSafe-Guard on two tasks, detoxification and copyright safeguarding, and demonstrate its superior performance over SOTA baselines. For instance, LLMSafeGuard reduces the average toxic score of. LLM output by 29.7% compared to the best baseline meanwhile preserving similar linguistic quality as natural output in detoxification task. Similarly, in the copyright task, LLMSafeGuard decreases the Longest Common Subsequence (LCS) by 56.2% compared to baselines. Moreover, our context-wise timing selection strategy reduces inference time by at least 24% meanwhile maintaining comparable effectiveness as validating each time step. LLMSafeGuard also offers tunable parameters to balance its effectiveness and efficiency.
[27] arXiv:2404.19075 (cross-list from eess.IV) [pdf, other]: Title: Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction

Authors: K. Aditya Mohan, Massimiliano Ferrucci, Chuck Divin, Garrett A. Stevenson, Hyojin Kim

Comments: submitted to Nature Machine Intelligence

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Numerical Analysis (math.NA)

4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in-situ experiments that causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iterative optimization of its network parameters such that the measured projection images best match the output of the CT forward measurement model. We use a continuous time and space forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous valued object coordinates. Unlike existing state-of-the-art neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions even for extremely large CT data sizes. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach.
[28] arXiv:2404.19076 (cross-list from cs.CY) [pdf, ps, other]: Title: Who Followed the Blueprint? Analyzing the Responses of U.S. Federal Agencies to the Blueprint for an AI Bill of Rights

Authors: Darren Lage, Riley Pruitt, Jason Ross Arnold

Comments: 8 pages

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

This study examines the extent to which U.S. federal agencies responded to and implemented the principles outlined in the White House's October 2022 "Blueprint for an AI Bill of Rights." The Blueprint provided a framework for the ethical governance of artificial intelligence systems, organized around five core principles: safety and effectiveness, protection against algorithmic discrimination, data privacy, notice and explanation about AI systems, and human alternatives and fallback.
Through an analysis of publicly available records across 15 federal departments, the authors found limited evidence that the Blueprint directly influenced agency actions after its release. Only five departments explicitly mentioned the Blueprint, while 12 took steps aligned with one or more of its principles. However, much of this work appeared to have precedents predating the Blueprint or motivations disconnected from it, such as compliance with prior executive orders on trustworthy AI. Departments' activities often emphasized priorities like safety, accountability and transparency that overlapped with Blueprint principles, but did not necessarily stem from it.
The authors conclude that the non-binding Blueprint seems to have had minimal impact on shaping the U.S. government's approach to ethical AI governance in its first year. Factors like public concerns after high-profile AI releases and obligations to follow direct executive orders likely carried more influence over federal agencies. More rigorous study would be needed to definitively assess the Blueprint's effects within the federal bureaucracy and broader society.
[29] arXiv:2404.19087 (cross-list from cs.RO) [pdf, other]: Title: Deep Reinforcement Learning for Advanced Longitudinal Control and Collision Avoidance in High-Risk Driving Scenarios

Authors: Dianwei Chen, Yaobang Gong, Xianfeng Yang

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Existing Advanced Driver Assistance Systems primarily focus on the vehicle directly ahead, often overlooking potential risks from following vehicles. This oversight can lead to ineffective handling of high risk situations, such as high speed, closely spaced, multi vehicle scenarios where emergency braking by one vehicle might trigger a pile up collision. To overcome these limitations, this study introduces a novel deep reinforcement learning based algorithm for longitudinal control and collision avoidance. This proposed algorithm effectively considers the behavior of both leading and following vehicles. Its implementation in simulated high risk scenarios, which involve emergency braking in dense traffic where traditional systems typically fail, has demonstrated the algorithm ability to prevent potential pile up collisions, including those involving heavy duty vehicles.
[30] arXiv:2404.19093 (cross-list from cs.IR) [pdf, other]: Title: Large Language Models as Conversational Movie Recommenders: A User Study

Authors: Ruixuan Sun, Xinyi Li, Avinash Akella, Joseph A. Konstan

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

This paper explores the effectiveness of using large language models (LLMs) for personalized movie recommendations from users' perspectives in an online field experiment. Our study involves a combination of between-subject prompt and historic consumption assessments, along with within-subject recommendation scenario evaluations. By examining conversation and survey response data from 160 active users, we find that LLMs offer strong recommendation explainability but lack overall personalization, diversity, and user trust. Our results also indicate that different personalized prompting techniques do not significantly affect user-perceived recommendation quality, but the number of movies a user has watched plays a more significant role. Furthermore, LLMs show a greater ability to recommend lesser-known or niche movies. Through qualitative analysis, we identify key conversational patterns linked to positive and negative user interaction experiences and conclude that providing personal context and examples is crucial for obtaining high-quality recommendations from LLMs.
[31] arXiv:2404.19100 (cross-list from cs.SE) [pdf, other]: Title: Predicting Fairness of ML Software Configuration

Authors: Salvador Robles Herrera, Verya Monjezi, Vladik Kreinovich, Ashutosh Trivedi, Saeid Tizpaz-Niari

Comments: To Appear in the 20th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE'24)

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

This paper investigates the relationships between hyperparameters of machine learning and fairness. Data-driven solutions are increasingly used in critical socio-technical applications where ensuring fairness is important. Rather than explicitly encoding decision logic via control and data structures, the ML developers provide input data, perform some pre-processing, choose ML algorithms, and tune hyperparameters (HPs) to infer a program that encodes the decision logic. Prior works report that the selection of HPs can significantly influence fairness. However, tuning HPs to find an ideal trade-off between accuracy, precision, and fairness has remained an expensive and tedious task. Can we predict fairness of HP configuration for a given dataset? Are the predictions robust to distribution shifts?
We focus on group fairness notions and investigate the HP space of 5 training algorithms. We first find that tree regressors and XGBoots significantly outperformed deep neural networks and support vector machines in accurately predicting the fairness of HPs. When predicting the fairness of ML hyperparameters under temporal distribution shift, the tree regressors outperforms the other algorithms with reasonable accuracy. However, the precision depends on the ML training algorithm, dataset, and protected attributes. For example, the tree regressor model was robust for training data shift from 2014 to 2018 on logistic regression and discriminant analysis HPs with sex as the protected attribute; but not for race and other training algorithms. Our method provides a sound framework to efficiently perform fine-tuning of ML training algorithms and understand the relationships between HPs and fairness.
[32] arXiv:2404.19130 (cross-list from cs.IR) [pdf, other]: Title: SpherE: Expressive and Interpretable Knowledge Graph Embedding for Set Retrieval

Authors: Zihao Li, Yuyi Ao, Jingrui He

Comments: Accepted by SIGIR 2024, Camera Ready Version

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Knowledge graphs (KGs), which store an extensive number of relational facts (head, relation, tail), serve various applications. While many downstream tasks highly rely on the expressive modeling and predictive embedding of KGs, most of the current KG representation learning methods, where each entity is embedded as a vector in the Euclidean space and each relation is embedded as a transformation, follow an entity ranking protocol. On one hand, such an embedding design cannot capture many-to-many relations. On the other hand, in many retrieval cases, the users wish to get an exact set of answers without any ranking, especially when the results are expected to be precise, e.g., which genes cause an illness. Such scenarios are commonly referred to as "set retrieval". This work presents a pioneering study on the KG set retrieval problem. We show that the set retrieval highly depends on expressive modeling of many-to-many relations, and propose a new KG embedding model SpherE to address this problem. SpherE is based on rotational embedding methods, but each entity is embedded as a sphere instead of a vector. While inheriting the high interpretability of rotational-based models, our SpherE can more expressively model one-to-many, many-to-one, and many-to-many relations. Through extensive experiments, we show that our SpherE can well address the set retrieval problem while still having a good predictive ability to infer missing facts. The code is available at https://github.com/Violet24K/SpherE.
[33] arXiv:2404.19171 (cross-list from cs.CV) [pdf, other]: Title: Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection

Authors: Cai Yu, Shan Jia, Xiaomeng Fu, Jin Liu, Jiahe Tian, Jiao Dai, Xi Wang, Siwei Lyu, Jizhong Han

Comments: accepted by ICME 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With the rising prevalence of deepfakes, there is a growing interest in developing generalizable detection methods for various types of deepfakes. While effective in their specific modalities, traditional detection methods fall short in addressing the generalizability of detection across diverse cross-modal deepfakes. This paper aims to explicitly learn potential cross-modal correlation to enhance deepfake detection towards various generation scenarios. Our approach introduces a correlation distillation task, which models the inherent cross-modal correlation based on content information. This strategy helps to prevent the model from overfitting merely to audio-visual synchronization. Additionally, we present the Cross-Modal Deepfake Dataset (CMDFD), a comprehensive dataset with four generation methods to evaluate the detection of diverse cross-modal deepfakes. The experimental results on CMDFD and FakeAVCeleb datasets demonstrate the superior generalizability of our method over existing state-of-the-art methods. Our code and data can be found at \url{https://github.com/ljj898/CMDFD-Dataset-and-Deepfake-Detection}.
[34] arXiv:2404.19192 (cross-list from cs.CL) [pdf, other]: Title: Mix of Experts Language Model for Named Entity Recognition

Authors: Xinwei Chen, Kun Li, Tianyou Song, Jiangjian Guo

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Named Entity Recognition (NER) is an essential steppingstone in the field of natural language processing. Although promising performance has been achieved by various distantly supervised models, we argue that distant supervision inevitably introduces incomplete and noisy annotations, which may mislead the model training process. To address this issue, we propose a robust NER model named BOND-MoE based on Mixture of Experts (MoE). Instead of relying on a single model for NER prediction, multiple models are trained and ensembled under the Expectation-Maximization (EM) framework, so that noisy supervision can be dramatically alleviated. In addition, we introduce a fair assignment module to balance the document-model assignment process. Extensive experiments on real-world datasets show that the proposed method achieves state-of-the-art performance compared with other distantly supervised NER.
[35] arXiv:2404.19204 (cross-list from cs.CV) [pdf, other]: Title: NeRF-Insert: 3D Local Editing with Multimodal Control Signals

Authors: Benet Oriol Sabat, Alessandro Achille, Matthew Trager, Stefano Soatto

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

We propose NeRF-Insert, a NeRF editing framework that allows users to make high-quality local edits with a flexible level of control. Unlike previous work that relied on image-to-image models, we cast scene editing as an in-painting problem, which encourages the global structure of the scene to be preserved. Moreover, while most existing methods use only textual prompts to condition edits, our framework accepts a combination of inputs of different modalities as reference. More precisely, a user may provide a combination of textual and visual inputs including images, CAD models, and binary image masks for specifying a 3D region. We use generic image generation models to in-paint the scene from multiple viewpoints, and lift the local edits to a 3D-consistent NeRF edit. Compared to previous methods, our results show better visual quality and also maintain stronger consistency with the original NeRF.
[36] arXiv:2404.19205 (cross-list from cs.CV) [pdf, other]: Title: TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

Authors: Yoonsik Kim, Moonbin Yim, Ka Yeon Song

Comments: Technical Report

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In this paper, we establish a benchmark for table visual question answering, referred to as the TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. It is important to note that existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA. As such, the primary objective of this paper is to obtain these necessary components. Specifically, images are sourced either through the application of a \textit{stylesheet} or by employing the proposed table rendering system. QA pairs are generated by exploiting the large language model (LLM) where the input is a text-formatted table. Ultimately, the completed TableVQA-Bench comprises 1,500 QA pairs. We comprehensively compare the performance of various multi-modal large language models (MLLMs) on TableVQA-Bench. GPT-4V achieves the highest accuracy among commercial and open-sourced MLLMs from our experiments. Moreover, we discover that the number of vision queries plays a significant role in TableVQA performance. To further analyze the capabilities of MLLMs in comparison to their LLM backbones, we investigate by presenting image-formatted tables to MLLMs and text-formatted tables to LLMs, respectively. Our findings suggest that processing visual inputs is more challenging than text inputs, as evidenced by the lower performance of MLLMs, despite generally requiring higher computational costs than LLMs. The proposed TableVQA-Bench and evaluation codes are available at \href{https://github.com/naver-ai/tablevqabench}{https://github.com/naver-ai/tablevqabench}.
[37] arXiv:2404.19230 (cross-list from q-bio.BM) [pdf, ps, other]: Title: Deep Lead Optimization: Leveraging Generative AI for Structural Modification

Authors: Odin Zhang, Haitao Lin, Hui Zhang, Huifeng Zhao, Yufei Huang, Yuansheng Huang, Dejun Jiang, Chang-yu Hsieh, Peichen Pan, Tingjun Hou

Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)

The idea of using deep-learning-based molecular generation to accelerate discovery of drug candidates has attracted extraordinary attention, and many deep generative models have been developed for automated drug design, termed molecular generation. In general, molecular generation encompasses two main strategies: de novo design, which generates novel molecular structures from scratch, and lead optimization, which refines existing molecules into drug candidates. Among them, lead optimization plays an important role in real-world drug design. For example, it can enable the development of me-better drugs that are chemically distinct yet more effective than the original drugs. It can also facilitate fragment-based drug design, transforming virtual-screened small ligands with low affinity into first-in-class medicines. Despite its importance, automated lead optimization remains underexplored compared to the well-established de novo generative models, due to its reliance on complex biological and chemical knowledge. To bridge this gap, we conduct a systematic review of traditional computational methods for lead optimization, organizing these strategies into four principal sub-tasks with defined inputs and outputs. This review delves into the basic concepts, goals, conventional CADD techniques, and recent advancements in AIDD. Additionally, we introduce a unified perspective based on constrained subgraph generation to harmonize the methodologies of de novo design and lead optimization. Through this lens, de novo design can incorporate strategies from lead optimization to address the challenge of generating hard-to-synthesize molecules; inversely, lead optimization can benefit from the innovations in de novo design by approaching it as a task of generating molecules conditioned on certain substructures.
[38] arXiv:2404.19232 (cross-list from cs.CL) [pdf, other]: Title: GRAMMAR: Grounded and Modular Evaluation of Domain-Specific Retrieval-Augmented Language Models

Authors: Xinzhe Li, Ming Liu, Shang Gao

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs. This method facilitates the separation of query logic from linguistic variations for enhanced debugging capabilities; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR to accurately identify model vulnerabilities.
[39] arXiv:2404.19244 (cross-list from cs.CY) [pdf, other]: Title: A University Framework for the Responsible use of Generative AI in Research

Authors: Shannon Smith, Melissa Tate, Keri Freeman, Anne Walsh, Brian Ballsun-Stanton, Mark Hooper, Murray Lane

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Generative Artificial Intelligence (generative AI) poses both opportunities and risks for the integrity of research. Universities must guide researchers in using generative AI responsibly, and in navigating a complex regulatory landscape subject to rapid change. By drawing on the experiences of two Australian universities, we propose a framework to help institutions promote and facilitate the responsible use of generative AI. We provide guidance to help distil the diverse regulatory environment into a principles-based position statement. Further, we explain how a position statement can then serve as a foundation for initiatives in training, communications, infrastructure, and process change. Despite the growing body of literature about AI's impact on academic integrity for undergraduate students, there has been comparatively little attention on the impacts of generative AI for research integrity, and the vital role of institutions in helping to address those challenges. This paper underscores the urgency for research institutions to take action in this area and suggests a practical and adaptable framework for so doing.
[40] arXiv:2404.19245 (cross-list from cs.CL) [pdf, other]: Title: HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

Authors: Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, Chengzhong Xu

Comments: 19 pages, 7 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Adapting Large Language Models (LLMs) to new tasks through fine-tuning has been made more efficient by the introduction of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. However, these methods often underperform compared to full fine-tuning, particularly in scenarios involving complex datasets. This issue becomes even more pronounced in complex domains, highlighting the need for improved PEFT approaches that can achieve better performance. Through a series of experiments, we have uncovered two critical insights that shed light on the training and parameter inefficiency of LoRA. Building on these insights, we have developed HydraLoRA, a LoRA framework with an asymmetric structure that eliminates the need for domain expertise. Our experiments demonstrate that HydraLoRA outperforms other PEFT approaches, even those that rely on domain knowledge during the training and inference phases. \href{https://github.com/Clin0212/HydraLoRA}{Code}.
[41] arXiv:2404.19253 (cross-list from cs.RO) [pdf, other]: Title: Learning to Communicate Functional States with Nonverbal Expressions for Improved Human-Robot Collaboration

Authors: Liam Roy, Dana Kulic, Elizabeth Croft

Comments: 8 Pages, Accepted to RA-L March 2024

Journal-ref: LRA.2024.3384037

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Collaborative robots must effectively communicate their internal state to humans to enable a smooth interaction. Nonverbal communication is widely used to communicate information during human-robot interaction, however, such methods may also be misunderstood, leading to communication errors. In this work, we explore modulating the acoustic parameter values (pitch bend, beats per minute, beats per loop) of nonverbal auditory expressions to convey functional robot states (accomplished, progressing, stuck). We propose a reinforcement learning (RL) algorithm based on noisy human feedback to produce accurately interpreted nonverbal auditory expressions. The proposed approach was evaluated through a user study with 24 participants. The results demonstrate that: 1. Our proposed RL-based approach is able to learn suitable acoustic parameter values which improve the users' ability to correctly identify the state of the robot. 2. Algorithm initialization informed by previous user data can be used to significantly speed up the learning process. 3. The method used for algorithm initialization strongly influences whether participants converge to similar sounds for each robot state. 4. Modulation of pitch bend has the largest influence on user association between sounds and robotic states.
[42] arXiv:2404.19254 (cross-list from cs.CL) [pdf, other]: Title: Suvach -- Generated Hindi QA benchmark

Authors: Vaishak Narayanan, Prabin Raj KP, Saifudheen Nouphal

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Current evaluation benchmarks for question answering (QA) in Indic languages often rely on machine translation of existing English datasets. This approach suffers from bias and inaccuracies inherent in machine translation, leading to datasets that may not reflect the true capabilities of EQA models for Indic languages. This paper proposes a new benchmark specifically designed for evaluating Hindi EQA models and discusses the methodology to do the same for any task. This method leverages large language models (LLMs) to generate a high-quality dataset in an extractive setting, ensuring its relevance for the target language. We believe this new resource will foster advancements in Hindi NLP research by providing a more accurate and reliable evaluation tool.
[43] arXiv:2404.19288 (cross-list from cs.LG) [pdf, other]: Title: Training-free Graph Neural Networks and the Power of Labels as Features

Authors: Ryoma Sato

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose training-free graph neural networks (TFGNNs), which can be used without training and can also be improved with optional training, for transductive node classification. We first advocate labels as features (LaF), which is an admissible but not explored technique. We show that LaF provably enhances the expressive power of graph neural networks. We design TFGNNs based on this analysis. In the experiments, we confirm that TFGNNs outperform existing GNNs in the training-free setting and converge with much fewer training iterations than traditional GNNs.
[44] arXiv:2404.19306 (cross-list from cs.LG) [pdf, ps, other]: Title: Comprehensive Forecasting-Based Analysis of Hybrid and Stacked Stateful/ Stateless Models

Authors: Swayamjit Saha

Comments: 8 pages, 14 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Wind speed is a powerful source of renewable energy, which can be used as an alternative to the non-renewable resources for production of electricity. Renewable sources are clean, infinite and do not impact the environment negatively during production of electrical energy. However, while eliciting electrical energy from renewable resources viz. solar irradiance, wind speed, hydro should require special planning failing which may result in huge loss of labour and money for setting up the system. In this paper, we discuss four deep recurrent neural networks viz. Stacked Stateless LSTM, Stacked Stateless GRU, Stacked Stateful LSTM and Statcked Stateful GRU which will be used to predict wind speed on a short-term basis for the airport sites beside two campuses of Mississippi State University. The paper does a comprehensive analysis of the performance of the models used describing their architectures and how efficiently they elicit the results with the help of RMSE values. A detailed description of the time and space complexities of the above models has also been discussed.
[45] arXiv:2404.19330 (cross-list from cs.CV) [pdf, other]: Title: G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction

Authors: Zhanwei Zhang, Zishuo Hua, Minghao Chen, Wei Lu, Binbin Lin, Deng Cai, Wenxiao Wang

Comments: Accepted by IJCAI 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Predicting future trajectories of traffic agents accurately holds substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from the accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in kinematically infeasible predictions. To address these issues, in this paper, we propose G2LTraj, a plug-and-play global-to-local generation approach for trajectory prediction. Specifically, we generate a series of global key steps that uniformly cover the entire future time range. Subsequently, the local intermediate steps between the adjacent key steps are recursively filled in. In this way, we prevent the accumulated error from propagating beyond the adjacent key steps. Moreover, to boost the kinematical feasibility, we not only introduce the spatial constraints among key steps but also strengthen the temporal constraints among the intermediate steps. Finally, to ensure the optimal granularity of key steps, we design a selectable granularity strategy that caters to each predicted trajectory. Our G2LTraj significantly improves the performance of seven existing trajectory predictors across the ETH, UCY and nuScenes datasets. Experimental results demonstrate its effectiveness. Code will be available at https://github.com/Zhanwei-Z/G2LTraj.
[46] arXiv:2404.19341 (cross-list from cs.CV) [pdf, other]: Title: Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs

Authors: Soham Mitra, Atri Sukul, Swalpa Kumar Roy, Pravendra Singh, Vinay Verma

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep learning models have achieved remarkable success across diverse domains. However, the intricate nature of these models often impedes a clear understanding of their decision-making processes. This is where Explainable AI (XAI) becomes indispensable, offering intuitive explanations for model decisions. In this work, we propose a simple yet highly effective approach, ScoreCAM++, which introduces modifications to enhance the promising ScoreCAM method for visual explainability. Our proposed approach involves altering the normalization function within the activation layer utilized in ScoreCAM, resulting in significantly improved results compared to previous efforts. Additionally, we apply an activation function to the upsampled activation layers to enhance interpretability. This improvement is achieved by selectively gating lower-priority values within the activation layer. Through extensive experiments and qualitative comparisons, we demonstrate that ScoreCAM++ consistently achieves notably superior performance and fairness in interpreting the decision-making process compared to both ScoreCAM and previous methods.
[47] arXiv:2404.19346 (cross-list from cs.LG) [pdf, other]: Title: Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning

Authors: Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang, Xuelong Li

Comments: Accepted by Artificial Intelligence (AIJ)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing. Empirically, we release an MTDS benchmark and collect datasets from three challenging domains. The experimental results show our algorithm outperforms the previous state-of-the-art methods in challenging MTDS problems. See https://github.com/Baichenjia/UTDS for the datasets and code.
[48] arXiv:2404.19349 (cross-list from cs.RO) [pdf, other]: Title: Human-AI Interaction in Industrial Robotics: Design and Empirical Evaluation of a User Interface for Explainable AI-Based Robot Program Optimization

Authors: Benjamin Alt, Johannes Zahn, Claudius Kienle, Julia Dvorak, Marvin May, Darko Katic, Rainer Jäkel, Tobias Kopp, Michael Beetz, Gisela Lanza

Comments: 6 pages, 4 figures, accepted at the 2024 CIRP International Conference on Manufacturing Systems (CMS)

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

While recent advances in deep learning have demonstrated its transformative potential, its adoption for real-world manufacturing applications remains limited. We present an Explanation User Interface (XUI) for a state-of-the-art deep learning-based robot program optimizer which provides both naive and expert users with different user experiences depending on their skill level, as well as Explainable AI (XAI) features to facilitate the application of deep learning methods in real-world applications. To evaluate the impact of the XUI on task performance, user satisfaction and cognitive load, we present the results of a preliminary user survey and propose a study design for a large-scale follow-up study.
[49] arXiv:2404.19359 (cross-list from cs.CL) [pdf, other]: Title: Evaluating Lexicon Incorporation for Depression Symptom Estimation

Authors: Kirill Milintsevich, Gaël Dias, Kairit Sirts

Comments: Accepted to Clinical NLP workshop at NAACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper explores the impact of incorporating sentiment, emotion, and domain-specific lexicons into a transformer-based model for depression symptom estimation. Lexicon information is added by marking the words in the input transcripts of patient-therapist conversations as well as in social media posts. Overall results show that the introduction of external knowledge within pre-trained language models can be beneficial for prediction performance, while different lexicons show distinct behaviours depending on the targeted task. Additionally, new state-of-the-art results are obtained for the estimation of depression level over patient-therapist interviews.
[50] arXiv:2404.19361 (cross-list from cs.GT) [pdf, ps, other]: Title: A Negotiator's Backup Plan: Optimal Concessions with a Reservation Value

Authors: Tamara C.P. Florijn, Pinar Yolum, Tim Baarslag

Comments: Accepted at AAMAS 2024

Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

Automated negotiation is a well-known mechanism for autonomous agents to reach agreements. To realize beneficial agreements quickly, it is key to employ a good bidding strategy. When a negotiating agent has a good back-up plan, i.e., a high reservation value, failing to reach an agreement is not necessarily disadvantageous. Thus, the agent can adopt a risk-seeking strategy, aiming for outcomes with a higher utilities.
Accordingly, this paper develops an optimal bidding strategy called MIA-RVelous for bilateral negotiations with private reservation values. The proposed greedy algorithm finds the optimal bid sequence given the agent's beliefs about the opponent in $O(n^2D)$ time, with $D$ the maximum number of rounds and $n$ the number of outcomes. The results obtained here can pave the way to realizing effective concurrent negotiations, given that concurrent negotiations can serve as a (probabilistic) backup plan.
[51] arXiv:2404.19384 (cross-list from cs.CV) [pdf, other]: Title: Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

Authors: Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

Comments: Accepted by CVPR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previous techniques mitigate this by reweighting these boxes as pseudo labels, but these boxes can still poison the training process. To resolve this problem, in this paper, we propose a novel pseudo label refinery framework. Specifically, in the selection process, to improve the reliability of pseudo boxes, we propose a complementary augmentation strategy. This strategy involves either removing all points within an unreliable box or replacing it with a high-confidence box. Moreover, the point numbers of instances in high-beam datasets are considerably higher than those in low-beam datasets, also degrading the quality of pseudo labels during the training process. We alleviate this issue by generating additional proposals and aligning RoI features across different domains. Experimental results demonstrate that our method effectively enhances the quality of pseudo labels and consistently surpasses the state-of-the-art methods on six autonomous driving benchmarks. Code will be available at https://github.com/Zhanwei-Z/PERE.
[52] arXiv:2404.19403 (cross-list from cs.RO) [pdf, other]: Title: Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making

Authors: Lei Zhuang, Jingdong Zhao, Yuntao Li, Zichun Xu, Liangliang Zhao, Hong Liu

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often result in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), which synergizes an Environmental Information Semantic Encoder (EISE) with a Motion Planning Transformer (MPT). EISE converts environmental data into semantic environmental information (SEI), providing MPT with an enriched environmental comprehension. MPT leverages an attention mechanism to dynamically recalibrate its focus on SEI, task objectives, and historical planning data, refining the sampling node generation. To demonstrate the capabilities of TEMP, we train our model using a dataset comprised of planning results produced by the RRT*. EISE and MPT are collaboratively trained, enabling EISE to autonomously learn and extract patterns from environmental data, thereby forming semantic representations that MPT could more effectively interpret and utilize for motion planning. Subsequently, we conducted a systematic evaluation of TEMP's efficacy across diverse task dimensions, which demonstrates that TEMP achieves exceptional performance metrics and a heightened degree of generalizability compared to state-of-the-art SBMPs.
[53] arXiv:2404.19432 (cross-list from cs.CL) [pdf, other]: Title: Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships

Authors: D. Panas, S. Seth, V. Belle

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Two major areas of interest in the era of Large Language Models regard questions of what do LLMs know, and if and how they may be able to reason (or rather, approximately reason). Since to date these lines of work progressed largely in parallel (with notable exceptions), we are interested in investigating the intersection: probing for reasoning about the implicitly-held knowledge. Suspecting the performance to be lacking in this area, we use a very simple set-up of comparisons between cardinalities associated with elements of various subjects (e.g. the number of legs a bird has versus the number of wheels on a tricycle). We empirically demonstrate that although LLMs make steady progress in knowledge acquisition and (pseudo)reasoning with each new GPT release, their capabilities are limited to statistical inference only. It is difficult to argue that pure statistical learning can cope with the combinatorial explosion inherent in many commonsense reasoning tasks, especially once arithmetical notions are involved. Further, we argue that bigger is not always better and chasing purely statistical improvements is flawed at the core, since it only exacerbates the dangerous conflation of the production of correct answers with genuine reasoning ability.
[54] arXiv:2404.19456 (cross-list from cs.LG) [pdf, other]: Title: Imitation Learning: A Survey of Learning Methods, Environments and Metrics

Authors: Nathan Gavenski, Odinaldo Rodrigues, Michael Luck

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Imitation learning is an approach in which an agent learns how to execute a task by trying to mimic how one or more teachers perform it. This learning approach offers a compromise between the time it takes to learn a new task and the effort needed to collect teacher samples for the agent. It achieves this by balancing learning from the teacher, who has some information on how to perform the task, and deviating from their examples when necessary, such as states not present in the teacher samples. Consequently, the field of imitation learning has received much attention from researchers in recent years, resulting in many new methods and applications. However, with this increase in published work and past surveys focusing mainly on methodology, a lack of standardisation became more prominent in the field. This non-standardisation is evident in the use of environments, which appear in no more than two works, and evaluation processes, such as qualitative analysis, that have become rare in current literature. In this survey, we systematically review current imitation learning literature and present our findings by (i) classifying imitation learning techniques, environments and metrics by introducing novel taxonomies; (ii) reflecting on main problems from the literature; and (iii) presenting challenges and future directions for researchers.
[55] arXiv:2404.19484 (cross-list from cs.LG) [pdf, other]: Title: More Compute Is What You Need

Authors: Zhen Guo

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large language model pre-training has become increasingly expensive, with most practitioners relying on scaling laws to allocate compute budgets for model size and training tokens, commonly referred to as Compute-Optimal or Chinchilla Optimal. In this paper, we hypothesize a new scaling law that suggests model performance depends mostly on the amount of compute spent for transformer-based models, independent of the specific allocation to model size and dataset size. Using this unified scaling law, we predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
[56] arXiv:2404.19500 (cross-list from cs.CV) [pdf, other]: Title: Towards Real-world Video Face Restoration: A New Benchmark

Authors: Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Blind face restoration (BFR) on images has significantly progressed over the last several years, while real-world video face restoration (VFR), which is more challenging for more complex face motions such as moving gaze directions and facial orientations involved, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which are limited in their coverage of real-world video frames. In this work, we introduced new real-world datasets named FOS with a taxonomy of "Full, Occluded, and Side" faces from mainly video frames to study the applicability of current methods on videos. Compared with existing test datasets, FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmarked both the state-of-the-art BFR methods and the video super resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we studied the effectiveness of the commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics by leveraging a subjective user study. With extensive experimental results and detailed analysis provided, we gained insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope stimulate future advances in VFR research.
[57] arXiv:2404.19518 (cross-list from cs.MA) [pdf, other]: Title: MGCBS: An Optimal and Efficient Algorithm for Solving Multi-Goal Multi-Agent Path Finding Problem

Authors: Mingkai Tang, Yuanhang Li, Hongji Liu, Yingbing Chen, Ming Liu, Lujia Wang

Comments: to be published in IJCAI2024

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Robotics (cs.RO)

With the expansion of the scale of robotics applications, the multi-goal multi-agent pathfinding (MG-MAPF) problem began to gain widespread attention. This problem requires each agent to visit pre-assigned multiple goal points at least once without conflict. Some previous methods have been proposed to solve the MG-MAPF problem based on Decoupling the goal Vertex visiting order search and the Single-agent pathfinding (DVS). However, this paper demonstrates that the methods based on DVS cannot always obtain the optimal solution. To obtain the optimal result, we propose the Multi-Goal Conflict-Based Search (MGCBS), which is based on Decoupling the goal Safe interval visiting order search and the Single-agent pathfinding (DSS). Additionally, we present the Time-Interval-Space Forest (TIS Forest) to enhance the efficiency of MGCBS by maintaining the shortest paths from any start point at any start time step to each safe interval at the goal points. The experiment demonstrates that our method can consistently obtain optimal results and execute up to 7 times faster than the state-of-the-art method in our evaluation.
[58] arXiv:2404.19541 (cross-list from cs.CV) [pdf, other]: Title: Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

Authors: Rayan Armani, Changlin Qian, Jiaxi Jiang, Christian Holz

Comments: Accepted by SIGGRAPH 2024, Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Signal Processing (eess.SP)

While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter have so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging$-$dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65cm$ ($22\%$ better) and lowering jitter from $1.56$ to $0.055km/s^3$ (a reduction of $97\%$).
[59] arXiv:2404.19543 (cross-list from cs.CL) [pdf, other]: Title: RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing

Authors: Yucheng Hu, Yuxing Lu

Comments: 30 pages, 7 figures. Draft version 1

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge. To mitigate these, recent methodologies have integrated information retrieved from external resources with LLMs, substantially enhancing their performance across NLP tasks. This survey paper addresses the absence of a comprehensive overview on Retrieval-Augmented Language Models (RALMs), both Retrieval-Augmented Generation (RAG) and Retrieval-Augmented Understanding (RAU), providing an in-depth examination of their paradigm, evolution, taxonomy, and applications. The paper discusses the essential components of RALMs, including Retrievers, Language Models, and Augmentations, and how their interactions lead to diverse model structures and applications. RALMs demonstrate utility in a spectrum of tasks, from translation and dialogue systems to knowledge-intensive applications. The survey includes several evaluation methods of RALMs, emphasizing the importance of robustness, accuracy, and relevance in their assessment. It also acknowledges the limitations of RALMs, particularly in retrieval quality and computational efficiency, offering directions for future research. In conclusion, this survey aims to offer a structured insight into RALMs, their potential, and the avenues for their future development in NLP. The paper is supplemented with a Github Repository containing the surveyed works and resources for further study: https://github.com/2471023025/RALM_Survey.
[60] arXiv:2404.19573 (cross-list from cs.CY) [pdf, other]: Title: War Elephants: Rethinking Combat AI and Human Oversight

Authors: Philip Feldman, Aaron Dant, Harry Dreany

Comments: 15 pages, 2 figures

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

This paper explores the changes that pervasive AI is having on the nature of combat. We look beyond the substitution of AI for experts to an approach where complementary human and machine abilities are blended. Using historical and modern examples, we show how autonomous weapons systems can be effectively managed by teams of human "AI Operators" combined with AI/ML "Proxy Operators." By basing our approach on the principles of complementation, we provide for a flexible and dynamic approach to managing lethal autonomous systems. We conclude by presenting a path to achieving an integrated vision of machine-speed combat where the battlefield AI is operated by AI Operators that watch for patterns of behavior within battlefield to assess the performance of lethal autonomous systems. This approach enables the development of combat systems that are likely to be more ethical, operate at machine speed, and are capable of responding to a broader range of dynamic battlefield conditions than any purely autonomous AI system could support.
[61] arXiv:2404.19640 (cross-list from cs.LG) [pdf, other]: Title: Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

Authors: Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME); Machine Learning (stat.ML)

Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.
[62] arXiv:2404.19651 (cross-list from cs.LG) [pdf, other]: Title: Provably Robust Conformal Prediction with Improved Efficiency

Authors: Ge Yan, Yaniv Romano, Tsui-Wei Weng

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Conformal prediction is a powerful tool to generate uncertainty sets with guaranteed coverage using any predictive model, under the assumption that the training and test data are i.i.d.. Recently, it has been shown that adversarial examples are able to manipulate conformal methods to construct prediction sets with invalid coverage rates, as the i.i.d. assumption is violated. To address this issue, a recent work, Randomized Smoothed Conformal Prediction (RSCP), was first proposed to certify the robustness of conformal prediction methods to adversarial noise. However, RSCP has two major limitations: (i) its robustness guarantee is flawed when used in practice and (ii) it tends to produce large uncertainty sets. To address these limitations, we first propose a novel framework called RSCP+ to provide provable robustness guarantee in evaluation, which fixes the issues in the original RSCP method. Next, we propose two novel methods, Post-Training Transformation (PTT) and Robust Conformal Training (RCT), to effectively reduce prediction set size with little computation overhead. Experimental results in CIFAR10, CIFAR100, and ImageNet suggest the baseline method only yields trivial predictions including full label set, while our methods could boost the efficiency by up to $4.36\times$, $5.46\times$, and $16.9\times$ respectively and provide practical robustness guarantee. Our codes are available at https://github.com/Trustworthy-ML-Lab/Provably-Robust-Conformal-Prediction.
[63] arXiv:2404.19652 (cross-list from cs.CV) [pdf, other]: Title: VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

Authors: Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Typically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters. The Prompt Queries Generation Module facilitates explicit interaction between different tasks, while the Tasks-aware Adapter helps the model dynamically learn suitable features for each task. Additionally, to further enable the model to learn temporal information at a lower cost, we propose a synthetic video text dataset (VTD-368k) by leveraging the Content Deformation Fields (CoDeF) algorithm. Notably, our method outperforms the state-of-the-art method by an average of 2.6% in six cross-domain benchmarks such as TT-to-IC15, CTW1500-to-TT, and TT-to-CTW1500. For video-level cross-domain adaption, our method even surpasses the previous end-to-end video spotting method in ICDAR2015 video and DSText v2 by an average of 5.5% on the MOTA metric, using only image-level data. We further demonstrate that existing Large Multimodal Models exhibit limitations in generating cross-domain scene text spotting, in contrast to our VimTS model which requires significantly fewer parameters and data. The code and datasets will be made available at the https://VimTextSpotter.github.io.
[64] arXiv:2404.19665 (cross-list from physics.med-ph) [pdf, other]: Title: ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging

Authors: Dimitrios Karkalousos, Ivana Išgum, Henk A. Marquering, Matthan W.A. Caan

Subjects: Medical Physics (physics.med-ph); Artificial Intelligence (cs.AI); Mathematical Software (cs.MS); Software Engineering (cs.SE); Mathematical Physics (math-ph)

AI is revolutionizing MRI along the acquisition and processing chain. Advanced AI frameworks have been developed to apply AI in various successive tasks, such as image reconstruction, quantitative parameter map estimation, and image segmentation. Existing frameworks are often designed to perform tasks independently or are focused on specific models or datasets, limiting generalization. We introduce ATOMMIC, an open-source toolbox that streamlines AI applications for accelerated MRI reconstruction and analysis. ATOMMIC implements several tasks using DL networks and enables MultiTask Learning (MTL) to perform related tasks integrated, targeting generalization in the MRI domain. We first review the current state of AI frameworks for MRI through a comprehensive literature search and by parsing 12,479 GitHub repositories. We benchmark 25 DL models on eight publicly available datasets to present distinct applications of ATOMMIC on accelerated MRI reconstruction, image segmentation, quantitative parameter map estimation, and joint accelerated MRI reconstruction and image segmentation utilizing MTL. Our findings demonstrate that ATOMMIC is the only MTL framework with harmonized complex-valued and real-valued data support. Evaluations on single tasks show that physics-based models, which enforce data consistency by leveraging the physical properties of MRI, outperform other models in reconstructing highly accelerated acquisitions. Physics-based models that produce high reconstruction quality can accurately estimate quantitative parameter maps. When high-performing reconstruction models are combined with robust segmentation networks utilizing MTL, performance is improved in both tasks. ATOMMIC facilitates MRI reconstruction and analysis by standardizing workflows, enhancing data interoperability, integrating unique features like MTL, and effectively benchmarking DL models.
[65] arXiv:2404.19675 (cross-list from cs.CY) [pdf, other]: Title: Deep Learning for Educational Data Science

Authors: Juan D. Pinto, Luc Paquette

Comments: 18 pages. To be published in Trust and Inclusion in AI-Mediated Education: Where Human Learning Meets Learning Machines by Springer International

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

With the ever-growing presence of deep artificial neural networks in every facet of modern life, a growing body of researchers in educational data science -- a field consisting of various interrelated research communities -- have turned their attention to leveraging these powerful algorithms within the domain of education. Use cases range from advanced knowledge tracing models that can leverage open-ended student essays or snippets of code to automatic affect and behavior detectors that can identify when a student is frustrated or aimlessly trying to solve problems unproductively -- and much more. This chapter provides a brief introduction to deep learning, describes some of its advantages and limitations, presents a survey of its many uses in education, and discusses how it may further come to shape the field of educational data science.
[66] arXiv:2404.19696 (cross-list from cs.CV) [pdf, other]: Title: Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

Authors: Chun Feng, Joy Hsu, Weiyu Liu, Jiajun Wu

Comments: CVPR 2024. The first two authors contributed equally

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

3D visual grounding is a challenging task that often requires direct and dense supervision, notably the semantic label for each object in the scene. In this paper, we instead study the naturally supervised setting that learns from only 3D scene and QA pairs, where prior works underperform. We propose the Language-Regularized Concept Learner (LARC), which uses constraints from language as regularization to significantly improve the accuracy of neuro-symbolic concept learners in the naturally supervised setting. Our approach is based on two core insights: the first is that language constraints (e.g., a word's relation to another) can serve as effective regularization for structured representations in neuro-symbolic models; the second is that we can query large language models to distill such constraints from language properties. We show that LARC improves performance of prior works in naturally supervised 3D visual grounding, and demonstrates a wide range of 3D visual reasoning capabilities-from zero-shot composition, to data efficiency and transferability. Our method represents a promising step towards regularizing structured visual reasoning frameworks with language-based priors, for learning in settings without dense supervision.
[67] arXiv:2404.19708 (cross-list from cs.LG) [pdf, other]: Title: Harmonic LLMs are Trustworthy

Authors: Nicholas S. Kersting, Mohammad Rahman, Suchismitha Vedala, Yang Wang

Comments: 15 pages, 4 figures, 14 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

We introduce an intuitive method to test the robustness (stability and explainability) of any black-box LLM in real-time, based upon the local deviation from harmoniticity, denoted as $\gamma$. To the best of our knowledge this is the first completely model-agnostic and unsupervised method of measuring the robustness of any given response from an LLM, based upon the model itself conforming to a purely mathematical standard. We conduct human annotation experiments to show the positive correlation of $\gamma$ with false or misleading answers, and demonstrate that following the gradient of $\gamma$ in stochastic gradient ascent efficiently exposes adversarial prompts. Measuring $\gamma$ across thousands of queries in popular LLMs (GPT-4, ChatGPT, Claude-2.1, Mixtral-8x7B, Smaug-72B, Llama2-7B, and MPT-7B) allows us to estimate the liklihood of wrong or hallucinatory answers automatically and quantitatively rank the reliability of these models in various objective domains (Web QA, TruthfulQA, and Programming QA). Across all models and domains tested, human ratings confirm that $\gamma \to 0$ indicates trustworthiness, and the low-$\gamma$ leaders among these models are GPT-4, ChatGPT, and Smaug-72B.
[68] arXiv:2404.19725 (cross-list from cs.LG) [pdf, other]: Title: Fairness Without Demographics in Human-Centered Federated Learning

Authors: Roy Shaily, Sharma Harshit, Salekin Asif

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated learning (FL) enables collaborative model training while preserving data privacy, making it suitable for decentralized human-centered AI applications. However, a significant research gap remains in ensuring fairness in these systems. Current fairness strategies in FL require knowledge of bias-creating/sensitive attributes, clashing with FL's privacy principles. Moreover, in human-centered datasets, sensitive attributes may remain latent. To tackle these challenges, we present a novel bias mitigation approach inspired by "Fairness without Demographics" in machine learning. The presented approach achieves fairness without needing knowledge of sensitive attributes by minimizing the top eigenvalue of the Hessian matrix during training, ensuring equitable loss landscapes across FL participants. Notably, we introduce a novel FL aggregation scheme that promotes participating models based on error rates and loss landscape curvature attributes, fostering fairness across the FL system. This work represents the first approach to attaining "Fairness without Demographics" in human-centered FL. Through comprehensive evaluation, our approach demonstrates effectiveness in balancing fairness and efficacy across various real-world applications, FL setups, and scenarios involving single and multiple bias-inducing factors, representing a significant advancement in human-centered FL.
[69] arXiv:2404.19729 (cross-list from cs.HC) [pdf, ps, other]: Title: A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications

Authors: Steph Buongiorno, Corey Clark

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

External knowledge graphs (KGs) can be used to augment large language models (LLMs), while simultaneously providing an explainable knowledge base of facts that can be inspected by a human. This approach may be particularly valuable in domains where explainability is critical, like human trafficking data analysis. However, creating KGs can pose challenges. KGs parsed from documents may comprise explicit connections (those directly stated by a document) but miss implicit connections (those obvious to a human although not directly stated). To address these challenges, this preliminary research introduces the GAME-KG framework, standing for "Gaming for Augmenting Metadata and Enhancing Knowledge Graphs." GAME-KG is a federated approach to modifying explicit as well as implicit connections in KGs by using crowdsourced feedback collected through video games. GAME-KG is shown through two demonstrations: a Unity test scenario from Dark Shadows, a video game that collects feedback on KGs parsed from US Department of Justice (DOJ) Press Releases on human trafficking, and a following experiment where OpenAI's GPT-4 is prompted to answer questions based on a modified and unmodified KG. Initial results suggest that GAME-KG can be an effective framework for enhancing KGs, while simultaneously providing an explainable set of structured facts verified by humans.
[70] arXiv:2404.19733 (cross-list from cs.CL) [pdf, other]: Title: Iterative Reasoning Preference Optimization

Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy for Llama-2-70B-Chat from 55.6% to 81.6% on GSM8K (and 88.7% with majority voting out of 32 samples), from 12.5% to 20.8% on MATH, and from 77.8% to 86.7% on ARC-Challenge, which outperforms other Llama-2-based models not relying on additionally sourced datasets.
[71] arXiv:2404.19744 (cross-list from cs.CR) [pdf, other]: Title: PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification

Authors: Leon Garza, Lavanya Elluri, Anantaa Kotal, Aritran Piplai, Deepti Gupta, Anupam Joshi

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Data protection and privacy is becoming increasingly crucial in the digital era. Numerous companies depend on third-party vendors and service providers to carry out critical functions within their operations, encompassing tasks such as data handling and storage. However, this reliance introduces potential vulnerabilities, as these vendors' security measures and practices may not always align with the standards expected by regulatory bodies. Businesses are required, often under the penalty of law, to ensure compliance with the evolving regulatory rules. Interpreting and implementing these regulations pose challenges due to their complexity. Regulatory documents are extensive, demanding significant effort for interpretation, while vendor-drafted privacy policies often lack the detail required for full legal compliance, leading to ambiguity. To ensure a concise interpretation of the regulatory requirements and compliance of organizational privacy policy with said regulations, we propose a Large Language Model (LLM) and Semantic Web based approach for privacy compliance. In this paper, we develop the novel Privacy Policy Compliance Verification Knowledge Graph, PrivComp-KG. It is designed to efficiently store and retrieve comprehensive information concerning privacy policies, regulatory frameworks, and domain-specific knowledge pertaining to the legal landscape of privacy. Using Retrieval Augmented Generation, we identify the relevant sections in a privacy policy with corresponding regulatory rules. This information about individual privacy policies is populated into the PrivComp-KG. Combining this with the domain context and rules, the PrivComp-KG can be queried to check for compliance with privacy policies by each vendor against relevant policy regulations. We demonstrate the relevance of the PrivComp-KG, by verifying compliance of privacy policy documents for various organizations.
[72] arXiv:2404.19748 (cross-list from cs.CV) [pdf, other]: Title: Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning

Authors: Zhipeng Yuan, Nasamu Musa, Katarzyna Dybal, Matthew Back, Daniel Leybourne, Po Yang

Comments: The 26th IEEE International Conference on Computational Science and Engineering (CSE-2023)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Every year, plant parasitic nematodes, one of the major groups of plant pathogens, cause a significant loss of crops worldwide. To mitigate crop yield losses caused by nematodes, an efficient nematode monitoring method is essential for plant and crop disease management. In other respects, efficient nematode detection contributes to medical research and drug discovery, as nematodes are model organisms. With the rapid development of computer technology, computer vision techniques provide a feasible solution for quantifying nematodes or nematode infections. In this paper, we survey and categorise the studies and available datasets on nematode detection through deep-learning models. To stimulate progress in related research, this survey presents the potential state-of-the-art object detection models, training techniques, optimisation techniques, and evaluation metrics for deep learning beginners. Moreover, seven state-of-the-art object detection models are validated on three public datasets and the AgriNema dataset for plant parasitic nematodes to construct a baseline for nematode detection.
[73] arXiv:2404.19753 (cross-list from cs.CV) [pdf, other]: Title: DOCCI: Descriptions of Connected and Contrasting Images

Authors: Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research. However, current datasets lack descriptions with fine-grained detail that would allow for richer associations to be learned by models. To fill the gap, we introduce Descriptions of Connected and Contrasting Images (DOCCI), a dataset with long, human-annotated English descriptions for 15k images that were taken, curated and donated by a single researcher intent on capturing key challenges such as spatial relations, counting, text rendering, world knowledge, and more. We instruct human annotators to create comprehensive descriptions for each image; these average 136 words in length and are crafted to clearly distinguish each image from those that are related or similar. Each description is highly compositional and typically encompasses multiple challenges. Through both quantitative and qualitative analyses, we demonstrate that DOCCI serves as an effective training resource for image-to-text generation -- a PaLI 5B model finetuned on DOCCI shows equal or superior results compared to highly-performant larger models like LLaVA-1.5 7B and InstructBLIP 7B. Furthermore, we show that DOCCI is a useful testbed for text-to-image generation, highlighting the limitations of current text-to-image models in capturing long descriptions and fine details.
[74] arXiv:2404.19756 (cross-list from cs.LG) [pdf, other]: Title: KAN: Kolmogorov-Arnold Networks

Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark

Comments: 48 pages, 20 figures. Codes are available at this https URL

Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.

Replacements for Wed, 1 May 24

[75] arXiv:2207.10170 (replaced) [pdf, other]: Title: Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers

Authors: Tim Franzmeyer, Stephen McAleer, João F. Henriques, Jakob N. Foerster, Philip H.S. Torr, Adel Bibi, Christian Schroeder de Witt

Subjects: Artificial Intelligence (cs.AI)
[76] arXiv:2306.08765 (replaced) [pdf, other]: Title: Causal Discovery from Time Series with Hybrids of Constraint-Based and Noise-Based Algorithms

Authors: Daria Bystrova, Charles K. Assaad, Julyan Arbel, Emilie Devijver, Eric Gaussier, Wilfried Thuiller

Comments: Accepted in TMLR: this https URL

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[77] arXiv:2309.17288 (replaced) [pdf, other]: Title: AutoAgents: A Framework for Automatic Agent Generation

Authors: Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi

Comments: IJCAI 2024

Subjects: Artificial Intelligence (cs.AI)
[78] arXiv:2312.13912 (replaced) [pdf, other]: Title: Solving Long-run Average Reward Robust MDPs via Stochastic Games

Authors: Krishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, Petr Novotný, Đorđe Žikelić

Subjects: Artificial Intelligence (cs.AI)
[79] arXiv:2401.17159 (replaced) [pdf, other]: Title: Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis

Authors: Zhengyang Lu, Stefan Siemer, Piyush Jha, Joel Day, Florin Manea, Vijay Ganesh

Comments: Accepted at IJCAI 2024

Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
[80] arXiv:2402.06665 (replaced) [pdf, other]: Title: The Essential Role of Causality in Foundation World Models for Embodied AI

Authors: Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Marc Rigter, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
[81] arXiv:2403.05000 (replaced) [pdf, other]: Title: Medical Speech Symptoms Classification via Disentangled Representation

Authors: Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

Comments: Accepted by the 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2024)

Subjects: Artificial Intelligence (cs.AI)
[82] arXiv:2404.18533 (replaced) [pdf, other]: Title: Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[83] arXiv:1203.0550 (replaced) [pdf, other]: Title: Algorithms for Learning Kernels Based on Centered Alignment

Authors: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

Journal-ref: Journal of Machine Learning Research 13 (2012) 795-828

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[84] arXiv:2102.04394 (replaced) [pdf, other]: Title: Learning with Density Matrices and Random Features

Authors: Fabio A. González, Alejandro Gallego, Santiago Toledo-Cortés, Vladimir Vargas-Calderón

Comments: Final version published in Quantum Mach. Intell. 4, 23 (2022)

Journal-ref: Quantum Mach. Intell. 4, 23 (2022)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
[85] arXiv:2105.13857 (replaced) [pdf, other]: Title: Learning Approximate and Exact Numeral Systems via Reinforcement Learning

Authors: Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

Comments: CogSci 2021. Fixed typos

Journal-ref: Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 43 (2021)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[86] arXiv:2209.05580 (replaced) [pdf, other]: Title: Risk-aware Meta-level Decision Making for Exploration Under Uncertainty

Authors: Joshua Ott, Sung-Kyun Kim, Amanda Bouman, Oriana Peltzer, Mamoru Sobue, Harrison Delecki, Mykel J. Kochenderfer, Joel Burdick, Ali-akbar Agha-mohammadi

Comments: IEEE International Conference on Control, Decision and Information Technologies

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[87] arXiv:2212.06278 (replaced) [pdf, other]: Title: Efficient Bayesian Uncertainty Estimation for nnU-Net

Authors: Yidong Zhao, Changchun Yang, Artur Schweidtmann, Qian Tao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[88] arXiv:2304.08235 (replaced) [pdf, other]: Title: A Platform-Agnostic Deep Reinforcement Learning Framework for Effective Sim2Real Transfer in Autonomous Driving

Authors: Dianzhao Li, Ostap Okhrin

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[89] arXiv:2305.01658 (replaced) [pdf, other]: Title: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework with Gray Code Representation

Authors: Dongyue Guo, Zheng Zhang, Zhen Yan, Jianwei Zhang, Yi Lin

Comments: An extend version based on the AAAI version

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[90] arXiv:2305.06221 (replaced) [pdf, other]: Title: Multi-Prompt with Depth Partitioned Cross-Modal Learning

Authors: Yingjie Tian, Yiqi Wang, Xianda Guo, Zheng Zhu, Long Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[91] arXiv:2305.12106 (replaced) [pdf, ps, other]: Title: Human-annotated label noise and their impact on ConvNets for remote sensing image scene classification

Authors: Longkang Peng, Tao Wei, Xuehong Chen, Xiaobei Chen, Rui Sun, Luoma Wan, Jin Chen, Xiaolin Zhu

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[92] arXiv:2308.14250 (replaced) [pdf, other]: Title: Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification

Authors: Bowen Xi, Kevin Scaria, Paulo Shakarian

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[93] arXiv:2310.03718 (replaced) [pdf, other]: Title: Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning

Authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[94] arXiv:2310.05058 (replaced) [pdf, other]: Title: Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

Authors: Songtao Luo, Shuang Yang, Shiguang Shan, Xilin Chen

Comments: Accepted to BMVC 2023 20pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2310.13639 (replaced) [pdf, other]: Title: Contrastive Preference Learning: Learning from Human Feedback without RL

Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

Comments: ICLR 2024. Code released at this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[96] arXiv:2310.16452 (replaced) [pdf, other]: Title: Faithful Path Language Modeling for Explainable Recommendation over Knowledge Graph

Authors: Giacomo Balloccu, Ludovico Boratto, Christian Cancedda, Gianni Fenu, Mirko Marras

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[97] arXiv:2311.07978 (replaced) [pdf, other]: Title: How good are Large Language Models on African Languages?

Authors: Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David Ifeoluwa Adelani

Comments: Under review

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[98] arXiv:2311.09175 (replaced) [pdf, other]: Title: Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?

Authors: Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[99] arXiv:2311.12410 (replaced) [pdf, other]: Title: nach0: Multimodal Natural and Chemical Languages Foundation Model

Authors: Micha Livne, Zulfat Miftahutdinov, Elena Tutubalina, Maksim Kuznetsov, Daniil Polykovskiy, Annika Brundyn, Aastha Jhunjhunwala, Anthony Costa, Alex Aliper, Alán Aspuru-Guzik, Alex Zhavoronkov

Comments: Accepted to Chemical Science Journal

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[100] arXiv:2312.16043 (replaced) [pdf, other]: Title: An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification

Authors: Hyenkyun Woo

Comments: 26 pages, 9 figures, revised version

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[101] arXiv:2401.11325 (replaced) [pdf, other]: Title: Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov

Authors: Gregory Hyde, Eugene Santos Jr

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[102] arXiv:2401.11647 (replaced) [pdf, other]: Title: LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning

Authors: Ye Lin Tun, Chu Myaet Thwal, Le Quang Huy, Minh N. H. Nguyen, Choong Seon Hong

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[103] arXiv:2401.13555 (replaced) [pdf, other]: Title: Benchmarking the Fairness of Image Upsampling Methods

Authors: Mike Laszkiewicz, Imant Daunhawer, Julia E. Vogt, Asja Fischer, Johannes Lederer

Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[104] arXiv:2401.14831 (replaced) [pdf, other]: Title: The Machine Vision Iceberg Explained: Advancing Dynamic Testing by Considering Holistic Environmental Relations

Authors: Hubert Padusinski, Christian Steinhauser, Thilo Braun, Lennart Ries, Eric Sax

Comments: Submitted at IEEE ITSC 2024

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE); Image and Video Processing (eess.IV)
[105] arXiv:2402.03305 (replaced) [pdf, other]: Title: Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?

Authors: Qiyao Liang, Ziming Liu, Ila Fiete

Comments: 13 pages, 9 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[106] arXiv:2402.06046 (replaced) [pdf, ps, other]: Title: Anatomy of a Robotaxi Crash: Lessons from the Cruise Pedestrian Dragging Mishap

Authors: Philip Koopman

Comments: 15 pages, 2 figures

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[107] arXiv:2402.10142 (replaced) [pdf, other]: Title: Tracking Changing Probabilities via Dynamic Learners

Authors: Omid Madani

Comments: 63 pages, 24 figures, 17 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[108] arXiv:2402.12147 (replaced) [pdf, other]: Title: Surprising Efficacy of Fine-Tuned Transformers for Fact-Checking over Larger Language Models

Authors: Vinay Setty

Comments: Accepted in SIGIR 2024 (industry track)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[109] arXiv:2402.12365 (replaced) [pdf, other]: Title: Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators

Authors: Benedikt Alkin, Andreas Fürst, Simon Schmid, Lukas Gruber, Markus Holzleitner, Johannes Brandstetter

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Fluid Dynamics (physics.flu-dyn)
[110] arXiv:2402.14846 (replaced) [pdf, other]: Title: Stick to Your Role! Context-dependence and Stability of Personal Value Expression in Large Language Models

Authors: Grgur Kovač, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, Pierre-Yves Oudeyer

Comments: The project website and code are available at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[111] arXiv:2402.17420 (replaced) [pdf, other]: Title: PANDAS: Prototype-based Novel Class Discovery and Detection

Authors: Tyler L. Hayes, César R. de Souza, Namil Kim, Jiwon Kim, Riccardo Volpi, Diane Larlus

Comments: Accepted to the Conference on Lifelong Learning Agents (CoLLAs 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[112] arXiv:2402.17516 (replaced) [pdf, other]: Title: QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations

Authors: Jamie Duell, Monika Seisenberger, Hsuan Fu, Xiuyi Fan

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[113] arXiv:2402.18002 (replaced) [pdf, other]: Title: Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist

Authors: Hai Nguyen, Tadashi Kozuno, Cristian C. Beltran-Hernandez, Masashi Hamaya

Comments: Accepted at ICRA-2024

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[114] arXiv:2402.18012 (replaced) [pdf, other]: Title: Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints

Authors: Lingkai Kong, Yuanqi Du, Wenhao Mu, Kirill Neklyudov, Valentin De Bortoli, Haorui Wang, Dongxia Wu, Aaron Ferber, Yi-An Ma, Carla P. Gomes, Chao Zhang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[115] arXiv:2403.03239 (replaced) [pdf, ps, other]: Title: Note: Harnessing Tellurium Nanoparticles in the Digital Realm Plasmon Resonance, in the Context of Brewster's Angle and the Drude Model for Fake News Adsorption in Incomplete Information Games

Authors: Yasuko Kawahata

Comments: Tellurium Nanoparticles, Snell's Law, Soliton Solution, Anamorphic Surfaces, Nonlinear Dynamics, Fake News Adsorption, User Behavior Modeling, Health Improvement Strategies, Plasmonic Sensors This paper is partially an attempt to utilize "Generative AI" and was written with educational intent. There are currently no plans for it to become a peer-reviewed paper

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
[116] arXiv:2403.05585 (replaced) [pdf, ps, other]: Title: Plasmon Resonance Model: Investigation of Analysis of Fake News Diffusion Model with Third Mover Intervention Using Soliton Solution in Non-Complete Information Game under Repeated Dilemma Condition

Authors: Yasuko Kawahata

Comments: Plasmon Resonance Model, Soliton Solution, Third Mover,Fake News, Non-Complete Information Game, Nonlinear Partial Differential Equations, First Mover, Second Mover, Third Mover, Diffusion Dynamics, Iteration Dilemma

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
[117] arXiv:2403.05593 (replaced) [pdf, ps, other]: Title: Introducing First-Principles Calculations: New Approach to Group Dynamics and Bridging Social Phenomena in TeNP-Chain Based Social Dynamics Simulations

Authors: Yasuko Kawahata

Comments: TeNP Chains, First-principles calculations, Tellurium nanoparticles (TeNPs), Graphene, Fake news dissemination, Social cohesion, Information Flow Disruption, Quantum Mechanics, Interdisciplinary approach, Misinformation mitigation

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Physics Education (physics.ed-ph)
[118] arXiv:2403.08828 (replaced) [pdf, other]: Title: People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior

Authors: Balint Gyevnar, Stephanie Droop, Tadeg Quillien, Shay B. Cohen, Neil R. Bramley, Christopher G. Lucas, Stefano V. Albrecht

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[119] arXiv:2403.10853 (replaced) [pdf, other]: Title: Just Say the Name: Online Continual Learning with Category Names Only via Data Generation

Authors: Minhyuk Seo, Diganta Misra, Seongwon Cho, Minjae Lee, Jonghyun Choi

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[120] arXiv:2403.11585 (replaced) [pdf, other]: Title: Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines

Authors: Ekaterina Trofimova, Emil Sataev, Andrey E. Ustyuzhanin

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE)
[121] arXiv:2403.17169 (replaced) [pdf, other]: Title: QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims

Authors: Venktesh V, Abhijit Anand, Avishek Anand, Vinay Setty

Comments: 11 pages, 1 figure,Accepted for publication at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[122] arXiv:2404.01413 (replaced) [pdf, other]: Title: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Emerging Technologies (cs.ET); Machine Learning (stat.ML)
[123] arXiv:2404.03147 (replaced) [pdf, other]: Title: Eigenpruning

Authors: Tomás Vergara-Browne, Álvaro Soto, Akiko Aizawa

Comments: Extended abstract accepted to LatinX at NAACL 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[124] arXiv:2404.04814 (replaced) [pdf, other]: Title: Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models

Authors: Yi Zhang, Jitao Sang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[125] arXiv:2404.06362 (replaced) [pdf, other]: Title: Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation

Authors: Sidra Aleem, Fangyijie Wang, Mayug Maniparambil, Eric Arazo, Julia Dietlmeier, Guenole Silvestre, Kathleen Curran, Noel E. O'Connor, Suzanne Little

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[126] arXiv:2404.07017 (replaced) [pdf, other]: Title: Improving Language Model Reasoning with Self-motivated Learning

Authors: Yunlong Feng, Yang Xu, Libo Qin, Yasheng Wang, Wanxiang Che

Comments: Accepted at LREC-COLING 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[127] arXiv:2404.07544 (replaced) [pdf, other]: Title: From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Authors: Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu

Comments: 50 pages, 48 figures, preprint; Fixed typos

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[128] arXiv:2404.08995 (replaced) [pdf, other]: Title: Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery

Authors: Ye Wang, Yaxiong Wang, Yujiao Wu, Bingchen Zhao, Xueming Qian

Comments: 9 pages, 7 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[129] arXiv:2404.09005 (replaced) [pdf, other]: Title: Proof-of-Learning with Incentive Security

Authors: Zishuo Zhao, Zhixuan Fang, Xuechao Wang, Xi Chen, Yuan Zhou

Comments: 17 pages, 5 figures

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[130] arXiv:2404.13070 (replaced) [pdf, other]: Title: Evidence from counterfactual tasks supports emergent analogical reasoning in large language models

Authors: Taylor Webb, Keith J. Holyoak, Hongjing Lu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[131] arXiv:2404.13680 (replaced) [pdf, other]: Title: PoseAnimate: Zero-shot high fidelity pose controllable character animation

Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Yu-Gang Jiang, Guo-Jun Qi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[132] arXiv:2404.14606 (replaced) [pdf, ps, other]: Title: Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification

Authors: Armando Zhu, Keqin Li, Tong Wu, Peng Zhao, Bo Hong

Journal-ref: Journal of Computer Technology and Applied Mathematics, vol. 1, no. 1, Apr. 2024, pp. 46-53,

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[133] arXiv:2404.15369 (replaced) [pdf, other]: Title: Can a Machine be Conscious? Towards Universal Criteria for Machine Consciousness

Authors: Nur Aizaan Anwar, Cosmin Badea

Comments: This work was supported by the UKRI CDT in AI for Healthcare, this http URL (Grant No. EP/S023283/1)

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[134] arXiv:2404.16014 (replaced) [pdf, other]: Title: Improving Dictionary Learning with Gated Sparse Autoencoders

Authors: Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

Comments: 15 main text pages, 22 appendix pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[135] arXiv:2404.16894 (replaced) [pdf, other]: Title: On TinyML and Cybersecurity: Electric Vehicle Charging Infrastructure Use Case

Authors: Fatemeh Dehrouyeh, Li Yang, Firouz Badrkhani Ajaei, Abdallah Shami

Comments: Submitted to IEEE Access; Code is available at GitHub link: this https URL

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[136] arXiv:2404.17454 (replaced) [pdf, other]: Title: Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond

Authors: Kaichen Xu, Yueyang Ding, Suyang Hou, Weiqiang Zhan, Nisang Chen, Jun Wang, Xiaobo Sun

Comments: 17 pages, 2 figures. Accepted by IJCAI 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
[137] arXiv:2404.17489 (replaced) [pdf, other]: Title: Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation

Authors: Wei Cui, Rasa Hosseinzadeh, Junwei Ma, Tongzi Wu, Yi Sui, Keyvan Golestan

Comments: 14 pages, 4 algorithms, 3 figures, 5 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[138] arXiv:2404.18081 (replaced) [pdf, other]: Title: ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[139] arXiv:2404.18255 (replaced) [pdf, other]: Title: PatentGPT: A Large Language Model for Intellectual Property

Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, Weilei Wang, Changyang Tu

Comments: 19 pages, 9 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[140] arXiv:2404.18311 (replaced) [pdf, ps, other]: Title: Towards Real-time Learning in Large Language Models: A Critical Review

Authors: Mladjan Jovanovic, Peter Voss

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[141] arXiv:2404.18385 (replaced) [pdf, other]: Title: Equivalence: An analysis of artists' roles with Image Generative AI from Conceptual Art perspective through an interactive installation design practice

Authors: Yixuan Li, Dan C. Baciu, Marcos Novak, George Legrady

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[142] arXiv:2404.18736 (replaced) [pdf, other]: Title: Mapping the Potential of Explainable Artificial Intelligence (XAI) for Fairness Along the AI Lifecycle

Authors: Luca Deck, Astrid Schomäcker, Timo Speith, Jakob Schöffer, Lena Kästner, Niklas Kühl

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[143] arXiv:2404.18821 (replaced) [pdf, other]: Title: Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies

Authors: Seyed Soroush Karimi Madahi, Gargya Gokhale, Marie-Sophie Verwee, Bert Claessens, Chris Develder

Comments: ACM e-Energy 2024

Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[144] arXiv:2404.18848 (replaced) [pdf, other]: Title: FeDeRA:Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition

Authors: Yuxuan Yan, Shunpu Tang, Zhiguo Shi, Qianqian Yang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

New submissions
Cross-lists
Replacements

[ total of 144 entries: 1-144 ]
[ showing up to 250 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2404, contact, help (Access key information)

> cs > cs.AI

Artificial Intelligence

New submissions

New submissions for Wed, 1 May 24

Cross-lists for Wed, 1 May 24

Replacements for Wed, 1 May 24