Computation and Language

New submissions

Submissions received from Mon 13 May 24 to Tue 14 May 24, announced Wed, 15 May 24

New submissions for Wed, 15 May 24

[1] arXiv:2405.08099 [pdf, other]: Title: KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

Authors: Mengkang Hu, Haoyu Dong, Ping Luo, Shi Han, Dongmei Zhang

Comments: LREC-Coling 2024

Subjects: Computation and Language (cs.CL)

Due to the concise and structured nature of tables, the knowledge contained therein may be incomplete or missing, posing a significant challenge for table question answering (TableQA) and data analysis systems. Most existing datasets either fail to address the issue of external knowledge in TableQA or only utilize unstructured text as supplementary information for tables. In this paper, we propose to use a knowledge base (KB) as the external knowledge source for TableQA and construct a dataset KET-QA with fine-grained gold evidence annotation. Each table in the dataset corresponds to a sub-graph of the entire KB, and every question requires the integration of information from both the table and the sub-graph to be answered. To extract pertinent information from the vast knowledge sub-graph and apply it to TableQA, we design a retriever-reasoner structured pipeline model. Experimental results demonstrate that our model consistently achieves remarkable relative performance improvements ranging from 1.9 to 6.5 times and absolute improvements of 11.66% to 44.64% on EM scores across three distinct settings (fine-tuning, zero-shot, and few-shot), in comparison with solely relying on table information in the traditional TableQA manner. However, even the best model achieves a 60.23% EM score, which still lags behind the human-level performance, highlighting the challenging nature of KET-QA for the question-answering community. We also provide a human evaluation of error cases to analyze further the aspects in which the model can be improved. Project page: https://ketqa.github.io/.
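The exact-match (EM) figures quoted in this abstract mix absolute gains (percentage points) with relative gains ("times"). A minimal sketch of how the two can be computed follows. The normalization rule (lowercasing and whitespace collapsing), the function names, and the reading of "times" as a plain ratio of scores are all assumptions made for illustration, not the authors' evaluation script.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(answer.lower().split())


def exact_match(predictions, golds):
    """Return the EM score in percent over paired predictions and gold answers."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, golds))
    return 100.0 * hits / len(golds)


def improvements(baseline_em, model_em):
    """Return (absolute gain in points, ratio of model score to baseline score)."""
    return model_em - baseline_em, model_em / baseline_em


# Hypothetical toy data, not taken from KET-QA.
em = exact_match(["Paris", "  new york ", "42"], ["paris", "New York", "41"])
absolute_gain, ratio = improvements(10.0, 25.0)
```

With the toy scores above, a baseline of 10.0 and a model score of 25.0 give an absolute gain of 15.0 points and a ratio of 2.5.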
[2] arXiv:2405.08134 [pdf, other]: Title: Many-Shot Regurgitation (MSR) Prompting

Authors: Shashank Sonkar, Richard G. Baraniuk

Subjects: Computation and Language (cs.CL)

We introduce Many-Shot Regurgitation (MSR) prompting, a new black-box membership inference attack framework for examining verbatim content reproduction in large language models (LLMs). MSR prompting involves dividing the input text into multiple segments and creating a single prompt that includes a series of faux conversation rounds between a user and a language model to elicit verbatim regurgitation. We apply MSR prompting to diverse text sources, including Wikipedia articles and open educational resources (OER) textbooks, which provide high-quality, factual content and are continuously updated over time. For each source, we curate two dataset types: one that LLMs were likely exposed to during training ($D_{\rm pre}$) and another consisting of documents published after the models' training cutoff dates ($D_{\rm post}$). To quantify the occurrence of verbatim matches, we employ the Longest Common Substring algorithm and count the frequency of matches at different length thresholds. We then use statistical measures such as Cliff's delta, Kolmogorov-Smirnov (KS) distance, and Kruskal-Wallis H test to determine whether the distribution of verbatim matches differs significantly between $D_{\rm pre}$ and $D_{\rm post}$. Our findings reveal a striking difference in the distribution of verbatim matches between $D_{\rm pre}$ and $D_{\rm post}$, with the frequency of verbatim reproduction being significantly higher when LLMs (e.g. GPT models and LLaMAs) are prompted with text from datasets they were likely trained on. For instance, when using GPT-3.5 on Wikipedia articles, we observe a substantial effect size (Cliff's delta $= -0.984$) and a large KS distance ($0.875$) between the distributions of $D_{\rm pre}$ and $D_{\rm post}$. Our results provide compelling evidence that LLMs are more prone to reproducing verbatim content when the input text is likely sourced from their training data.
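The measurement steps named in this abstract (longest common substring, match counts at length thresholds, Cliff's delta, KS distance) can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code: the function names are hypothetical, matching is done at character level by assumption, and the sign of Cliff's delta depends on which sample is passed first.

```python
from bisect import bisect_right


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by a and b (O(len(a) * len(b)))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def count_at_thresholds(match_lengths, thresholds):
    """Count how many match lengths reach each length threshold."""
    return {t: sum(1 for n in match_lengths if n >= t) for t in thresholds}


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs; assumes non-empty samples."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def ks_distance(xs, ys):
    """Two-sample KS distance: largest gap between the two empirical CDFs."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return max(
        abs(bisect_right(xs_sorted, p) / len(xs_sorted)
            - bisect_right(ys_sorted, p) / len(ys_sorted))
        for p in xs_sorted + ys_sorted
    )
```

In use, one would compute a match length per document for each dataset and then compare the two lists of lengths with `cliffs_delta` and `ks_distance`. A significance test such as Kruskal-Wallis would need a statistics library and is left out of this sketch.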
[3] arXiv:2405.08142 [pdf, ps, other]: Title: Discursive objection strategies in online comments: Developing a classification schema and validating its training

Authors: Ashley L. Shea, Aspen K.B. Omapang, Ji Yong Cho, Miryam Y. Ginsparg, Natalie Bazarova, Winice Hui, René F. Kizilcec, Chau Tong, Drew Margolin

Comments: This paper was accepted and presented at the 73rd Annual International Communication Association International Conference, May 2023

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

Most Americans agree that misinformation, hate speech and harassment are harmful and inadequately curbed on social media through current moderation practices. In this paper, we aim to understand the discursive strategies employed by people in response to harmful speech in news comments. We conducted a content analysis of more than 6500 comment replies to trending news videos on YouTube and Twitter and identified seven distinct discursive objection strategies (Study 1). We examined the frequency of each strategy's occurrence from the 6500 comment replies, as well as from a second sample of 2004 replies (Study 2). Together, these studies show that people deploy a diversity of discursive strategies when objecting to speech, and reputational attacks are the most common. The resulting classification scheme accounts for different theoretical approaches for expressing objections and offers a comprehensive perspective on grassroots efforts aimed at stopping offensive or problematic speech on campus.
[4] arXiv:2405.08151 [pdf, other]: Title: Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness

Authors: Mingchen Li, Zaifu Zhan, Han Yang, Yongkang Xiao, Jiatan Huang, Rui Zhang

Subjects: Computation and Language (cs.CL)

Large language models (LLMs) have demonstrated remarkable capabilities in various biomedical natural language processing (NLP) tasks, leveraging the demonstrations within the input context to adapt to new tasks. However, LLMs are sensitive to the selection of demonstrations. To address the hallucination issue inherent in LLMs, retrieval-augmented LLMs (RALs) offer a solution by retrieving pertinent information from an established database. Nonetheless, existing research lacks rigorous evaluation of the impact of retrieval-augmented large language models on different biomedical NLP tasks. This deficiency makes it challenging to ascertain the capabilities of RALs within the biomedical domain. Moreover, the outputs from RALs are affected by retrieving unlabeled, counterfactual, or diverse knowledge that is not well studied in the biomedical domain. However, such knowledge is common in the real world. Finally, exploring the self-awareness ability is also crucial for the RAL system. So, in this paper, we systematically investigate the impact of RALs on 5 different biomedical tasks (triple extraction, link prediction, classification, question answering, and natural language inference). We analyze the performance of RALs in four fundamental abilities, including unlabeled robustness, counterfactual robustness, diverse robustness, and negative awareness. To this end, we propose an evaluation framework to assess the RALs' performance on different biomedical NLP tasks and establish four different testbeds based on the aforementioned fundamental abilities. Then, we evaluate 3 representative LLMs with 3 different retrievers on 5 tasks over 9 datasets.
[5] arXiv:2405.08172 [pdf, other]: Title: CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation

Authors: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic

Comments: on-going work, 30 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus has been created by combining different available corpora online with preprocessing and cleaning. In addition, a monolingual Cantonese dataset has been created through web scraping to aid the synthetic parallel corpus generation. Following the data collection process, several approaches, including fine-tuning models, back-translation, and model switch, have been used. The translation quality of models has been evaluated with multiple quality metrics, including lexicon-based metrics (SacreBLEU and hLEPOR) and embedding-space metrics (COMET and BERTscore). Based on the automatic metrics, the best model is selected and compared against the 2 best commercial translators using the human evaluation framework HOPES. The best model proposed in this investigation (NLLB-mBART) with model switch mechanisms has reached comparable and even better automatic evaluation scores against state-of-the-art commercial models (Bing and Baidu Translators), with a SacreBLEU score of 16.8 on our test set. Furthermore, an open-source web application has been developed to allow users to translate between Cantonese and English, with the different trained models available for effective comparisons between models from this investigation and users. CANTONMT is available at https://github.com/kenrickkung/CantoneseTranslation
[6] arXiv:2405.08213 [pdf, other]: Title: Interpreting Latent Student Knowledge Representations in Programming Assignments

Authors: Nigel Fernandez, Andrew Lan

Comments: EDM 2024: 17th International Conference on Educational Data Mining

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)

Recent advances in artificial intelligence for education leverage generative large language models, including using them to predict open-ended student responses rather than their correctness only. However, the black-box nature of these models limits the interpretability of the learned student knowledge representations. In this paper, we conduct a first exploration into interpreting latent student knowledge representations by presenting InfoOIRT, an Information regularized Open-ended Item Response Theory model, which encourages the latent student knowledge states to be interpretable while being able to generate student-written code for open-ended programming questions. InfoOIRT maximizes the mutual information between a fixed subset of latent knowledge states enforced with simple prior distributions and generated student code, which encourages the model to learn disentangled representations of salient syntactic and semantic code features including syntactic styles, mastery of programming skills, and code structures. Through experiments on a real-world programming education dataset, we show that InfoOIRT can both accurately generate student code and lead to interpretable student knowledge representations.
[7] arXiv:2405.08223 [pdf, other]: Title: An information-theoretic model of shallow and deep language comprehension

Authors: Jiaxuan Li, Richard Futrell

Comments: 6 pages; accepted to COGSCI 2024

Subjects: Computation and Language (cs.CL); Information Theory (cs.IT)

A large body of work in psycholinguistics has focused on the idea that online language comprehension can be shallow or 'good enough': given constraints on time or available computation, comprehenders may form interpretations of their input that are plausible but inaccurate. However, this idea has not yet been linked with formal theories of computation under resource constraints. Here we use information theory to formulate a model of language comprehension as an optimal trade-off between accuracy and processing depth, formalized as bits of information extracted from the input, which increases with processing time. The model provides a measure of processing effort as the change in processing depth, which we link to EEG signals and reading times. We validate our theory against a large-scale dataset of garden path sentence reading times, and EEG experiments featuring N400, P600 and biphasic ERP effects. By quantifying the timecourse of language processing as it proceeds from shallow to deep, our model provides a unified framework to explain behavioral and neural signatures of language comprehension.
[8] arXiv:2405.08237 [pdf, other]: Title: A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech

Authors: Oli Danyi Liu, Hao Tang, Naomi Feldman, Sharon Goldwater

Comments: Accepted to CogSci 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Speech perception involves storing and integrating sequentially presented items. Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech that may facilitate this temporal processing. In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech with the learning objective of predicting upcoming acoustics. Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge. Another property shared between brains and the model is that the encoding patterns of phonemes support some degree of cross-context generalization. However, we found evidence that the effectiveness of these generalizations depends on the specific contexts, which suggests that this analysis alone is insufficient to support the presence of context-invariant encoding.
[9] arXiv:2405.08254 [pdf, other]: Title: Detecting Fallacies in Climate Misinformation: A Technocognitive Approach to Identifying Misleading Argumentation

Authors: Francisco Zanartu, John Cook, Markus Wagner, Julian Garcia

Subjects: Computation and Language (cs.CL)

Misinformation about climate change is a complex societal issue requiring holistic, interdisciplinary solutions at the intersection between technology and psychology. One proposed solution is a "technocognitive" approach, involving the synthesis of psychological and computer science research. Psychological research has identified that interventions in response to misinformation require both fact-based (e.g., factual explanations) and technique-based (e.g., explanations of misleading techniques) content. However, little progress has been made on documenting and detecting fallacies in climate misinformation. In this study, we apply a previously developed critical thinking methodology for deconstructing climate misinformation, in order to develop a dataset mapping different types of climate misinformation to reasoning fallacies. This dataset is used to train a model to detect fallacies in climate misinformation. Our study shows F1 scores that are 2.5 to 3.5 better than previous works. The fallacies that are easiest to detect include fake experts and anecdotal arguments, while fallacies that require background knowledge, such as oversimplification, misrepresentation, and slothful induction, are relatively more difficult to detect. This research lays the groundwork for development of solutions where automatically detected climate misinformation can be countered with generative technique-based corrections.
[10] arXiv:2405.08295 [pdf, other]: Title: SpeechVerse: A Large-scale Generalizable Audio Language Model

Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, David Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Comments: Single Column, 13 pages

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
[11] arXiv:2405.08304 [pdf, other]: Title: Computational Thought Experiments for a More Rigorous Philosophy and Science of the Mind

Authors: Iris Over, Nikhil Krishnaswamy, James Pustejovsky, Joshua Hartshorne

Comments: 6 pages, 4 figures, to appear at CogSci 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

We offer philosophical motivations for a method we call Virtual World Cognitive Science (VW CogSci), in which researchers use virtual embodied agents that are embedded in virtual worlds to explore questions in the field of Cognitive Science. We focus on questions about mental and linguistic representation and the ways that such computational modeling can add rigor to philosophical thought experiments, as well as the terminology used in the scientific study of such representations. We find that this method forces researchers to take a god's-eye view when describing dynamical relationships between entities in minds and entities in an environment in a way that eliminates the need for problematic talk of belief and concept types, such as the belief that cats are silly, and the concept CAT, while preserving belief and concept tokens in individual cognizers' minds. We conclude with some further key advantages of VW CogSci for the scientific study of mental and linguistic representation and for Cognitive Science more broadly.
[12] arXiv:2405.08311 [pdf, ps, other]: Title: A Decoupling and Aggregating Framework for Joint Extraction of Entities and Relations

Authors: Yao Wang, Xin Liu, Weikun Kong, Hai-Tao Yu, Teeradaj Racharak, Kyoung-Sook Kim, Minh Le Nguyen

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Named Entity Recognition and Relation Extraction are two crucial and challenging subtasks in the field of Information Extraction. Despite the successes achieved by the traditional approaches, fundamental research questions remain open. First, most recent studies use parameter sharing for a single subtask or shared features for both subtasks, ignoring their semantic differences. Second, information interaction mainly focuses on the two subtasks, leaving the fine-grained information interaction among the subtask-specific features of encoding subjects, relations, and objects unexplored. Motivated by the aforementioned limitations, we propose a novel model to jointly extract entities and relations. The main novelties are as follows: (1) We propose to decouple the feature encoding process into three parts, namely encoding subjects, encoding objects, and encoding relations. Thanks to this, we are able to use fine-grained subtask-specific features. (2) We propose novel inter-aggregation and intra-aggregation strategies to enhance the information interaction and construct individual fine-grained subtask-specific features, respectively. The experimental results demonstrate that our model outperforms several previous state-of-the-art models. Extensive additional experiments further confirm the effectiveness of our model.
[13] arXiv:2405.08317 [pdf, other]: Title: SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Comments: 9+6 pages, Submitted to ACL 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remain largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Additionally, we propose countermeasures to thwart such jailbreaking attacks. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on a spoken question-answering task, scoring over 80% on both safety and helpfulness metrics. Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10% respectively when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories. However, we demonstrate that our proposed countermeasures reduce the attack success significantly.
[14] arXiv:2405.08355 [pdf, other]: Title: Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark

Authors: Mengsong Wu, Tong Zhu, Han Han, Chuanyuan Tan, Xiang Zhang, Wenliang Chen

Comments: 14 pages, 10 figures

Subjects: Computation and Language (cs.CL)

This paper presents a new tool learning dataset Seal-Tools, which contains self-instruct API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances which demonstrate the practical application of tools. Seeking to generate data on a large scale while ensuring reliability, we propose a self-instruct method to generate tools and instances, allowing precise control over the process. Moreover, our Seal-Tools contains hard instances that call multiple tools to complete the job, among which some are nested tool callings. For precise and comprehensive evaluation, we use strict format control and design three metrics from different dimensions. Therefore, Seal-Tools can serve as a new benchmark to evaluate the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our finetuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data and experiment results are available at https://github.com/fairyshine/Seal-Tools .
[15] arXiv:2405.08373 [pdf, other]: Title: PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles

Authors: Satya Kesav Gundabathula, Sriram R Kolar

Comments: Paper accepted for oral presentation at Clinical NLP workshop, NAACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper describes our approach to the MEDIQA-CORR shared task, which involves error detection and correction in clinical notes curated by medical professionals. This task involves handling three subtasks: detecting the presence of errors, identifying the specific sentence containing the error, and correcting it. Through our work, we aim to assess the capabilities of Large Language Models (LLMs) trained on vast corpora of internet data that contain both factual and unreliable information. We propose to comprehensively address all subtasks together, and suggest employing a unique prompt-based in-context learning strategy. We will evaluate its efficacy in this specialized task demanding a combination of general reasoning and medical knowledge. In medical systems where prediction errors can have grave consequences, we propose leveraging self-consistency and ensemble methods to enhance error correction and error detection performance.
[16] arXiv:2405.08400 [pdf, other]: Title: Stylometric Watermarks for Large Language Models

Authors: Georg Niess, Roman Kern

Comments: 19 pages, 4 figures, 9 tables

Subjects: Computation and Language (cs.CL)

The rapid advancement of large language models (LLMs) has made it increasingly difficult to distinguish between text written by humans and machines. Addressing this, we propose a novel method for generating watermarks that strategically alters token probabilities during generation. Unlike previous works, this method uniquely employs linguistic features such as stylometry. Concretely, we introduce acrostica and sensorimotor norms to LLMs. Further, these features are parameterized by a key, which is updated every sentence. To compute this key, we use semantic zero-shot classification, which enhances resilience. In our evaluation, we find that for three or more sentences, our method achieves a false positive and false negative rate of 0.02. For the case of a cyclic translation attack, we observe similar results for seven or more sentences. This research is of particular interest for proprietary LLMs to facilitate accountability and prevent societal harm.
[17] arXiv:2405.08402 [pdf, other]: Title: Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining

Authors: Valentin Vielzeuf

Subjects: Computation and Language (cs.CL)

Self-supervised learning has shown great success in Speech Recognition. However, it has been observed that finetuning all layers of the learned model leads to lower performance compared to resetting top layers. This phenomenon is attributed to the "autoencoder" behavior: top layers contain information closer to the input and are less suitable for tasks that require linguistic information, such as Speech Recognition. To better our understanding of this behavior, we propose to study the evolution of high-level information within the model during pretraining. We focus on the HuBERT model, which exhibits a less pronounced "autoencoder" behavior. By experimentally exploring various factors that may have an impact, we aim to improve the training procedure and enhance the top layers of HuBERT for high-level tasks. Furthermore, our experiments demonstrate that these improvements in the training procedure result in faster convergence and competitive performance on downstream tasks.
[18] arXiv:2405.08427 [pdf, other]: Title: Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and Baseline

Authors: Yuanchen Shi, Biao Ma, Fang Kong

Comments: 10 pages, 6 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Stickers are increasingly used in social media to express sentiment and intent. When typing is troublesome, people often use a sticker instead. Despite the significant impact of stickers on sentiment analysis and intent recognition, little research has been conducted on them. To address this gap, we propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAIRS). Additionally, we introduce a novel multimodal dataset containing Chinese chat records and stickers excerpted from several mainstream social media platforms. Our dataset includes paired data with the same text but different stickers, and various stickers consisting of the same images with different texts, allowing us to better understand the impact of stickers on chat sentiment and intent. We also propose an effective multimodal joint model, MMSAIR, for our task, which is validated on our datasets and indicates that visual information of stickers counts. Our dataset and code will be publicly available.
[19] arXiv:2405.08454 [pdf, other]: Title: How Alignment Helps Make the Most of Multimodal Data

Authors: Christian Arnold, Andreas Küpfer

Comments: Working Paper

Subjects: Computation and Language (cs.CL)

When studying political communication, combining the information from text, audio, and video signals promises to reflect the richness of human communication more comprehensively than confining it to individual modalities alone. However, when modeling such multimodal data, its heterogeneity, connectedness, and interaction are challenging to address. We argue that aligning the respective modalities can be an essential step in entirely using the potential of multimodal data because it informs the model with human understanding. Exploring aligned modalities unlocks promising analytical leverage. First, it allows us to make the most of information in the data, which inter alia opens the door to better quality predictions. Second, it is possible to answer research questions that span multiple modalities with cross-modal queries. Finally, alignment addresses concerns about model interpretability. We illustrate the utility of this approach by analyzing how German MPs address members of the far-right AfD in their speeches, and predicting the tone of video advertising in the context of the 2020 US presidential race. Our paper offers important insights to all keen to analyze multimodal data effectively.
[20] arXiv:2405.08460 [pdf, other]: Title: Evaluating LLMs at Evaluating Temporal Generalization

Authors: Chenghao Zhu, Nuo Chen, Yufei Gao, Benyou Wang

Comments: Preprint

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing. However, traditional benchmarks, which are often static, fail to capture the continually changing information landscape, leading to a disparity between the perceived and actual effectiveness of LLMs in ever-changing real-world scenarios. Furthermore, these benchmarks do not adequately measure the models' capabilities over a broader temporal range or their adaptability over time. We examine current LLMs in terms of temporal generalization and bias, revealing that various temporal biases emerge in both language likelihood and prognostic prediction. This serves as a caution for LLM practitioners to pay closer attention to mitigating temporal biases. Also, we propose an evaluation framework Freshbench for dynamically generating benchmarks from the most recent real-world prognostication prediction. Our code is available at https://github.com/FreedomIntelligence/FreshBench. The dataset will be released soon.
[21] arXiv:2405.08468 [pdf, other]: Title: Challenges and Opportunities in Text Generation Explainability

Authors: Kenza Amara, Rita Sevastjanova, Mennatallah El-Assady

Comments: 17 pages, 5 figures, xAI-2024 Conference, Main track

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The necessity for interpretability in natural language processing (NLP) has risen alongside the growing prominence of large language models. Among the myriad tasks within NLP, text generation stands out as a primary objective of autoregressive models. The NLP community has begun to take a keen interest in gaining a deeper understanding of text generation, leading to the development of model-agnostic explainable artificial intelligence (xAI) methods tailored to this task. The design and evaluation of explainability methods are non-trivial since they depend on many factors involved in the text generation process, e.g., the autoregressive model and its stochastic nature. This paper outlines 17 challenges categorized into three groups that arise during the development and assessment of attribution-based explainability methods. These challenges encompass issues concerning tokenization, defining explanation similarity, determining token importance and prediction change metrics, the level of human intervention required, and the creation of suitable test datasets. The paper illustrates how these challenges can be intertwined, showcasing new opportunities for the community. These include developing probabilistic word-level explainability methods and engaging humans in the explainability pipeline, from the data design to the final evaluation, to draw robust conclusions on xAI methods.
[22] arXiv:2405.08469 [pdf, other]: Title: GPT-3.5 for Grammatical Error Correction

Authors: Anisia Katinskaia, Roman Yangarber

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper investigates the application of GPT-3.5 for Grammatical Error Correction (GEC) in multiple languages in several settings: zero-shot GEC, fine-tuning for GEC, and using GPT-3.5 to re-rank correction hypotheses generated by other GEC models. In the zero-shot setting, we conduct automatic evaluations of the corrections proposed by GPT-3.5 using several methods: estimating grammaticality with language models (LMs), the Scribendi test, and comparing the semantic embeddings of sentences. GPT-3.5 has a known tendency to over-correct erroneous sentences and propose alternative corrections. For several languages, such as Czech, German, Russian, Spanish, and Ukrainian, GPT-3.5 substantially alters the source sentences, including their semantics, which presents significant challenges for evaluation with reference-based metrics. For English, GPT-3.5 demonstrates high recall, generates fluent corrections, and generally preserves sentence semantics. However, human evaluation for both English and Russian reveals that, despite its strong error-detection capabilities, GPT-3.5 struggles with several error types, including punctuation mistakes, tense errors, syntactic dependencies between words, and lexical compatibility at the sentence level.
[23] arXiv:2405.08477 [pdf, other]: Title: Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models

Authors: Andrea Piergentili, Beatrice Savoldi, Matteo Negri, Luisa Bentivogli

Comments: Accepted at EAMT 2024

Subjects: Computation and Language (cs.CL)

Machine translation (MT) models are known to suffer from gender bias, especially when translating into languages with extensive gendered morphology. Accordingly, they still fall short in using gender-inclusive language, also representative of non-binary identities. In this paper, we look at gender-inclusive neomorphemes, neologistic elements that avoid binary gender markings, as an approach towards fairer MT. In this direction, we explore prompting techniques with large language models (LLMs) to translate from English into Italian using neomorphemes. So far, this area has been under-explored due to its novelty and the lack of publicly available evaluation resources. We fill this gap by releasing Neo-GATE, a resource designed to evaluate gender-inclusive en-it translation with neomorphemes. With Neo-GATE, we assess four LLMs of different families and sizes and different prompt formats, identifying strengths and weaknesses of each on this novel task for MT.
[24] arXiv:2405.08497 [pdf, other]: Title: Is Less More? Quality, Quantity and Context in Idiom Processing with Natural Language Models

Authors: Agne Knietaite, Adam Allsebrook, Anton Minkov, Adam Tomaszewski, Norbert Slinko, Richard Johnson, Thomas Pickard, Dylan Phelps, Aline Villavicencio

Comments: 14 pages, 10 figures. Presented at the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD 2024)

Subjects: Computation and Language (cs.CL)

Compositionality in language models presents a problem when processing idiomatic expressions, as their meaning often cannot be directly derived from their individual parts. Although fine-tuning and other optimization strategies can be used to improve representations of idiomatic expressions, this depends on the availability of relevant data. We present the Noun Compound Synonym Substitution in Books - NCSSB - datasets, which are created by substitution of synonyms of potentially idiomatic English noun compounds in public domain book texts. We explore the trade-off between data quantity and quality when training models for idiomaticity detection, in conjunction with contextual information obtained locally (from the surrounding sentences) or externally (through language resources). Performance on an idiomaticity detection task indicates that dataset quality is a stronger factor for context-enriched models, but that quantity also plays a role in models without context inclusion strategies.
[25] arXiv:2405.08502 [pdf, other]: Title: Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure

Authors: Odysseas S. Chlapanis, Ion Androutsopoulos, Dimitrios Galanis

Comments: To be published in SemEval-2024

Subjects: Computation and Language (cs.CL)

The SemEval task on Argument Reasoning in Civil Procedure is challenging in that it requires understanding legal concepts and inferring complex arguments. Currently, most Large Language Models (LLMs) excelling in the legal realm are principally purposed for classification tasks, hence their reasoning rationale is subject to contention. The approach we advocate involves using a powerful teacher-LLM (ChatGPT) to extend the training dataset with explanations and generate synthetic data. The resulting data are then leveraged to fine-tune a small student-LLM. Contrary to previous work, our explanations are not directly derived from the teacher's internal knowledge. Instead, they are grounded in authentic human analyses, therefore delivering a superior reasoning signal. Additionally, a new 'mutation' method generates artificial data instances inspired by existing ones. We are publicly releasing the explanations as an extension to the original dataset, along with the synthetic dataset and the prompts that were used to generate both. Our system ranked 15th in the SemEval competition. It outperforms its own teacher and can produce explanations aligned with the original human analyses, as verified by legal experts.
[26] arXiv:2405.08546 [pdf, other]: Title: Analysing Cross-Speaker Convergence in Face-to-Face Dialogue through the Lens of Automatically Detected Shared Linguistic Constructions

Authors: Esam Ghaleb, Marlou Rasenberg, Wim Pouw, Ivan Toni, Judith Holler, Aslı Özyürek, Raquel Fernández

Comments: Accepted for publication at the 46th Proceedings of the Annual Meeting of the Cognitive Science Society

Subjects: Computation and Language (cs.CL)

Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions -- expressions with a common lexical core used by both speakers within a dialogue -- and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the number of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.
[27] arXiv:2405.08562 [pdf, other]: Title: The Unseen Targets of Hate -- A Systematic Review of Hateful Communication Datasets

Authors: Zehui Yu, Indira Sen, Dennis Assenmacher, Mattia Samory, Leon Fröhling, Christina Dahn, Debora Nozza, Claudia Wagner

Comments: 20 pages, 14 figures

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet, ML tools can only be as capable as the quality of the data they are trained on allows them. While there is increasing evidence that they underperform in detecting hateful communications directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and mismatches between the targets that research conceptualizes and ultimately includes in datasets. Yet, by contextualizing these findings in the language and location of origin of the datasets, we highlight a positive trend towards the broadening and diversification of this research space.
[28] arXiv:2405.08570 [pdf, other]: Title: Rethinking the adaptive relationship between Encoder Layers and Decoder Layers

Authors: Yubo Song

Subjects: Computation and Language (cs.CL)

This article explores the adaptive relationship between Encoder Layers and Decoder Layers using the SOTA model Helsinki-NLP/opus-mt-de-en, which translates German to English. The specific method involves introducing a bias-free fully connected layer between the Encoder and Decoder, with different initializations of the layer's weights, and observing the outcomes of fine-tuning versus retraining. Four experiments were conducted in total. The results suggest that directly modifying the pre-trained model structure for fine-tuning yields suboptimal performance. However, upon observing the outcomes of the experiments with retraining, this structural adjustment shows significant potential.
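The structural change described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the toy dimensions, and the choice of an identity initialization are assumptions made here for illustration. The abstract only states that a bias-free fully connected layer with different weight initializations sits between the Encoder and the Decoder.

```python
# Hypothetical sketch: a bias-free linear layer placed between encoder
# output and decoder input. Names and sizes are illustrative assumptions.

def identity_weights(d):
    """One possible initialization (assumed here): the d x d identity."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def bridge(encoder_states, weights):
    """Apply y = x W with no bias term to every encoder state vector."""
    d_out = len(weights[0])
    return [
        [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(d_out)]
        for x in encoder_states
    ]


# Two toy encoder states of width 3.
states = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
out = bridge(states, identity_weights(3))
```

With an identity initialization the inserted layer initially passes the encoder states through unchanged, which is one plausible starting point when fine-tuning a pre-trained model; other initializations would perturb the decoder's input from the first step.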
[29] arXiv:2405.08603 [pdf, other]: Title: A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine

Authors: Hanguang Xiao, Feizhong Zhou, Xingyue Liu, Tianqi Liu, Zhipeng Li, Xin Liu, Xiaoxuan Huang

Subjects: Computation and Language (cs.CL)

Since the release of ChatGPT and GPT-4, large language models (LLMs) and multimodal large language models (MLLMs) have garnered significant attention due to their powerful and general capabilities in understanding, reasoning, and generation, thereby offering new paradigms for the integration of artificial intelligence with medicine. This survey comprehensively overviews the development background and principles of LLMs and MLLMs, as well as explores their application scenarios, challenges, and future directions in medicine. Specifically, this survey begins by focusing on the paradigm shift, tracing the evolution from traditional models to LLMs and MLLMs, summarizing the model structures to provide detailed foundational knowledge. Subsequently, the survey details the entire process from constructing and evaluating to using LLMs and MLLMs with a clear logic. Following this, to emphasize the significant value of LLMs and MLLMs in healthcare, we survey and summarize 6 promising applications in healthcare. Finally, the survey discusses the challenges faced by medical LLMs and MLLMs and proposes a feasible approach and direction for the subsequent integration of artificial intelligence with medicine. Thus, this survey aims to provide researchers with a valuable and comprehensive reference guide from the perspectives of the background, principles, and clinical applications of LLMs and MLLMs.
[30] arXiv:2405.08619 [pdf, other]: Title: ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation

Authors: Dimitris Gkoumas

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)

The intersection of chemistry and Artificial Intelligence (AI) is an area of active research that aims to accelerate scientific discovery. The integration of large language models (LLMs) with scientific modalities has shown significant promise in this endeavour. However, challenges persist in effectively addressing training efficacy and the out-of-distribution problem, particularly as existing approaches rely on larger models and datasets. In this context, we focus on machine language-molecule translation and deploy a novel training approach called contrastive preference optimisation, which avoids generating translations that are merely adequate but not perfect. To ensure generalisability and mitigate memorisation effects, we conduct experiments using only 10% of the data. Our results demonstrate that our models achieve up to a 32% improvement compared to counterpart models. We also introduce a scalable fine-grained evaluation methodology that accommodates responsibility.
[31] arXiv:2405.08644 [pdf, other]: Title: Thinking Tokens for Language Modeling

Authors: David Herel, Tomas Mikolov

Comments: AITP 2023 (May 10, 2023)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

How much is 56 times 37? Language models often make mistakes in these types of difficult calculations. This is usually explained by their inability to perform complex reasoning. Since language models rely on large training sets and great memorization capability, naturally they are not equipped to run complex calculations. However, one can argue that humans also cannot perform this calculation immediately and require a considerable amount of time to construct the solution. In order to enhance the generalization capability of language models, and as a parallel to human behavior, we propose to use special 'thinking tokens' which allow the model to perform many more calculations whenever a complex problem is encountered.
[32] arXiv:2405.08729 [pdf, other]: Title: Targeted Augmentation for Low-Resource Event Extraction

Authors: Sijia Wang, Lifu Huang

Comments: 15 pages, NAACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Low-resource information extraction remains an ongoing challenge due to the inherent information scarcity within limited training examples. Existing data augmentation methods, considered potential solutions, struggle to strike a balance between weak augmentation (e.g., synonym augmentation) and drastic augmentation (e.g., conditional generation without proper guidance). This paper introduces a novel paradigm that employs targeted augmentation and back validation to produce augmented examples with enhanced diversity, polarity, accuracy, and coherence. Extensive experimental results demonstrate the effectiveness of the proposed paradigm. Furthermore, identified limitations are discussed, shedding light on areas for future improvement.
[33] arXiv:2405.08751 [pdf, other]: Title: From Text to Context: An Entailment Approach for News Stakeholder Classification

Authors: Alapan Kuila, Sudeshna Sarkar

Comments: Accepted in SIGIR 2024

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Navigating the complex landscape of news articles involves understanding the various actors or entities involved, referred to as news stakeholders. These stakeholders, ranging from policymakers to opposition figures, citizens, and more, play pivotal roles in shaping news narratives. Recognizing their stakeholder types, reflecting their roles, political alignments, social standing, and more, is paramount for a nuanced comprehension of news content. Despite existing works focusing on salient entity extraction, coverage variations, and political affiliations through social media data, the automated detection of stakeholder roles within news content remains an underexplored domain. In this paper, we bridge this gap by introducing an effective approach to classify stakeholder types in news articles. Our method involves transforming the stakeholder classification problem into a natural language inference task, utilizing contextual information from news articles and external knowledge to enhance the accuracy of stakeholder type detection. Moreover, our proposed model showcases efficacy in zero-shot settings, further extending its applicability to diverse news contexts.
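The reformulation of stakeholder classification as natural language inference can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: the hypothesis template, the candidate type labels, the helper names, and the entailment scores are all hypothetical, and a real system would obtain the scores from an NLI model.

```python
# Hypothetical sketch: turn stakeholder typing into premise/hypothesis pairs.
# Template, labels, and scores below are made-up illustrations.

def build_nli_pairs(context, stakeholder, candidate_types,
                    template="{stakeholder} is a {label}."):
    """Create one NLI pair per candidate stakeholder type."""
    return [
        {
            "premise": context,
            "hypothesis": template.format(stakeholder=stakeholder, label=label),
            "label": label,
        }
        for label in candidate_types
    ]


def pick_label(pairs, entailment_scores):
    """Return the type whose hypothesis received the highest entailment score."""
    best = max(range(len(pairs)), key=lambda i: entailment_scores[i])
    return pairs[best]["label"]


pairs = build_nli_pairs(
    context="The finance minister announced a new tax relief package.",
    stakeholder="The finance minister",
    candidate_types=["policymaker", "opposition figure", "citizen"],
)
# Placeholder scores standing in for an NLI model's entailment probabilities.
predicted = pick_label(pairs, [0.90, 0.05, 0.05])
```

Because the candidate types enter only through the hypothesis text, new types can be added without retraining a classification head, which is consistent with the zero-shot applicability mentioned in the abstract.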
[34] arXiv:2405.08760 [pdf, other]: Title: Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs

Authors: Akhila Yerukola, Saujas Vaduguru, Daniel Fried, Maarten Sap

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Humans often express their communicative intents indirectly or non-literally, which requires their interlocutors -- human or AI -- to understand beyond the literal meaning of words. While most existing work has focused on discriminative evaluations, we present a new approach to generatively evaluate large language models' (LLMs') intention understanding by examining their responses to non-literal utterances. Ideally, an LLM should respond in line with the true intention of a non-literal utterance, not its literal interpretation. Our findings show that LLMs struggle to generate pragmatically relevant responses to non-literal language, achieving only 50-55% accuracy on average. While explicitly providing oracle intentions significantly improves performance (e.g., 75% for Mistral-Instruct), this still indicates challenges in leveraging given intentions to produce appropriate responses. Using chain-of-thought to make models spell out intentions yields much smaller gains (60% for Mistral-Instruct). These findings suggest that LLMs are not yet effective pragmatic interlocutors, highlighting the need for better approaches for modeling intentions and utilizing them for pragmatic generation.
[35] arXiv:2405.08784 [pdf, other]: Title: Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram

Authors: Aehong Min, Xuan Wang, Rion Brattig Correia, Jordan Rozum, Wendy R. Miller, Luis M. Rocha

Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

We used a dictionary built from biomedical terminology extracted from various sources such as DrugBank, MedDRA, MedlinePlus, TCMGeneDIT, to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once, between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false-positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary thus produced leads to a significantly different rank of important terms, as measured by their eigenvector-centrality of the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.
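The eigenvector-centrality ranking mentioned above can be illustrated with a small power-iteration sketch. The toy graph, node names, and function name are assumptions for illustration only; the abstract does not describe how the authors' knowledge networks are built or which implementation they used.

```python
# Hypothetical sketch: eigenvector centrality by power iteration on a small
# undirected graph given as an adjacency list. The graph is a toy example.

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Return a dict of L2-normalized centrality scores for each node."""
    nodes = sorted(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbours' scores.
        new = {n: sum(score[m] for m in adj[n]) for n in nodes}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


# Toy network: a triangle (a, b, c) with one extra node d attached to a.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
centrality = eigenvector_centrality(toy_graph)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

Running the same function on networks built from two different dictionaries and comparing the resulting rankings is one simple way to see how removing ambiguous terms could change which terms rank as most important.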

Cross-lists for Wed, 15 May 24

[36] arXiv:2405.08014 (cross-list from cs.RO) [pdf, ps, other]: Title: Robot Detection System 1: Front-Following

Authors: Jinwei Lin

Comments: paper series

Subjects: Robotics (cs.RO); Computation and Language (cs.CL)

Front-following is technically harder to implement than the other two human-following technologies, but it is more practical and can be applied in more areas to solve more practical problems. Front-following has many advantages not found in back-following and side-by-side following. In this paper, we discuss the basic and significant principles and the general design idea of this technology. In addition, various novel and useful methods are presented. We use ample figures to illustrate our design ideas. Our research results were made open source in 2018, and this paper is intended to disseminate them more widely. Many design ideas are included in this paper; further ideas and analysis can be found in the other papers whose titles begin with Robot Design System by Jinwei Lin, the sole author of this paper series.
[37] arXiv:2405.08017 (cross-list from cs.LG) [pdf, ps, other]: Title: Translating Expert Intuition into Quantifiable Features: Encode Investigator Domain Knowledge via LLM for Enhanced Predictive Analytics

Authors: Phoebe Jing, Yijing Gao, Yuanhang Zhang, Xianlong Zeng

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

In the realm of predictive analytics, the nuanced domain knowledge of investigators often remains underutilized, confined largely to subjective interpretations and ad hoc decision-making. This paper explores the potential of Large Language Models (LLMs) to bridge this gap by systematically converting investigator-derived insights into quantifiable, actionable features that enhance model performance. We present a framework that leverages LLMs' natural language understanding capabilities to encode these red flags into a structured feature set that can be readily integrated into existing predictive models. Through a series of case studies, we demonstrate how this approach not only preserves the critical human expertise within the investigative process but also scales the impact of this knowledge across various prediction tasks. The results indicate significant improvements in risk assessment and decision-making accuracy, highlighting the value of blending human experiential knowledge with advanced machine learning techniques. This study paves the way for more sophisticated, knowledge-driven analytics in fields where expert insight is paramount.
[38] arXiv:2405.08032 (cross-list from cs.HC) [pdf, ps, other]: Title: Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design

Authors: Peer-Olaf Siebers

Comments: 29 pages, 3 figures, 1 table

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)

ChatGPT, the AI-powered chatbot with a massive user base of hundreds of millions, has become a global phenomenon. However, the use of Conversational AI Systems (CAISs) like ChatGPT for research in the field of Social Simulation is still limited. Specifically, there is no evidence of its usage in Agent-Based Social Simulation (ABSS) model design. While scepticism towards anything new is inherent to human nature, we firmly believe it is imperative to initiate the use of this innovative technology to support ABSS model design. This paper presents a proof-of-concept that demonstrates how CAISs can facilitate the development of innovative conceptual ABSS models in a concise timeframe and with minimal required upfront case-based knowledge. By employing advanced prompt engineering techniques and adhering to the Engineering ABSS framework, we have constructed a comprehensive prompt script that enables the design of ABSS models with or by the CAIS. The effectiveness of the script is demonstrated through an illustrative case study concerning the use of adaptive architecture in museums. Despite occasional inaccuracies and divergences in conversation, the CAIS proved to be a valuable companion for ABSS modellers.
[39] arXiv:2405.08209 (cross-list from cs.CY) [pdf, other]: Title: Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp

Authors: Rachel Hong, William Agnew, Tadayoshi Kohno, Jamie Morgenstern

Comments: Content warning: This paper discusses societal stereotypes and sexually-explicit material that may be disturbing, distressing, and/or offensive to the reader

Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

As training datasets become increasingly drawn from unstructured, uncontrolled environments such as the web, researchers and industry practitioners have increasingly relied upon data filtering techniques to "filter out the noise" of web-scraped data. While datasets have been widely shown to reflect the biases and values of their creators, in this paper we contribute to an emerging body of research that assesses the filters used to create these datasets. We show that image-text data filtering also has biases and is value-laden, encoding specific notions of what is counted as "high-quality" data. In our work, we audit a standard approach of image-text CLIP-filtering on the academic benchmark DataComp's CommonPool by analyzing discrepancies of filtering through various annotation techniques across multiple modalities of image, text, and website source. We find that data relating to several imputed demographic groups -- such as LGBTQ+ people, older women, and younger men -- are associated with higher rates of exclusion. Moreover, we demonstrate cases of exclusion amplification: not only are certain marginalized groups already underrepresented in the unfiltered data, but CLIP-filtering excludes data from these groups at higher rates. The data-filtering step in the machine learning pipeline can therefore exacerbate representation disparities already present in the data-gathering step, especially when existing filters are designed to optimize a specifically-chosen downstream performance metric like zero-shot image classification accuracy. Finally, we show that the NSFW filter fails to remove sexually-explicit content from CommonPool, and that CLIP-filtering includes several categories of copyrighted content at high rates. Our conclusions point to a need for fundamental changes in dataset creation and filtering practices.
[40] arXiv:2405.08238 (cross-list from cs.HC) [pdf, other]: Title: Silver-Tongued and Sundry: Exploring Intersectional Pronouns with ChatGPT

Authors: Takao Fujii, Katie Seaborn, Madeleine Steeds

Comments: Honorable Mention award (top 5%) at CHI '24

Journal-ref: CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems (2024), Article No. 511, 1-14

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

ChatGPT is a conversational agent built on a large language model. Trained on a significant portion of human output, ChatGPT can mimic people to a degree. As such, we need to consider what social identities ChatGPT simulates (or can be designed to simulate). In this study, we explored the case of identity simulation through Japanese first-person pronouns, which are tightly connected to social identities in intersectional ways, i.e., intersectional pronouns. We conducted a controlled online experiment where people from two regions in Japan (Kanto and Kinki) witnessed interactions with ChatGPT using ten sets of first-person pronouns. We discovered that pronouns alone can evoke perceptions of social identities in ChatGPT at the intersections of gender, age, region, and formality, with caveats. This work highlights the importance of pronoun use for social identity simulation, provides a language-based methodology for culturally-sensitive persona development, and advances the potential of intersectional identities in intelligent agents.
[41] arXiv:2405.08514 (cross-list from cs.LG) [pdf, ps, other]: Title: Falcon 7b for Software Mention Detection in Scholarly Documents

Authors: AmeerAli Khan, Qusai Ramadan, Cong Yang, Zeyd Boukhers

Comments: Accepted for publication by the first Workshop on Natural Scientific Language Processing and Research Knowledge Graphs - NSLP (@ ESCAI)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Digital Libraries (cs.DL)

This paper aims to tackle the challenge posed by the increasing integration of software tools in research across various disciplines by investigating the application of Falcon-7b for the detection and classification of software mentions within scholarly texts. Specifically, the study focuses on solving Subtask I of the Software Mention Detection in Scholarly Publications (SOMD), which entails identifying and categorizing software mentions from academic literature. Through comprehensive experimentation, the paper explores different training strategies, including a dual-classifier approach, adaptive sampling, and weighted loss scaling, to enhance detection accuracy while overcoming the complexities of class imbalance and the nuanced syntax of scholarly writing. The findings highlight the benefits of selective labelling and adaptive sampling in improving the model's performance. However, they also indicate that integrating multiple strategies does not necessarily result in cumulative improvements. This research offers insights into the effective application of large language models for specific tasks such as SOMD, underlining the importance of tailored approaches to address the unique challenges presented by academic text analysis.
[42] arXiv:2405.08553 (cross-list from cs.LG) [pdf, other]: Title: Improving Transformers with Dynamically Composable Multi-Head Attention

Authors: Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan

Comments: Accepted to the 41st International Conference on Machine Learning (ICML'24)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Multi-Head Attention (MHA) is a key component of the Transformer. In MHA, attention heads work independently, causing problems such as the low-rank bottleneck of attention score matrices and head redundancy. We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter- and computation-efficient attention architecture that tackles the shortcomings of MHA and increases the expressive power of the model by dynamically composing attention heads. At the core of DCMHA is a Compose function that transforms the attention score and weight matrices in an input-dependent way. DCMHA can be used as a drop-in replacement for MHA in any transformer architecture to obtain the corresponding DCFormer. DCFormer significantly outperforms Transformer on different architectures and model scales in language modeling, matching the performance of models with ~1.7x-2.0x compute. For example, DCPythia-6.9B outperforms open source Pythia-12B on both pretraining perplexity and downstream task evaluation. The code and models are available at https://github.com/Caiyun-AI/DCFormer.

Replacements for Wed, 15 May 24

[43] arXiv:2303.07247 (replaced) [pdf, ps, other]: Title: Are Models Trained on Indian Legal Data Fair?

Authors: Sahil Girhepuje, Anmol Goel, Gokul S Krishnan, Shreya Goyal, Satyendra Pandey, Ponnurangam Kumaraguru, Balaraman Ravindran

Comments: Presented at the Symposium on AI and Law (SAIL) 2023

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
[44] arXiv:2309.05950 (replaced) [pdf, other]: Title: Language Models as Black-Box Optimizers for Vision-Language Models

Authors: Shihong Liu, Zhiqiu Lin, Samuel Yu, Ryan Lee, Tiffany Ling, Deepak Pathak, Deva Ramanan

Comments: Published at CVPR 2024.

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[45] arXiv:2310.13206 (replaced) [pdf, other]: Title: Primacy Effect of ChatGPT

Authors: Yiwei Wang, Yujun Cai, Muhao Chen, Yuxuan Liang, Bryan Hooi

Comments: EMNLP 2023 short paper

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[46] arXiv:2310.14558 (replaced) [pdf, other]: Title: AlpaCare: Instruction-tuned Large Language Models for Medical Application

Authors: Xinlu Zhang, Chenxin Tian, Xianjun Yang, Lichang Chen, Zekun Li, Linda Ruth Petzold

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[47] arXiv:2311.17330 (replaced) [pdf, ps, other]: Title: Biomedical knowledge graph-optimized prompt generation for large language models

Authors: Karthik Soman, Peter W Rose, John H Morris, Rabia E Akbas, Brett Smith, Braian Peetoom, Catalina Villouta-Reyes, Gabriel Cerono, Yongmei Shi, Angela Rizk-Jackson, Sharat Israni, Charlotte A Nelson, Sui Huang, Sergio E Baranzini

Comments: 29 pages, 5 figures, 1 table, 1 supplementary file

Subjects: Computation and Language (cs.CL)
[48] arXiv:2402.02807 (replaced) [pdf, other]: Title: Are Sounds Sound for Phylogenetic Reconstruction?

Authors: Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis

Comments: Paper accepted for SIGTYP (2024): Häuser, Luise; Jäger, Gerhard; List, Johann-Mattis; Rama, Taraka; and Stamatakis, Alexandros (2024): Are sounds sound for phylogenetic reconstruction? In: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP 2024)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2403.01139 (replaced) [pdf, other]: Title: ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies

Authors: Oren Sultan, Yonatan Bitton, Ron Yosef, Dafna Shahaf

Comments: NAACL 2024 (Main Conference)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[50] arXiv:2403.10799 (replaced) [pdf, other]: Title: Efficient Pruning of Large Language Model with Adaptive Estimation Fusion

Authors: Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Haoye Dong, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[51] arXiv:2403.12024 (replaced) [pdf, other]: Title: Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems

Authors: Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai

Comments: Accepted by LREC-COLING 2024 as a long oral paper

Subjects: Computation and Language (cs.CL)
[52] arXiv:2403.15436 (replaced) [pdf, ps, other]: Title: Using Contextual Information for Sentence-level Morpheme Segmentation

Authors: Prabin Bhandari, Abhishek Paudel

Comments: 5 pages, 3 tables

Subjects: Computation and Language (cs.CL)
[53] arXiv:2404.08760 (replaced) [pdf, other]: Title: The Generation Gap: Exploring Age Bias in the Underlying Value Systems of Large Language Models

Authors: Siyang Liu, Trish Maturi, Bowen Yi, Siqi Shen, Rada Mihalcea

Comments: 4 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[54] arXiv:2404.12241 (replaced) [pdf, other]: Title: Introducing v0.5 of the AI Safety Benchmark from MLCommons

Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, et al. (49 additional authors not shown)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[55] arXiv:2404.13906 (replaced) [pdf, other]: Title: Generating Attractive and Authentic Copywriting from Customer Reviews

Authors: Yu-Xiang Lin, Wei-Yun Ma

Comments: NAACL 2024 main conference paper

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[56] arXiv:2404.15565 (replaced) [pdf, other]: Title: CASPR: Automated Evaluation Metric for Contrastive Summarization

Authors: Nirupan Ananthamurugan, Dat Duong, Philip George, Ankita Gupta, Sandeep Tata, Beliz Gunel

Subjects: Computation and Language (cs.CL)
[57] arXiv:2404.17481 (replaced) [pdf, other]: Title: ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

Authors: Tyler Loakman, Chenghua Lin

Comments: Accepted to HumEval at LREC-Coling 2024. Table 1 updated

Subjects: Computation and Language (cs.CL)
[58] arXiv:2405.02501 (replaced) [pdf, other]: Title: PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning

Authors: Hyeong Kyu Choi, Yixuan Li

Comments: ICML 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[59] arXiv:2405.03548 (replaced) [pdf, other]: Title: MAmmoTH2: Scaling Instructions from the Web

Authors: Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen

Comments: Work in Progress

Subjects: Computation and Language (cs.CL)
[60] arXiv:2405.04515 (replaced) [pdf, other]: Title: A Transformer with Stack Attention

Authors: Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell

Comments: NAACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[61] arXiv:2405.06067 (replaced) [pdf, other]: Title: HMT: Hierarchical Memory Transformer for Long Context Language Processing

Authors: Zifan He, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[62] arXiv:2405.06714 (replaced) [pdf, other]: Title: Towards a Path Dependent Account of Category Fluency

Authors: David Heineman, Reba Koenen, Sashank Varma

Comments: To appear at CogSci 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[63] arXiv:2405.07076 (replaced) [pdf, other]: Title: Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models

Authors: Edward Y. Chang

Comments: 29 pages, 10 tables, 6 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[64] arXiv:2405.07348 (replaced) [pdf, other]: Title: MedConceptsQA: Open Source Medical Concepts QA Benchmark

Authors: Ofir Ben Shoham, Nadav Rappoport

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[65] arXiv:2405.07703 (replaced) [pdf, other]: Title: OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs trained starting from Llama 2

Authors: Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

Subjects: Computation and Language (cs.CL)
[66] arXiv:2405.07932 (replaced) [pdf, other]: Title: PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition

Authors: Ziyang Zhang, Qizhen Zhang, Jakob Foerster

Comments: Accepted at ICML 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[67] arXiv:2402.15151 (replaced) [pdf, other]: Title: Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Authors: Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

Comments: An Erratum was added on the last page of this paper

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[68] arXiv:2403.00858 (replaced) [pdf, other]: Title: Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs

Authors: Raghavv Goel, Mukul Gagrani, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott

Comments: 8 pages, 3 figures, Published at the ICLR 2024 Workshop on Understanding of Foundation Models (ME-FoMo)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[69] arXiv:2403.05535 (replaced) [pdf, other]: Title: Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos

Authors: Tarun Kalluri, Bodhisattwa Prasad Majumder, Manmohan Chandraker

Comments: ICML 2024 Version. Project Page and Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[70] arXiv:2403.06098 (replaced) [pdf, other]: Title: VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Authors: Wenhao Wang, Yi Yang

Comments: The project (including the collected dataset VidProM and related code) is publicly available at this https URL under the CC-BY-NC 4.0 License

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[71] arXiv:2405.00740 (replaced) [pdf, other]: Title: Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Authors: Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas Ballas

Comments: 14 pages, 8 figures, 7 tables, to be published at ICML2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[72] arXiv:2405.05329 (replaced) [pdf, other]: Title: KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

Authors: Minsik Cho, Mohammad Rastegari, Devang Naik

Comments: preprint for ICML 2024

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[73] arXiv:2405.06725 (replaced) [pdf, other]: Title: On the Shape of Brainscores for Large Language Models (LLMs)

Authors: Jingkai Li

Comments: The Figure 10 from arXiv:1710.04019, Figure 6.28 from arXiv:2403.13825, and captions are both from this https URL, where the case in my paper is Figure 3, and has already cited its original source. I believe both arXiv:1710.04019 and arXiv:2403.13825 should cite the original source, rather than force me to cite them

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
