Computation and Language

New submissions

New submissions for Wed, 22 Mar 23

[1]  arXiv:2303.11436 [pdf, other]
Title: Mind meets machine: Unravelling GPT-4's cognitive psychology
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Commonsense reasoning is a basic ingredient of intelligence in humans, empowering the ability to deduce conclusions based on observations of the surroundings. Large language models (LLMs) are emerging as potent tools increasingly capable of performing human-level tasks. The recent development of GPT-4 and its demonstrated success in tasks that are complex for humans, such as medical and bar exams, has led to increased confidence that LLMs can become perfect instruments of intelligence. Although the GPT-4 paper has shown performance on some common sense reasoning tasks, a comprehensive assessment of GPT-4 on common sense reasoning tasks, particularly on existing well-established datasets, is missing. In this study, we focus on the evaluation of GPT-4's performance on a set of common sense reasoning questions from the widely used CommonsenseQA dataset along with tools from cognitive psychology. In doing so, we understand how GPT-4 processes and integrates common sense knowledge with contextual information, providing insight into the underlying cognitive processes that enable its ability to generate common sense responses. We show that GPT-4 exhibits a high level of accuracy in answering common sense questions, outperforming its predecessors, GPT-3 and GPT-3.5. We show that the accuracy of GPT-4 on CommonsenseQA is 83%, whereas the original study reported human accuracy of 89% on the same data. Although GPT-4 falls short of human performance, it is a substantial improvement over the 56.5% achieved by the language model used in the original CommonsenseQA study. Our results strengthen the already available assessments of, and confidence in, GPT-4's common sense reasoning abilities, which have significant potential to revolutionize the field of AI by enabling machines to bridge the gap between human and machine reasoning.

[2]  arXiv:2303.11504 [pdf, ps, other]
Title: Language Model Behavior: A Comprehensive Survey
Comments: 31 pages
Subjects: Computation and Language (cs.CL)

Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to unfactual responses, commonsense errors, memorized text, and social biases. Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about what large language models can and cannot do.

[3]  arXiv:2303.11607 [pdf, other]
Title: Transformers in Speech Processing: A Survey
Comments: under-review
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The remarkable success of transformers in the field of natural language processing has sparked the interest of the speech-processing community, leading to an exploration of their potential for modeling long-range dependencies within speech sequences. Recently, transformers have gained prominence across various speech-related domains, including automatic speech recognition, speech synthesis, speech translation, speech para-linguistics, speech enhancement, spoken dialogue systems, and numerous multimodal applications. In this paper, we present a comprehensive survey that aims to bridge research studies from diverse subfields within speech technology. By consolidating findings from across the speech technology landscape, we provide a valuable resource for researchers interested in harnessing the power of transformers to advance the field. We identify the challenges encountered by transformers in speech processing while also offering insights into potential solutions to address these issues.

[4]  arXiv:2303.11621 [pdf, other]
Title: Heterogeneous-Branch Collaborative Learning for Dialogue Generation
Comments: Accepted by AAAI 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

With the development of deep learning, advanced dialogue generation methods usually require a greater amount of computational resources. One promising approach to obtaining a high-performance and lightweight model is knowledge distillation, which relies heavily on a powerful pre-trained teacher. Collaborative learning, also known as online knowledge distillation, is an effective way to conduct one-stage group distillation in the absence of a well-trained large teacher model. However, previous work has a severe branch homogeneity problem due to the same training objective and the independent identical training sets. To alleviate this problem, we consider the dialogue attributes in the training of network branches. Each branch learns the attribute-related features based on the selected subset. Furthermore, we propose a dual group-based knowledge distillation method, consisting of positive distillation and negative distillation, to further diversify the features of different branches in a steady and interpretable way. The proposed approach significantly improves branch heterogeneity and outperforms state-of-the-art collaborative learning methods on two widely used open-domain dialogue datasets.

[5]  arXiv:2303.11660 [pdf, other]
Title: Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization
Comments: EACL 2023 Findings
Subjects: Computation and Language (cs.CL)

Opinion summarization provides an important solution for summarizing opinions expressed among a large number of reviews. However, generating aspect-specific and general summaries is challenging due to the lack of annotated data. In this work, we propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed with aspect-related review contents. Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words and outperforms existing methods by 3.4 ROUGE-L points on SPACE and 0.5 ROUGE-1 point on OPOSUM+ for aspect-specific opinion summarization. Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO) identifies aspect-related sentences utilizing an NLI model in a more general setting without using seed words and outperforms existing approaches by 1.2 ROUGE-L points on SPACE for aspect-specific opinion summarization and remains competitive on other metrics.
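The seed-word matching and leave-one-out construction described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not the authors' implementation: the seed-word lists, the function names `aspect_sentences` and `leave_one_out_pairs`, and the reading of "leave-one-out" as "hold out one review's aspect sentences as the pseudo-summary and use the remaining reviews as input".

```python
import re

# Hypothetical seed words per aspect; the abstract does not list the real ones.
SEED_WORDS = {
    "food": {"breakfast", "restaurant", "coffee"},
    "cleanliness": {"clean", "dirty", "spotless"},
}


def aspect_sentences(review, aspect, seed_words=SEED_WORDS):
    """Return the sentences of `review` that contain an exact seed-word match."""
    seeds = seed_words[aspect]
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", review.strip())
    return [s for s in sentences if seeds & set(re.findall(r"[a-z]+", s.lower()))]


def leave_one_out_pairs(reviews, aspect):
    """Build (input reviews, pseudo-summary) pairs for one aspect.

    Assumed reading of leave-one-out: each review in turn is held out, its
    aspect-related sentences become the target, and the rest form the input.
    """
    pairs = []
    for i, held_out in enumerate(reviews):
        target = aspect_sentences(held_out, aspect)
        if not target:
            continue  # the held-out review says nothing about this aspect
        inputs = reviews[:i] + reviews[i + 1:]
        pairs.append((inputs, " ".join(target)))
    return pairs
```

Exact matching keeps the construction cheap, at the cost of missing paraphrases; the abstract's NLI-based variant drops the seed-word requirement for that reason.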

[6]  arXiv:2303.11708 [pdf, other]
Title: The Open-domain Paradox for Chatbots: Common Ground as the Basis for Human-like Dialogue
Subjects: Computation and Language (cs.CL)

There is a surge in interest in the development of open-domain chatbots, driven by the recent advancements of large language models. The "openness" of the dialogue is expected to be maximized by providing minimal information to the users about the common ground they can expect, including the presumed joint activity. However, evidence suggests that the effect is the opposite. Asking users to "just chat about anything" results in a very narrow form of dialogue, which we refer to as the "open-domain paradox". In this paper, we explain this paradox through the theory of common ground as the basis for human-like communication. Furthermore, we question the assumptions behind open-domain chatbots and identify paths forward for enabling common ground in human-computer dialogue.

[7]  arXiv:2303.11750 [pdf, other]
Title: LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine Translation
Comments: Accepted by ICASSP 2023
Subjects: Computation and Language (cs.CL)

Simultaneous machine translation, which aims at real-time translation, is useful in many live scenarios but very challenging due to the trade-off between accuracy and latency. To balance the two, the model needs to wait for appropriate streaming text (READ policy) and then generate its translation (WRITE policy). However, the WRITE policies of previous work either are specific to the method itself due to end-to-end training or suffer from an input mismatch between training and decoding in non-end-to-end training. Therefore, it is essential to learn a generic and better WRITE policy for simultaneous machine translation. Inspired by strategies utilized by human interpreters and "wait" policies, we propose a novel adaptive prefix-to-prefix training policy called LEAPT, which allows our machine translation model to learn how to translate source sentence prefixes and make use of the future context. Experiments show that our proposed methods greatly outperform competitive baselines and achieve promising results.

[8]  arXiv:2303.11812 [pdf]
Title: Chinese Intermediate English Learners outdid ChatGPT in deep cohesion: Evidence from English narrative writing
Subjects: Computation and Language (cs.CL)

ChatGPT is a publicly available chatbot that can quickly generate texts on given topics, but it is unknown whether the chatbot is really superior to human writers in all aspects of writing and whether its writing quality can be markedly improved by updating commands. Consequently, this study compared the writing performance on a narrative topic by ChatGPT and Chinese intermediate English (CIE) learners so as to reveal the chatbot's advantages and disadvantages in writing. The data were analyzed in terms of five discourse components using Coh-Metrix (a special instrument for analyzing language discourses), and the results revealed that, in its initial version, ChatGPT performed better than human writers in narrativity, word concreteness, and referential cohesion, but worse in syntactic simplicity and deep cohesion. After more revision commands were issued, the resulting version improved in syntactic simplicity but still lagged far behind CIE learners' writing in deep cohesion. In addition, the correlation analysis of the discourse components suggests that narrativity was correlated with referential cohesion in both ChatGPT and human writers, but the correlations varied within each group.

[9]  arXiv:2303.12023 [pdf, other]
Title: Logical Reasoning over Natural Language as Knowledge Representation: A Survey
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)

Logical reasoning is central to human cognition and intelligence. Past research on logical reasoning within AI uses formal language as knowledge representation (and symbolic reasoners). However, reasoning with formal language has proved challenging (e.g., brittleness and the knowledge-acquisition bottleneck). This paper provides a comprehensive overview of a new paradigm of logical reasoning, which uses natural language as knowledge representation (and pretrained language models as reasoners), including philosophical definition and categorization of logical reasoning, advantages of the new paradigm, benchmarks and methods, challenges of the new paradigm, desirable tasks & methods in the future, and relation to related NLP fields. This new paradigm is promising since it not only alleviates many challenges of formal representation but also has advantages over end-to-end neural methods.

[10]  arXiv:2303.12024 [pdf, other]
Title: cTBL: Augmenting Large Language Models for Conversational Tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

An open challenge in multimodal conversational AI requires augmenting large language models with information from textual and non-textual sources for multi-turn dialogue. To address this problem, this paper introduces Conversational Tables (cTBL), a three-step encoder-decoder approach to retrieve tabular information and generate dialogue responses grounded on the retrieved information. cTBL uses Transformer encoder embeddings for Dense Table Retrieval and obtains up to 5% relative improvement in Top-1 and Top-3 accuracy over sparse retrieval on the HybriDialogue dataset. Additionally, cTBL performs tabular knowledge retrieval using both encoder and decoder models, resulting in up to 46% relative improvement in ROUGE scores and better human evaluation for response generation on HybriDialogue.

[11]  arXiv:2303.12029 [pdf, other]
Title: Wearing Masks Implies Refuting Trump?: Towards Target-specific User Stance Prediction across Events in COVID-19 and US Election 2020
Comments: 10 pages, 2 pages, WebSci 2023, April 30-May 1, 2023, Austin, TX, USA
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

People who share similar opinions towards controversial topics could form an echo chamber and may share similar political views toward other topics as well. The existence of such connections, which we call connected behavior, gives researchers a unique opportunity to predict how one would behave for a future event given their past behaviors. In this work, we propose a framework to conduct connected behavior analysis. Neural stance detection models are trained on Twitter data collected on three seemingly independent topics, i.e., wearing a mask, racial equality, and Trump, to detect people's stance, which we consider as their online behavior in each topic-related event. Our results reveal a strong connection between the stances toward the three topical events and demonstrate the power of past behaviors in predicting one's future behavior.

[12]  arXiv:2303.12038 [pdf]
Title: Grading Conversational Responses Of Chatbots
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Chatbots have long been capable of answering basic questions and even responding to obscure prompts, but recently their improvements have been far more significant. Modern chatbots like OpenAI's ChatGPT3 not only have the ability to answer basic questions but can write code and movie scripts and imitate well-known people. In this paper, we analyze ChatGPT's responses to various questions from a dataset of queries from the popular Quora forum. We submitted sixty questions to ChatGPT and scored the answers based on three industry-standard metrics for grading machine translation: BLEU, METEOR, and ROUGE. These metrics allow us to compare the machine responses with the most upvoted human answer to the same question to assess ChatGPT's ability to submit a humanistic reply. The results showed that while the responses and translation abilities of ChatGPT are remarkable, they still fall short of what a typical human reaction would be.

Cross-lists for Wed, 22 Mar 23

[13]  arXiv:2303.11366 (cross-list from cs.AI) [pdf, other]
Title: Reflexion: an autonomous agent with dynamic memory and self-reflection
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.

[14]  arXiv:2303.11381 (cross-list from cs.CV) [pdf, other]
Title: MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action. In this paper, we define and explore a comprehensive list of advanced vision tasks that are intriguing to solve, but may exceed the capabilities of existing vision and vision-language models. To achieve such advanced visual intelligence, MM-REACT introduces a textual prompt design that can represent text descriptions, textualized spatial coordinates, and aligned file names for dense visual signals such as images and videos. MM-REACT's prompt design allows language models to accept, associate, and process multimodal information, thereby facilitating the synergetic combination of ChatGPT and various vision experts. Zero-shot experiments demonstrate MM-REACT's effectiveness in addressing the specified capabilities of interest and its wide application in different scenarios that require advanced visual understanding. Furthermore, we discuss and compare MM-REACT's system paradigm with an alternative approach that extends language models for multimodal scenarios through joint finetuning. Code, demo, video, and visualization are available at https://multimodal-react.github.io/

[15]  arXiv:2303.11403 (cross-list from cs.CV) [pdf, other]
Title: eP-ALM: Efficient Perceptual Augmentation of Language Models
Comments: Code: https://github.com/mshukor/eP-ALM
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Large Language Models (LLMs) have so far impressed the world, with unprecedented capabilities that emerge in models at large scales. On the vision side, transformer models (i.e., ViT) are following the same trend, achieving the best performance on challenging benchmarks. With the abundance of such unimodal models, a natural question arises: do we also need to follow this trend to tackle multimodal tasks? In this work, we propose to rather direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. In particular, they still train a large number of parameters, rely on large multimodal pretraining, use encoders (e.g., CLIP) trained on huge image-text datasets, and add significant inference overhead. In addition, most of these approaches have focused on Zero-Shot and In Context Learning, with little to no effort on direct finetuning. We investigate the minimal computational effort needed to adapt unimodal models for multimodal tasks and propose a new challenging setup, alongside different approaches, that efficiently adapts unimodal pretrained models. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning across Image, Video, and Audio modalities, following the proposed setup. The code will be available here: https://github.com/mshukor/eP-ALM.

[16]  arXiv:2303.11438 (cross-list from cs.AI) [pdf, ps, other]
Title: Minimizing Fuzzy Interpretations in Fuzzy Description Logics by Using Crisp Bisimulations
Authors: Linh Anh Nguyen
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The problem of minimizing finite fuzzy interpretations in fuzzy description logics (FDLs) is worth studying. For example, the structure of a fuzzy/weighted social network can be treated as a fuzzy interpretation in FDLs, where actors are individuals and actions are roles. Minimizing the structure of a fuzzy/weighted social network makes it more compact, thus making network analysis tasks more efficient. In this work, we study the problem of minimizing a finite fuzzy interpretation in an FDL by using the largest crisp auto-bisimulation. The considered FDLs use the Baaz projection operator and their semantics is specified using an abstract algebra of fuzzy truth values, which can be any linear and complete residuated lattice. We provide an efficient algorithm with a complexity of $O((m \log{l} + n) \log{n})$ for minimizing a given finite fuzzy interpretation $\mathcal{I}$, where $n$ is the size of the domain of $\mathcal{I}$, $m$ is the number of nonzero instances of atomic roles of $\mathcal{I}$ and $l$ is the number of different fuzzy values used for instances of atomic roles of $\mathcal{I}$. We prove that the fuzzy interpretation returned by the algorithm is minimal among the ones that preserve fuzzy TBoxes and ABoxes under certain conditions.

[17]  arXiv:2303.11455 (cross-list from cs.SE) [pdf, other]
Title: Large Language Models and Simple, Stupid Bugs
Comments: Accepted at International Conference on Mining Software Repositories (MSR-2023)
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)

With the advent of powerful neural language models, AI-based systems to assist developers in coding tasks are becoming widely available; Copilot is one such system. Copilot uses Codex, a large language model (LLM), to complete code conditioned on a preceding "prompt". Codex, however, is trained on public GitHub repositories, viz., on code that may include bugs and vulnerabilities. Previous studies [1], [2] show Codex reproduces vulnerabilities seen in training. In this study, we examine how prone Codex is to generate an interesting bug category, single statement bugs, commonly referred to as simple, stupid bugs or SStuBs in the MSR community. We find that Codex and similar LLMs do help avoid some SStuBs, but are up to 2x as likely to produce known, verbatim SStuBs as known, verbatim correct code. We explore the consequences of the Codex-generated SStuBs and propose avoidance strategies that suggest the possibility of reducing the production of known, verbatim SStuBs and increasing the possibility of producing known, verbatim fixes.

[18]  arXiv:2303.11525 (cross-list from cs.LG) [pdf, other]
Title: SIFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Recent works have explored the use of weight sparsity to improve the training efficiency (test accuracy w.r.t. training FLOPs) of deep neural networks (DNNs). These works aim to reduce training FLOPs, but training with sparse weights often leads to accuracy loss or requires longer training schedules, making the resulting training efficiency less clear. In contrast, we focus on using sparsity to increase accuracy while using the same FLOPs as the dense model and show training efficiency gains through higher accuracy. In this work, we introduce SIFT, a family of Sparse Iso-FLOP Transformations which are used as drop-in replacements for dense layers to improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single parameter (sparsity level) and provides a larger search space to find optimal sparse masks. Without changing any training hyperparameters, replacing dense layers with SIFT leads to significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants with 2x or more FLOPs. To the best of our knowledge, this is the first work to demonstrate the use of sparsity for improving accuracy of dense models via a simple-to-use set of sparse transformations. Code is available at: https://github.com/CerebrasResearch/SIFT.

[19]  arXiv:2303.11593 (cross-list from cs.LG) [pdf]
Title: Difficulty in learning chirality for Transformer fed with SMILES
Comments: 20 pages, 6 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Chemical Physics (physics.chem-ph); Biomolecules (q-bio.BM)

Recent years have seen development of descriptor generation based on representation learning of extremely diverse molecules, especially those that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. The results suggest that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low translation accuracy due to misunderstanding of enantiomers. These findings are expected to deepen understanding of NLP models in chemistry.

[20]  arXiv:2303.11648 (cross-list from cs.IR) [pdf, other]
Title: Improving Content Retrievability in Search with Controllable Query Generation
Comments: Accepted for publication in the International World Wide Web Conference 2023
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)

An important goal of online platforms is to enable content discovery, i.e. allow users to find a catalog entity they were not familiar with. A prerequisite for discovering an entity, e.g. a book, with a search engine is that the entity is retrievable, i.e. there are queries for which the system will surface that entity in the top results. However, machine-learned search engines have a high retrievability bias, where the majority of the queries return the same entities. This happens partly due to the predominance of narrow intent queries, where users create queries using the title of an already known entity, e.g. in book search 'harry potter'. The number of broad queries where users want to discover new entities, e.g. in music search 'chill lyrical electronica with an atmospheric feeling to it', and have a higher tolerance for what they might find, is small in comparison. We focus here on two factors that have a negative impact on the retrievability of the entities: (I) the training data used for dense retrieval models and (II) the distribution of narrow and broad intent queries issued in the system. We propose CtrlQGen, a method that generates queries for a chosen underlying intent, narrow or broad. We can use CtrlQGen to improve factor (I) by generating training data for dense retrieval models composed of diverse synthetic queries. CtrlQGen can also be used to deal with factor (II) by suggesting queries with broader intents to users. Our results on datasets from the domains of music, podcasts, and books reveal that we can significantly decrease the retrievability bias of a dense retrieval model when using CtrlQGen. First, by using the generated queries as training data for dense models we make 9% of the entities retrievable (go from zero to non-zero retrievability). Second, by suggesting broader queries to users, we can make 12% of the entities retrievable in the best case.

[21]  arXiv:2303.11945 (cross-list from cs.SI) [pdf, other]
Title: Unsupervised Cross-Domain Rumor Detection with Contrastive Learning and Cross-Attention
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Massive rumors usually appear along with breaking news or trending topics, seriously hindering the truth. Existing rumor detection methods are mostly focused on the same domain, and thus have poor performance in cross-domain scenarios due to domain shift. In this work, we propose an end-to-end instance-wise and prototype-wise contrastive learning model with a cross-attention mechanism for cross-domain rumor detection. The model not only performs cross-domain feature alignment but also enforces target samples to align with the corresponding prototypes of a given source domain. Since target labels in a target domain are unavailable, we use a clustering-based approach with carefully initialized centers by a batch of source domain samples to produce pseudo labels. Moreover, we use a cross-attention mechanism on a pair of source data and target data with the same labels to learn domain-invariant representations. Because the samples in a domain pair tend to express similar semantic patterns, especially on the people's attitudes (e.g., supporting or denying) towards the same category of rumors, the discrepancy between a pair of the source domain and target domain will be decreased. We conduct experiments on four groups of cross-domain datasets and show that our proposed model achieves state-of-the-art performance.

[22]  arXiv:2303.12057 (cross-list from cs.CY) [pdf, other]
Title: Large Language Models Can Be Used to Estimate the Ideologies of Politicians in a Zero-Shot Learning Setting
Comments: 18 pages, 4 figures
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

The mass aggregation of knowledge embedded in large language models (LLMs) holds the promise of new solutions to problems of observability and measurement in the social sciences. We examine the utility of one such model for a particularly difficult measurement task: measuring the latent ideology of lawmakers, which allows us to better understand functions that are core to democracy, such as how politics shape policy and how political actors represent their constituents. We scale the senators of the 116th United States Congress along the liberal-conservative spectrum by prompting ChatGPT to select the more liberal (or conservative) senator in pairwise comparisons. We show that the LLM produced stable answers across repeated iterations, did not hallucinate, and was not simply regurgitating information from a single source. This new scale strongly correlates with pre-existing liberal-conservative scales such as NOMINATE, but also differs in several important ways, such as correctly placing senators who vote against their party for far-left or far-right ideological reasons on the extreme ends. The scale also highly correlates with ideological measures based on campaign giving and political activists' perceptions of these senators. In addition to the potential for better-automated data collection and information retrieval, our results suggest LLMs are likely to open new avenues for measuring latent constructs like ideology that rely on aggregating large quantities of data from public sources.
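To make the idea of scaling by pairwise comparisons concrete, here is a minimal sketch. It is not the authors' scaling procedure: the win-fraction score is a deliberately simple stand-in, the function name `scale_by_pairwise_wins` is hypothetical, and the `judge` callable is a placeholder for whatever answers "which of these two is more liberal?" (in the paper, a prompt to ChatGPT).

```python
from itertools import combinations


def scale_by_pairwise_wins(items, judge):
    """Place items on a 0..1 scale from exhaustive pairwise comparisons.

    `judge(a, b)` is an assumed callable returning whichever of `a` or `b`
    ranks higher on the dimension of interest. Each item's score is the
    fraction of its comparisons that it wins.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        wins[judge(a, b)] += 1
    comparisons_per_item = max(len(items) - 1, 1)  # avoid dividing by zero
    return {item: wins[item] / comparisons_per_item for item in items}
```

With n items this requires n(n-1)/2 judgments, so 100 senators need 4,950 comparisons per pass; repeating the pass is one way to check the stability of answers that the abstract mentions.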

[23]  arXiv:2303.12060 (cross-list from cs.CV) [pdf, other]
Title: VideoXum: Cross-modal Visual and Textural Summarization of Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Video summarization aims to distill the most important information from a source video to produce either an abridged clip or a textual narrative. Traditionally, different methods have been proposed depending on whether the output is a video or text, thus ignoring the correlation between the two semantically related tasks of visual summarization and textual summarization. We propose a new joint video and text summarization task. The goal is to generate both a shortened video clip and the corresponding textual summary from a long video, collectively referred to as a cross-modal summary. The generated video clip and text narrative should be semantically well aligned. To this end, we first build a large-scale human-annotated dataset -- VideoXum (X refers to different modalities). The dataset is reannotated based on ActivityNet. After we filter out the videos that do not meet the length requirements, 14,001 long videos remain in our new dataset. Each video in our reannotated dataset has human-annotated video summaries and the corresponding narrative summaries. We then design a novel end-to-end model -- VTSUM-BILP to address the challenges of our proposed task. Moreover, we propose a new metric called VT-CLIPScore to help evaluate the semantic consistency of cross-modal summaries. The proposed model achieves promising performance on this new task and establishes a benchmark for future research.

Replacements for Wed, 22 Mar 23

[24]  arXiv:2005.13316 (replaced) [pdf]
Title: Tracking, exploring and analyzing recent developments in German-language online press in the face of the coronavirus crisis: cOWIDplus Analysis and cOWIDplus Viewer
Comments: 13 pages, 6 figures, 1 table, 3852 words
Journal-ref: International Journal of Corpus Linguistics 2020, 25(3), pp. 347-359
Subjects: Computation and Language (cs.CL)
[25]  arXiv:2111.09543 (replaced) [pdf, other]
Title: DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Comments: 16 pages, 10 tables, 2 Figures. The DeBERTaV3 model significantly improves performance of the downstream NLU tasks over models with a similar structure, e.g. DeBERTaV3 large achieves 91.37% average GLUE score which is 1.37% over DeBERTa large. XSmall has only 22M backbone parameters, but significantly outperforms RoBERTa/XLNet-base. Paper is published as a conference paper at ICLR 2023
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[26]  arXiv:2203.03235 (replaced) [pdf, other]
Title: Pre-trained Token-replaced Detection Model as Few-shot Learner
Comments: Accepted to COLING 2022. The code is publicly available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[27]  arXiv:2204.04916 (replaced) [pdf, ps, other]
Title: A Token-level Contrastive Framework for Sign Language Translation
Comments: Accepted to ICASSP 2023
Subjects: Computation and Language (cs.CL)
[28]  arXiv:2205.12676 (replaced) [pdf, other]
Title: Evaluating Inclusivity, Equity, and Accessibility of NLP Technology: A Case Study for Indian Languages
Comments: Accepted to EACL Findings, 2023
Subjects: Computation and Language (cs.CL)
[29]  arXiv:2211.06552 (replaced) [pdf, other]
Title: Collecting Interactive Multi-modal Datasets for Grounded Language Understanding
Journal-ref: Interactive Learning for Natural Language Processing NeurIPS 2022 Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[30]  arXiv:2301.04347 (replaced) [pdf, other]
Title: Counteracts: Testing Stereotypical Representation in Pre-trained Language Models
Authors: Damin Zhang
Comments: To be submitted to FACCT
Subjects: Computation and Language (cs.CL)
[31]  arXiv:2303.01593 (replaced) [pdf, other]
Title: QAID: Question Answering Inspired Few-shot Intent Detection
Comments: ICLR paper
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[32]  arXiv:2303.05382 (replaced) [pdf]
Title: ChatGPT Is on the Horizon: Could a Large Language Model Be All We Need for Intelligent Transportation?
Comments: Submitted to Nature - Machine Intelligence (13 Pages, 8 Figures)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[33]  arXiv:2303.08006 (replaced) [pdf, other]
Title: Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification
Comments: Accepted at ICRA 2023
Subjects: Computation and Language (cs.CL); Robotics (cs.RO)
[34]  arXiv:2303.09093 (replaced) [pdf, other]
Title: GLEN: General-Purpose Event Detection for Thousands of Types
Comments: The first two authors contributed equally. (15 pages, 11 figures)
Subjects: Computation and Language (cs.CL)
[35]  arXiv:2303.10475 (replaced) [pdf, other]
Title: Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning
Comments: Work is still in progress. The paper list is available at this https URL
Subjects: Computation and Language (cs.CL)
[36]  arXiv:2303.11141 (replaced) [pdf, other]
Title: DocRED-FE: A Document-Level Fine-Grained Entity And Relation Extraction Dataset
Comments: Accepted by IEEE ICASSP 2023. The first two authors contribute equally
Subjects: Computation and Language (cs.CL)
[37]  arXiv:2206.06924 (replaced) [pdf, other]
Title: The Maximum Linear Arrangement Problem for trees under projectivity and planarity
Comments: The fourth version is incorrect. We are sure the right files were uploaded but for whatever reason, it looks like we uploaded the wrong files. The abstract in the fourth version is correct though
Subjects: Data Structures and Algorithms (cs.DS); Computation and Language (cs.CL); Discrete Mathematics (cs.DM)
[38]  arXiv:2210.09306 (replaced) [pdf, other]
Title: Mitigating Covertly Unsafe Text within Natural Language Systems
Comments: In Findings of the 2022 Conference on Empirical Methods in Natural Language Processing
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[39]  arXiv:2211.09699 (replaced) [pdf, other]
Title: PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[40]  arXiv:2211.09778 (replaced) [pdf, other]
Title: I Can't Believe There's No Images! Learning Visual Tasks Using only Language Data
Comments: website (this https URL), code (this https URL)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[41]  arXiv:2302.14115 (replaced) [pdf, other]
Title: Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Comments: CVPR 2023 Camera-Ready; Project Webpage: this https URL ; 18 pages; 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)