Computation and Language

New submissions

[ total of 44 entries: 1-44 ]

New submissions for Tue, 25 Jan 22

[1]  arXiv:2201.08860 [pdf, other]
Title: GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
Comments: Published at ICLR 2022. All code, data, and pretrained models are available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it. However, pretrained language models (LMs), the foundation of most modern QA systems, do not robustly represent latent relationships between concepts, which is necessary for reasoning. While knowledge graphs (KGs) are often used to augment LMs with structured representations of world knowledge, it remains an open question how to effectively fuse and reason over the KG representations and the language context, which provides situational constraints and nuances. In this work, we propose GreaseLM, a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations. Information from each modality propagates to the other, allowing language context representations to be grounded by structured world knowledge, and allowing linguistic nuances (e.g., negation, hedging) in the context to inform the graph representations of knowledge. Our results on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that GreaseLM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8x larger.

[2]  arXiv:2201.08904 [pdf, other]
Title: Description-Driven Task-Oriented Dialog Modeling
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Task-oriented dialogue (TOD) systems are required to identify key information from conversations for the completion of given tasks. Such information is conventionally specified in terms of intents and slots contained in task-specific ontologies or schemata. Since these schemata are designed by system developers, the naming conventions for slots and intents are not uniform across tasks and may not convey their semantics effectively. This can lead models to memorize arbitrary patterns in data, resulting in suboptimal performance and generalization. In this paper, we propose that schemata be modified by replacing names or notations entirely with natural language descriptions. We show that a language description-driven system exhibits better understanding of task specifications, higher performance on state tracking, improved data efficiency, and effective zero-shot transfer to unseen tasks. Following this paradigm, we present a simple yet effective Description-Driven Dialog State Tracking (D3ST) model, which relies purely on schema descriptions and an "index-picking" mechanism. We demonstrate the superior quality, data efficiency, and robustness of our approach on the MultiWOZ (Budzianowski et al., 2018), SGD (Rastogi et al., 2020), and recent SGD-X (Lee et al., 2021) benchmarks.
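The abstract names an "index-picking" mechanism without spelling it out. A minimal sketch, assuming a hypothetical serialization in which each slot description is prefixed with an integer index, each categorical value gets a lettered sub-index (e.g. `1a`), and the model emits `index=value` pairs. The functions `build_d3st_input` and `parse_d3st_output` are illustrative names, not the authors' code, and the seq2seq model that maps input to output is omitted.

```python
from typing import Dict, List, Tuple

# (natural-language description, categorical values; empty list for free-form slots)
Slot = Tuple[str, List[str]]


def build_d3st_input(slots: List[Slot], dialogue: List[str]) -> str:
    """Prefix each description with an index and each categorical value with a sub-index.

    The 'i:description' / 'ia) value' layout is a hypothetical format for illustration.
    """
    parts = []
    for i, (desc, values) in enumerate(slots):
        entry = f"{i}:{desc}"
        for j, value in enumerate(values):
            entry += f" {i}{chr(ord('a') + j)}) {value}"  # e.g. '1a) cheap'
        parts.append(entry)
    return " ".join(parts) + " [context] " + " ".join(dialogue)


def parse_d3st_output(output: str, slots: List[Slot]) -> Dict[str, str]:
    """Map predicted 'index=value' pairs back to the slot descriptions they index."""
    state: Dict[str, str] = {}
    for pair in output.split(";"):
        if not pair.strip():
            continue
        index, value = (s.strip() for s in pair.split("=", 1))
        desc, values = slots[int(index)]
        # A categorical slot "picks" a value by emitting its sub-index, e.g. '1a'.
        if values and value.startswith(index) and len(value) == len(index) + 1:
            k = ord(value[-1]) - ord("a")
            if 0 <= k < len(values):
                value = values[k]
        state[desc] = value
    return state
```

In this sketch the model never sees developer-chosen slot names, only descriptions and indices, which is the substitution the abstract proposes.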

[3]  arXiv:2201.08919 [pdf, other]
Title: Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing
Comments: 9 pages, 5 figures
Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

Obtaining hierarchical representations with increasing levels of abstraction has become one of the key issues in learning with deep neural networks. A variety of RNN models have recently been proposed in the literature to incorporate both explicit and implicit hierarchical information in language modeling. In this paper, we propose a novel approach called the latent indicator layer to identify and learn implicit hierarchical information (e.g., phrases), and further develop an EM algorithm to handle the latent indicator layer in training. The latent indicator layer further simplifies a text's hierarchical structure, which allows us to seamlessly integrate different levels of attention mechanisms into the structure. We call the resulting architecture the EM-HRNN model. Furthermore, we develop two bootstrap strategies to train the EM-HRNN model effectively and efficiently on long text documents. Simulation studies and real data applications demonstrate that the EM-HRNN model with bootstrap training outperforms other RNN-based models in document classification tasks. The performance of the EM-HRNN model is comparable to that of a Transformer-based method, BERT-base, even though the former is a much smaller model and does not require pre-training.

[4]  arXiv:2201.08975 [pdf, other]
Title: Chinese Word Segmentation with Heterogeneous Graph Neural Network
Subjects: Computation and Language (cs.CL)

In recent years, deep learning has achieved significant success in the Chinese word segmentation (CWS) task. Most of these methods improve CWS performance by leveraging external information, e.g., words, sub-words, and syntax. However, existing approaches fail to effectively integrate multi-level linguistic information and also ignore the structural features of the external information. Therefore, in this paper, we propose a framework to improve CWS, named HGNSeg. It fully exploits multi-level external information with a pre-trained language model and a heterogeneous graph neural network. Experimental results on six benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008) validate that our approach can effectively improve the performance of Chinese word segmentation. Importantly, in cross-domain scenarios, our method also shows a strong ability to alleviate the out-of-vocabulary (OOV) problem.

[5]  arXiv:2201.09012 [pdf, other]
Title: Leaf: Multiple-Choice Question Generation
Comments: Accepted to ECIR 2022 (Demo)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs). The code and the demo are available at https://github.com/KristiyanVachev/Leaf-Question-Generation.

[6]  arXiv:2201.09060 [pdf, ps, other]
Title: Solvability of orbit-finite systems of linear equations
Subjects: Computation and Language (cs.CL)

We study orbit-finite systems of linear equations, in the setting of sets with atoms. Our principal contribution is a decision procedure for solvability of such systems. The procedure works for every field (and even commutative ring) under mild effectiveness assumptions, and reduces a given orbit-finite system to a number of finite ones: exponentially many in general, but polynomially many when the atom dimension of input systems is fixed. Towards obtaining the procedure, we further develop the theory of vector spaces generated by orbit-finite sets, and show that each such vector space admits an orbit-finite basis. This fundamental property is a key tool in our development, but should also be of wider interest.

[7]  arXiv:2201.09107 [pdf, other]
Title: Visual Information Guided Zero-Shot Paraphrase Generation
Authors: Zhe Lin, Xiaojun Wan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Zero-shot paraphrase generation has drawn much attention because large-scale, high-quality paraphrase corpora are limited. Back-translation, also known as the pivot-based method, is a typical approach to this end. Several works leverage different information as the "pivot", such as language and semantic representations. In this paper, we explore using visual information such as images as the "pivot" of back-translation. Unlike the pipeline back-translation method, we propose visual information guided zero-shot paraphrase generation (ViPG) based only on paired image-caption data. It jointly trains an image captioning model and a paraphrasing model and leverages the image captioning model to guide the training of the paraphrasing model. Both automatic and human evaluation show that our model can generate paraphrases with good relevancy, fluency, and diversity, and that images are a promising kind of pivot for zero-shot paraphrase generation.

[8]  arXiv:2201.09119 [pdf, other]
Title: A Causal Lens for Controllable Text Generation
Comments: NeurIPS 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)

Controllable text generation concerns two fundamental tasks with wide applications, namely generating text with given attributes (i.e., attribute-conditional generation) and minimally editing existing text to possess desired attributes (i.e., text attribute transfer). Extensive prior work has largely studied the two problems separately and developed different conditional models, which, however, are prone to producing biased text (e.g., various gender stereotypes). This paper proposes to formulate controllable text generation from a principled causal perspective that models the two tasks within a unified framework. A direct advantage of the causal formulation is the use of rich causality tools to mitigate generation biases and improve control. We treat the two tasks as interventional and counterfactual causal inference, respectively, based on a structural causal model. We then apply the framework to the challenging practical setting where confounding factors (which induce spurious correlations) are observable only on a small fraction of data. Experiments show significant superiority of the causal approach over previous conditional models in control accuracy and bias reduction.

[9]  arXiv:2201.09146 [pdf, other]
Title: Question rewriting? Assessing its importance for conversational question answering
Comments: Submitted manuscript (without anonymized content) accepted to the 44th European Conference on Information Retrieval (ECIR) 2022. This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution is published in [insert volume title], and is available online at this https URL[insert DOI]
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In conversational question answering, systems must correctly interpret the interconnected interactions and generate knowledgeable answers, which may require the retrieval of relevant information from a background repository. Recent approaches to this problem leverage neural language models, although different alternatives can be considered in terms of modules for (a) representing user questions in context, (b) retrieving the relevant background information, and (c) generating the answer. This work presents a conversational question answering system designed specifically for the Search-Oriented Conversational AI (SCAI) shared task, and reports on a detailed analysis of its question rewriting module. In particular, we considered different variations of the question rewriting module to evaluate the influence on the subsequent components, and performed a careful analysis of the results obtained with the best system configuration. Our system achieved the best performance in the shared task and our analysis emphasizes the importance of the conversation context representation for the overall system performance.

[10]  arXiv:2201.09227 [pdf, ps, other]
Title: A Large and Diverse Arabic Corpus for Language Modeling
Authors: Abbas Raza Ali
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Language models (LMs) have introduced a major paradigm shift in Natural Language Processing (NLP), where large pre-trained LMs have become integral to most NLP tasks. LMs can learn useful and relevant representations of language without any supervision, and they are typically fine-tuned on downstream NLP tasks, achieving significantly higher accuracy than traditional approaches. However, training these models requires a massive corpus that is a good representation of the language. English LMs generally perform better than their counterparts in other languages, due to the availability of massive English corpora.
This work describes the design and development of a large Arabic corpus. It consists of over 500 GB of cleaned Arabic text targeted at improving the cross-domain knowledge and downstream generalization capability of large-scale language models. Moreover, the corpus is used to train a large Arabic LM. To evaluate the effectiveness of the LM, it is fine-tuned on a number of typical NLP tasks. These tasks show significant gains of 4.5% to 8.5% compared with the same tasks fine-tuned on multilingual BERT (mBERT). To the best of my knowledge, this is the largest clean and diverse Arabic corpus collected to date.

[11]  arXiv:2201.09282 [pdf, other]
Title: WIDAR -- Weighted Input Document Augmented ROUGE
Comments: Manuscript Accepted as full paper in ECIR 2022
Subjects: Computation and Language (cs.CL)

The task of automatic text summarization has gained a lot of traction due to recent advancements in machine learning techniques. However, evaluating the quality of a generated summary remains an open problem. The literature has widely adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as the standard evaluation metric for summarization. However, ROUGE has some long-established limitations, a major one being its dependence on the availability of a good-quality reference summary. In this work, we propose the metric WIDAR, which, in addition to the reference summary, also uses the input document to evaluate the quality of the generated summary. The proposed metric is versatile, since it is designed to adapt the evaluation score according to the quality of the reference summary. On the human judgement scores provided in the SummEval dataset, the proposed metric correlates better than ROUGE by 26%, 76%, 82%, and 15% in coherence, consistency, fluency, and relevance, respectively. The proposed metric obtains results comparable to other state-of-the-art metrics while requiring relatively short computation time.
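Since WIDAR builds on ROUGE, it may help to recall what ROUGE itself computes. The sketch below is plain ROUGE-N recall with lowercased whitespace tokenization and clipped n-gram counts; official ROUGE implementations add stemming and other preprocessing, and WIDAR's input-document weighting is not reproduced because the abstract does not specify it. The function names are illustrative.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in `tokens` (empty if there are fewer than n tokens)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each n-gram, i.e. the clipped overlap.
    overlap = sum((cand & ref).values())
    return overlap / total
```

For example, `rouge_n_recall("the cat sat on the mat", "the cat is on the mat")` is 5/6 for unigrams and 3/5 with `n=2`. The score depends entirely on the reference, which is the dependence WIDAR is designed to soften.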

[12]  arXiv:2201.09324 [pdf, other]
Title: Supervised Visual Attention for Simultaneous Multimodal Machine Translation
Comments: Journal article under review
Subjects: Computation and Language (cs.CL)

Recently, there has been a surge in research on multimodal machine translation (MMT), where additional modalities such as images are used to improve the translation quality of textual systems. A particular use for such multimodal systems is simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially in the early phases of translation (Caglayan et al., 2020a; Imankulova et al., 2020). In this paper, we propose the first Transformer-based simultaneous MMT architecture. Additionally, we extend this model with an auxiliary supervision signal that guides its visual attention mechanism using labelled phrase-region alignments. We perform comprehensive experiments on three language directions and conduct thorough quantitative and qualitative analyses using both automatic metrics and manual inspection. Our results show that (i) supervised visual attention consistently improves the translation quality of the MMT models, and (ii) fine-tuning the MMT with the supervision loss enabled leads to better performance than training the MMT from scratch. Compared to the state of the art, our proposed model achieves improvements of up to 2.3 BLEU and 3.5 METEOR points.

[13]  arXiv:2201.09377 [pdf, other]
Title: An Application of Pseudo-Log-Likelihoods to Natural Language Scoring
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Language models built using semi-supervised machine learning on large corpora of natural language have very quickly come to dominate the fields of natural language generation and understanding. In this paper we apply a zero-shot approach, independently developed by a number of researchers, that is now gaining recognition as a significant alternative to fine-tuning for evaluation on common sense tasks. A language model with relatively few parameters and training steps compared to a more recent language model (T5) can outperform it on a recent large data set (TimeDial), while displaying robust performance across a similar class of language tasks. Surprisingly, this result is achieved using a hyperparameter-free zero-shot method with the smaller model, compared to fine-tuning of the larger model. We argue that the robustness of the smaller model ought to be understood in terms of compositionality, in a sense that we draw from recent literature on a class of similar models. We identify a practical cost for our method and model: high GPU time for natural language evaluation. The zero-shot measurement technique, which produces remarkable stability for both ALBERT and other BERT variants, is an application of pseudo-log-likelihoods to masked language models for the relative measurement of probability of substitution alternatives in forced-choice language tasks such as the Winograd Schema Challenge, Winogrande, and others. One contribution of this paper is to bring together a number of similar but independent strands of research. We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks, performing better than any published result in the literature, including fine-tuned efforts. We show a remarkable consistency of the model's performance under adversarial settings, which we argue is best explained by the compositionality of the model's representations.
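Pseudo-log-likelihood (PLL) scoring masks each token of a candidate sentence in turn and sums the masked LM's log-probabilities of the original tokens, PLL(W) = sum over t of log P(w_t | W with w_t masked); a forced-choice item is then answered by picking the candidate with the highest PLL. A minimal sketch, assuming the masked LM is wrapped as a callable `masked_log_prob(tokens, i)`. That interface and the context-free toy table used to exercise it are hypothetical stand-ins for ALBERT or BERT, so the toy scores carry no linguistic meaning.

```python
import math
from typing import Callable, List, Sequence

# Assumed interface: masked_log_prob(tokens, i) returns log P(tokens[i] | tokens with
# position i replaced by a mask token). A real masked LM would need one forward
# pass per masked position, which is where the high GPU cost comes from.
MaskedLogProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """Sum of log-probabilities of each token when it alone is masked."""
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens)))


def forced_choice(candidates: List[Sequence[str]], masked_log_prob: MaskedLogProb) -> int:
    """Zero-shot answer: index of the candidate with the highest PLL (no fine-tuning)."""
    scores = [pseudo_log_likelihood(c, masked_log_prob) for c in candidates]
    return max(range(len(candidates)), key=lambda k: scores[k])


# Hypothetical toy "model" that ignores context (a unigram table); it exists only to
# exercise the functions above and is NOT a masked language model.
_TOY_PROBS = {"the": 0.5, "trophy": 0.2, "suitcase": 0.05}


def toy_masked_log_prob(tokens: Sequence[str], i: int) -> float:
    return math.log(_TOY_PROBS.get(tokens[i], 0.01))
```

In a Winograd-style item, each candidate is the sentence with one answer substituted for the pronoun; with a real masked LM the per-token probabilities depend on the surrounding context, which is what the method relies on.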

[14]  arXiv:2201.09518 [pdf]
Title: Synthetic Books
Comments: 7 pages, 5 figures
Journal-ref: ARTECH 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The article explores new forms of written language aided by AI technologies such as GPT-2 and GPT-3. The question posed in the paper is not whether these novel technologies will eventually replace authored books, but how to relate to and contextualize such publications, and what kinds of new tools, processes, and ideas lie behind them. For that purpose, the article introduces a new concept of synthetic books. It stands for publications created by deploying AI technology, more precisely autoregressive language models that are able to generate human-like text. Supported by case studies, the value and rationale of synthetic books are discussed. The paper emphasizes that artistic quality is an issue when it comes to AI-generated content. The article introduces projects that demonstrate interactive input from an artist and/or audience combined with deep-learning-based language models. Finally, the paper focuses on understanding the neural aesthetics of written language in the art context.

[15]  arXiv:2201.09523 [pdf, other]
Title: BTPK-based learning: An Interpretable Method for Named Entity Recognition
Comments: 7 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Named entity recognition (NER) is an essential task in natural language processing, but the internal mechanism of most NER models is a black box for users. In some high-stakes decision-making areas, improving the interpretability of an NER method is crucial but challenging. In this paper, based on the existing Deterministic Talmudic Public announcement logic (TPK) model, we propose a novel binary tree model (called BTPK) and apply it to two widely used Bi-RNNs to obtain BTPK-based interpretable ones. Then, we design a counterfactual verification module to verify the BTPK-based learning method. Experimental results on three public datasets show that BTPK-based learning outperforms two classical Bi-RNNs with self-attention, especially on small, simple data and relatively large, complex data. Moreover, the counterfactual verification demonstrates that the explanations provided by the BTPK-based learning method are reasonable and accurate in NER tasks. Besides, the logical reasoning based on BTPK shows how Bi-RNNs handle NER tasks, with different distances of public announcements on long and complex sequences.

[16]  arXiv:2201.09651 [pdf, other]
Title: Artefact Retrieval: Overview of NLP Models with Knowledge Base Access
Comments: 11 pages of main content, 7 pages of appendix; presented at AKBC CSRR 2021
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Many NLP models gain performance by having access to a knowledge base. A lot of research has been devoted to devising and improving the way the knowledge base is accessed and incorporated into the model, resulting in a number of mechanisms and pipelines. Despite the diversity of proposed mechanisms, there are patterns in the designs of such systems. In this paper, we systematically describe the typology of artefacts (items retrieved from a knowledge base), retrieval mechanisms, and the way these artefacts are fused into the model. This further allows us to uncover combinations of design decisions that have not yet been tried. Most of the focus is given to language models, though we also show how question answering, fact-checking, and knowledgeable dialogue models fit into this system. Having an abstract model that can describe the architecture of specific models also helps with transferring these architectures between multiple NLP tasks.

[17]  arXiv:2201.09680 [pdf, other]
Title: Relational Memory Augmented Language Models
Comments: Accepted to TACL, pre-MIT Press publication version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present a memory-augmented approach to condition an autoregressive language model on a knowledge graph. We represent the graph as a collection of relation triples and retrieve relevant relations for a given context to improve text generation. Experiments on WikiText-103, WMT19, and enwik8 English datasets demonstrate that our approach produces a better language model in terms of perplexity and bits per character. We also show that relational memory improves coherence, is complementary to token-based memory, and enables causal interventions. Our model provides a simple yet effective way to combine an autoregressive language model with a knowledge graph for more coherent and logical generation.
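The retrieval step described above (relation triples retrieved for a given context) might look roughly like the sketch below. It assumes case-insensitive substring matching of head entities as a naive stand-in for entity linking and a simple `head relation tail` linearization; both are illustrative assumptions, the function names are hypothetical, and the paper's mechanism for conditioning the autoregressive LM on the retrieved memory is not shown.

```python
from typing import List, Tuple

# (head entity, relation, tail entity)
Triple = Tuple[str, str, str]


def retrieve_relations(context: str, graph: List[Triple], max_triples: int = 4) -> List[Triple]:
    """Return up to `max_triples` triples whose head entity occurs in `context`.

    Case-insensitive substring matching is a naive stand-in for entity linking.
    """
    lowered = context.lower()
    matches = [t for t in graph if t[0].lower() in lowered]
    return matches[:max_triples]


def linearize(triples: List[Triple]) -> str:
    """Flatten triples into text, e.g. 'Honolulu located in Hawaii'."""
    return " ; ".join(f"{h} {r} {t}" for h, r, t in triples)
```

Keeping the memory as explicit triples, rather than raw tokens, is what makes interventions such as swapping a tail entity straightforward to express.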

[18]  arXiv:2201.09696 [pdf, other]
Title: Unified Question Generation with Continual Lifelong Learning
Comments: Paper accepted in The Web Conference (WWW) 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Question Generation (QG), a challenging Natural Language Processing task, aims to generate questions based on given answers and context. Existing QG methods mainly focus on building or training models for specific QG datasets. These works are subject to two major limitations: (1) They are dedicated to specific QG formats (e.g., answer-extraction or multi-choice QG); therefore, addressing a new QG format requires re-designing the QG model. (2) Optimal performance is only achieved on the dataset they were trained on. As a result, we have to train and maintain various QG models for different QG datasets, which is resource-intensive and does not generalize.
To address these problems, we propose a model named Unified-QG based on lifelong learning techniques, which can continually learn QG tasks across different datasets and formats. Specifically, we first build a format-convert encoding to transform different kinds of QG formats into a unified representation. Then, a method named STRIDER (SimilariTy RegularIzed Difficult Example Replay) is built to alleviate catastrophic forgetting in continual QG learning. Extensive experiments were conducted on 8 QG datasets across 4 QG formats (answer-extraction, answer-abstraction, multi-choice, and boolean QG) to demonstrate the effectiveness of our approach. Experimental results demonstrate that our Unified-QG can effectively and continually adapt to QG tasks when datasets and formats vary. In addition, we verify the ability of a single trained Unified-QG model to improve the performance of 8 Question Answering (QA) systems by generating synthetic QA data.

[19]  arXiv:2201.09745 [pdf, other]
Title: Table Pretraining: A Survey on Model Architectures, Pretraining Objectives, and Downstream Tasks
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pretraining frameworks has been proposed following the success of text and image pretraining, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, and formula prediction. To fully use the supervision signals in unlabeled tables, a variety of pretraining objectives have been designed and evaluated, for example, denoising cell values, predicting numerical relationships, and implicitly executing SQL queries. And to best leverage the characteristics of (semi-)structured tables, various tabular language models, particularly with specially designed attention mechanisms, have been explored. Since tables usually appear alongside and interact with free-form text, table pretraining usually takes the form of table-text joint pretraining, which attracts significant research interest from multiple domains. This survey aims to provide a comprehensive review of different model designs, pretraining objectives, and downstream tasks for table pretraining, and we share our thoughts and vision on existing challenges and future opportunities.

Cross-lists for Tue, 25 Jan 22

[20]  arXiv:2201.09451 (cross-list from cs.SI) [pdf, other]
Title: Emotion-based Modeling of Mental Disorders on Social Media
Comments: Proceedings of the 20th IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

According to the World Health Organization (WHO), one in four people will be affected by mental disorders at some point in their lives. However, in many parts of the world, patients do not actively seek professional diagnosis because of the stigma attached to mental illness and ignorance of mental health and its associated symptoms. In this paper, we propose a model for passively detecting mental disorders using conversations on Reddit. Specifically, we focus on a subset of mental disorders that are characterized by distinct emotional patterns (henceforth called emotional disorders): major depressive, anxiety, and bipolar disorders. Through passive (i.e., unprompted) detection, we can encourage patients to seek diagnosis and treatment for mental disorders. Our proposed model differs from other work in this area in that it is based entirely on the emotional states of users on Reddit and the transitions between these states, whereas prior work is typically based on content-based representations (e.g., n-grams, language model embeddings). We show that content-based representation is affected by domain and topic bias and thus does not generalize, while our model suppresses topic-specific information and thus generalizes well across different topics and times. We conduct experiments on our model's ability to detect different emotional disorders and on the generalizability of our model. Our experiments show that while our model performs comparably to content-based models such as BERT, it generalizes much better across time and topic.

[21]  arXiv:2201.09486 (cross-list from cs.SD) [pdf, other]
Title: Bias in Automated Speaker Recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition technologies are deployed on billions of smart devices and in services such as call centres. Despite their wide-scale deployment and known sources of bias in face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including model building, implementation, and data generation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.

[22]  arXiv:2201.09494 (cross-list from eess.AS) [pdf, other]
Title: Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

We propose data and knowledge-driven approaches for multilingual training of the automatic speech recognition (ASR) system for a target language by pooling speech data from multiple source languages. Exploiting the acoustic similarities between Indian languages, we implement two approaches. In phone/senone mapping, a deep neural network (DNN) learns to map senones or phones from one language to the others, and the transcriptions of the source languages are modified such that they can be used along with the target language data to train and fine-tune the target language ASR system. In the other approach, we model the acoustic information for all the languages simultaneously by training a multitask DNN (MTDNN) to predict the senones of each language in different output layers. The cross-entropy loss and the weight update procedure are modified such that, if a feature vector belongs to a particular language, only the shared layers and the output layer responsible for predicting that language's senone classes are updated during training. In the low-resource setting (LRS), 40 hours of transcribed data each for Tamil, Telugu, and Gujarati are used for training. The DNN-based senone mapping technique gives relative improvements in word error rate (WER) of 9.66%, 7.2%, and 15.21% over the baseline system for Tamil, Gujarati, and Telugu, respectively. In the medium-resource setting (MRS), 160, 275, and 135 hours of data for Tamil, Kannada, and Hindi are used, where the same technique gives better relative improvements of 13.94%, 10.28%, and 27.24% for Tamil, Kannada, and Hindi, respectively. MTDNN training with senone mapping in LRS gives higher relative WER improvements of 15.0%, 17.54%, and 16.06% for Tamil, Gujarati, and Telugu, respectively, whereas in MRS we see improvements of 21.24%, 21.05%, and 30.17% for Tamil, Kannada, and Hindi, respectively.
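The modified update rule for the MTDNN (only the shared layers and the output layer of the frame's own language are updated) can be sketched in plain Python. Assumptions, all hypothetical: a single tanh hidden layer stands in for the shared DNN stack, each language has one softmax output head over its own senones, and training is per-frame SGD; `mtdnn_step`, `random_matrix`, and all sizes are illustrative names and values, not the authors' setup.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights (hypothetical initialization)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mtdnn_step(x: List[float], label: int, lang: str, shared: Matrix,
               heads: Dict[str, Matrix], lr: float = 0.1) -> float:
    """One SGD step on feature frame `x` whose senone `label` belongs to `lang`.

    Updates `shared` and `heads[lang]` in place; every other head is left untouched.
    Returns the cross-entropy loss computed before the update.
    """
    head = heads[lang]
    # Forward: shared tanh layer, then only this language's output layer.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in shared]
    p = softmax([sum(w * hj for w, hj in zip(row, h)) for row in head])
    loss = -math.log(p[label])
    # Softmax + cross-entropy gradient w.r.t. the logits is p - one_hot(label).
    d_logits = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    # Gradient into the shared layer, computed before the head weights change.
    d_h = [sum(head[k][j] * d_logits[k] for k in range(len(head))) for j in range(len(h))]
    for k, row in enumerate(head):
        for j in range(len(row)):
            row[j] -= lr * d_logits[k] * h[j]
    for j, row in enumerate(shared):
        g = d_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
        for i in range(len(row)):
            row[i] -= lr * g * x[i]
    return loss
```

Because the loss for a frame is computed from its own language's head only, the other heads receive no gradient, so a Tamil frame, say, can refine the shared layers without disturbing the Telugu output layer.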

[23]  arXiv:2201.09555 (cross-list from cs.AI) [pdf, other]
Title: A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Digital Libraries (cs.DL)

Scholarly data is growing continuously, with information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize the data and make it accessible have also led to many challenges, such as the exploration of scholarly articles and ambiguous author names. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which uses Knowledge Graph Embeddings (KGEs) that incorporate multimodal literal information from these KGs. The framework has three components: 1) multimodal KGEs, 2) a blocking procedure, and 3) Hierarchical Agglomerative Clustering. Extensive experiments were conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known AND benchmark provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8-14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://zenodo.org/record/5675787#.YcCJzL3MJTY), respectively.
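
The blocking and clustering stages of the pipeline above can be sketched in a few lines, assuming the author-mention embeddings have already been produced by some KGE model. The blocking key (last name plus first initial), the cosine distance, the average-linkage criterion and the 0.3 threshold are all hypothetical choices for illustration; the abstract does not specify them, and `block_key`, `average_linkage_hac` and `disambiguate` are not names from the LAND code.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def block_key(name: str) -> str:
    # Hypothetical blocking key: lowercased last name + first initial.
    parts = name.lower().split()
    return f"{parts[-1]}_{parts[0][0]}"


def average_linkage_hac(vectors: List[List[float]], threshold: float) -> List[List[int]]:
    # Agglomerative clustering: repeatedly merge the closest pair of clusters
    # (average pairwise distance) until the closest pair exceeds the threshold.
    clusters = [[i] for i in range(len(vectors))]

    def dist(c1: List[int], c2: List[int]) -> float:
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def disambiguate(mentions: List[Tuple[str, List[float]]], threshold: float = 0.3) -> List[List[int]]:
    # Blocking: only mentions sharing a key are ever compared.
    blocks = defaultdict(list)
    for idx, (name, _) in enumerate(mentions):
        blocks[block_key(name)].append(idx)
    result = []
    for idxs in blocks.values():
        for cluster in average_linkage_hac([mentions[i][1] for i in idxs], threshold):
            result.append(sorted(idxs[i] for i in cluster))
    return sorted(result)
```

Blocking keeps the quadratic clustering step confined to small candidate sets, which is the usual reason to pair it with agglomerative clustering on large KGs.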

[24]  arXiv:2201.09586 (cross-list from eess.AS) [pdf, other]
Title: PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Comments: 5 pages, 2 figures, 2 tables, accepted for presentation at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

This paper proposes PickNet, a neural network model for real-time channel selection for an ad hoc microphone array consisting of multiple recording devices like cell phones. Assuming at most one person to be vocally active at each time point, PickNet identifies the device that is spatially closest to the active person for each time frame by using a short spectral patch of just hundreds of milliseconds. The model is applied to every time frame, and the short time frame signals from the selected microphones are concatenated across the frames to produce an output signal. As the personal devices are usually held close to their owners, the output signal is expected to have higher signal-to-noise and direct-to-reverberation ratios on average than the input signals. Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions. Speech recognition-based evaluation was carried out by using real conversational recordings obtained with various smartphones. The proposed model yielded significant gains in word error rate with limited computational cost over systems using a block-online beamformer and a single distant microphone.
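
The per-frame select-and-concatenate step described above can be sketched as follows, assuming the devices' frames are already time-aligned. PickNet's neural scorer is replaced here by a hypothetical `energy_score` heuristic (mean squared amplitude over a short patch), which is not the paper's model; any function mapping a patch to a score could be plugged in. `select_and_concatenate` and its `context` parameter are illustrative names, with `context` loosely mirroring the short patch of past frames the model looks at.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]


def energy_score(patch: Sequence[float]) -> float:
    # Hypothetical stand-in for PickNet's neural scorer: mean squared amplitude.
    return sum(x * x for x in patch) / len(patch)


def select_and_concatenate(
    channels: List[List[Frame]],
    context: int = 1,
    score: Callable[[Sequence[float]], float] = energy_score,
) -> Tuple[List[float], List[int]]:
    # channels[c][t] is frame t recorded by device c; all devices share frame timing.
    n_frames = len(channels[0])
    output: List[float] = []
    picks: List[int] = []
    for t in range(n_frames):
        # Score each device on a short patch ending at the current frame
        # (only past and current frames, so the loop can run online).
        start = max(0, t - context + 1)
        scores = [score([x for frame in ch[start:t + 1] for x in frame]) for ch in channels]
        best = max(range(len(channels)), key=scores.__getitem__)
        picks.append(best)
        # Concatenate the selected device's frame into the output signal.
        output.extend(channels[best][t])
    return output, picks
```

Because each decision depends only on a bounded window of past frames, latency and per-frame cost stay constant, which is the property the abstract ties to real-time operation.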

[25]  arXiv:2201.09708 (cross-list from cs.AI) [pdf, other]
Title: Towards Collaborative Question Answering: A Preliminary Study
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Knowledge and expertise in the real world can be owned disjointly. To solve a complex question, collaboration among experts is often called for. In this paper, we propose CollabQA, a novel QA task in which several expert agents, coordinated by a moderator, work together to answer questions that cannot be answered by any single agent alone. We create a synthetic dataset from a large knowledge graph that can be distributed among experts. We define a process for forming a complex question from a ground-truth reasoning path, neural network agent models that can learn to solve the task, and evaluation metrics for measuring performance. We show that the problem can be challenging without introducing a prior on the collaboration structure, unless the experts are perfect and uniform. Based on this experience, we elaborate on the extensions needed to approach collaboration tasks in real-world settings.
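
To make the task setup above concrete, here is a minimal symbolic sketch, not the paper's neural agents: a knowledge graph is split among hypothetical `Expert` objects, and a hypothetical `moderate` function follows a ground-truth reasoning path hop by hop, asking every expert for the next entity. When the triples along the path are held by different experts, no single expert can answer the full question, while the coordinated group can.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


class Expert:
    # Holds a private shard of the knowledge graph as (head, relation) -> tail.
    def __init__(self, triples: Sequence[Triple]) -> None:
        self.facts: Dict[Tuple[str, str], str] = {(h, r): t for h, r, t in triples}

    def answer(self, head: str, relation: str) -> Optional[str]:
        return self.facts.get((head, relation))


def moderate(experts: List[Expert], start: str, path: List[str]) -> Optional[str]:
    # Follow the reasoning path one relation at a time, asking every expert
    # and taking the first non-empty reply as the next entity.
    entity = start
    for relation in path:
        replies = [e.answer(entity, relation) for e in experts]
        known = [r for r in replies if r is not None]
        if not known:
            return None  # no expert can resolve this hop
        entity = known[0]
    return entity
```

For instance, if one expert knows that Paris is the capital of France and another knows that France's currency is the euro, only the pair together can answer "What currency is used in the country whose capital is Paris?". This toy moderator assumes experts are perfect and consistent, which is precisely the easy case the abstract contrasts with.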

Replacements for Tue, 25 Jan 22

[26]  arXiv:1703.08748 (replaced) [pdf]
Title: LEPOR: An Augmented Machine Translation Evaluation Metric
Authors: Lifeng Han
Comments: 114 pages, thesis
Journal-ref: University of Macau Library. MSc. Thesis. 2014. https://library2.um.edu.mo/etheses/b33358400_ft.pdf
Subjects: Computation and Language (cs.CL)
[27]  arXiv:2010.04389 (replaced) [pdf, other]
Title: A Survey of Knowledge-Enhanced Text Generation
Comments: Accepted by ACM Computing Surveys (CSUR) in Jan 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[28]  arXiv:2104.07358 (replaced) [pdf, other]
Title: Adaptive Sparse Transformer for Multilingual Translation
Subjects: Computation and Language (cs.CL)
[29]  arXiv:2105.03458 (replaced) [pdf, other]
Title: Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
Comments: NeurIPS 2021 camera-ready
Subjects: Computation and Language (cs.CL)
[30]  arXiv:2109.06379 (replaced) [pdf, other]
Title: Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Comments: EMNLP 2021. Code available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[31]  arXiv:2109.14927 (replaced) [src]
Title: BERT got a Date: Introducing Transformers to Temporal Tagging
Comments: unreliable evaluation results for Seq2seq models
Subjects: Computation and Language (cs.CL)
[32]  arXiv:2110.01900 (replaced) [pdf, other]
Title: DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Comments: Accepted to ICASSP 2022
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[33]  arXiv:2110.07483 (replaced) [pdf, other]
Title: On the Pitfalls of Analyzing Individual Neurons in Language Models
Comments: Accepted to ICLR 2022
Subjects: Computation and Language (cs.CL)
[34]  arXiv:2110.08743 (replaced) [pdf, other]
Title: GNN-LM: Language Modeling based on Global Contexts via GNN
Comments: To appear at ICLR 2022
Subjects: Computation and Language (cs.CL)
[35]  arXiv:2110.09131 (replaced) [pdf, other]
Title: Ensembling Graph Predictions for AMR Parsing
Comments: Published at NeurIPS 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[36]  arXiv:2110.12679 (replaced) [pdf, other]
Title: Improving Embedded Knowledge Graph Multi-hop Question Answering by introducing Relational Chain Reasoning
Comments: 37 pages, 5 figures; about 40 references
Subjects: Computation and Language (cs.CL)
[37]  arXiv:2110.13900 (replaced) [pdf, other]
Title: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38]  arXiv:2112.11202 (replaced) [pdf, other]
Title: Contrast and Generation Make BART a Good Dialogue Emotion Recognizer
Comments: Accepted by AAAI 2022
Subjects: Computation and Language (cs.CL)
[39]  arXiv:2201.00318 (replaced) [pdf, other]
Title: On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations
Comments: Accepted at Computing Conference 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[40]  arXiv:2201.03335 (replaced) [pdf, other]
Title: DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population
Comments: work in progress
Subjects: Computation and Language (cs.CL)
[41]  arXiv:2201.05878 (replaced) [pdf, other]
Title: Automatic Lexical Simplification for Turkish
Subjects: Computation and Language (cs.CL)
[42]  arXiv:2201.08239 (replaced) [pdf, other]
[43]  arXiv:2106.13948 (replaced) [pdf, other]
Title: Core Challenges in Embodied Vision-Language Planning
Comments: 42 pages, plus references
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[44]  arXiv:2110.03151 (replaced) [pdf, other]
Title: Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR
Comments: To appear in ICASSP 2022; System labels (SC and VBx) in Table 1 have been fixed
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)