Title: Ditch the Gold Standard: Re-evaluating Conversational Question Answering

Abstract: Conversational question answering aims to provide natural-language answers to users in information-seeking conversations. Existing conversational QA benchmarks compare models with pre-collected human-human conversations, using ground-truth answers provided in conversational history. It remains unclear whether we can rely on this static evaluation for model development and whether current systems generalize well to real-world human-machine conversations. In this work, we conduct the first large-scale human evaluation of state-of-the-art conversational QA systems, where human evaluators converse with models and judge the correctness of their answers. We find that the distribution of human-machine conversations differs drastically from that of human-human conversations, and that human and gold-history evaluations disagree on model ranking. We further investigate how to improve automatic evaluations, and propose a question rewriting mechanism based on predicted history, which better correlates with human judgments. Finally, we analyze the impact of various modeling strategies and discuss future directions towards building better conversational question answering systems.
Comments: Accepted to ACL 2022; The dataset and code are available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2112.08812 [cs.CL]
  (or arXiv:2112.08812v2 [cs.CL] for this version)

Submission history

From: Huihan Li
[v1] Thu, 16 Dec 2021 11:57:56 GMT (8675kb,D)
[v2] Mon, 21 Mar 2022 20:59:47 GMT (6145kb,D)
