References & Citations
Computer Science > Computation and Language
Title: KQA Pro: A Large-Scale Dataset with Interpretable Programs and Accurate SPARQLs for Complex Question Answering over Knowledge Base
(Submitted on 8 Jul 2020 (v1), revised 22 Dec 2020 (this version, v2), latest version 23 Jun 2022 (v4))
Abstract: Complex question answering over knowledge base (Complex KBQA) is challenging because it requires various compositional reasoning capabilities, such as multi-hop inference, attribute comparison, set operation, and etc. Existing benchmarks have some shortcomings that limit the development of Complex KBQA: 1) they only provide QA pairs without explicit reasoning processes; 2) questions are either generated by templates, leading to poor diversity, or on a small scale. To this end, we introduce KQA Pro, a large-scale dataset for Complex KBQA. We define a compositional and highly-interpretable formal format, named Program, to represent the reasoning process of complex questions. We propose compositional strategies to generate questions, corresponding SPARQLs, and Programs with a small number of templates, and then paraphrase the generated questions to natural language questions (NLQ) by crowdsourcing, giving rise to around 120K diverse instances. SPARQL and Program depict two complementary solutions to answer complex questions, which can benefit a large spectrum of QA methods. Besides the QA task, KQA Pro can also serves for the semantic parsing task. As far as we know, it is currently the largest corpus of NLQ-to-SPARQL and NLQ-to-Program. We conduct extensive experiments to evaluate whether machines can learn to answer our complex questions in different cases, that is, with only QA supervision or with intermediate SPARQL/Program supervision. We find that state-of-the-art KBQA methods learnt from only QA pairs perform very poor on our dataset, implying our questions are more challenging than previous datasets. However, pretrained models learnt from our NLQ-to-SPARQL and NLQ-to-Program annotations surprisingly achieve about 90\% answering accuracy, which is even close to the human expert performance...
Submission history
From: Jiaxin Shi [view email][v1] Wed, 8 Jul 2020 03:28:04 GMT (658kb,D)
[v2] Tue, 22 Dec 2020 10:15:29 GMT (778kb,D)
[v3] Thu, 10 Mar 2022 14:28:57 GMT (1193kb,D)
[v4] Thu, 23 Jun 2022 09:23:52 GMT (1207kb,D)
Link back to: arXiv, form interface, contact.