We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: KQA Pro: A Large-Scale Dataset with Interpretable Programs and Accurate SPARQLs for Complex Question Answering over Knowledge Base

Abstract: Complex question answering over knowledge base (Complex KBQA) is challenging because it requires various compositional reasoning capabilities, such as multi-hop inference, attribute comparison, set operation, and etc. Existing benchmarks have some shortcomings that limit the development of Complex KBQA: 1) they only provide QA pairs without explicit reasoning processes; 2) questions are either generated by templates, leading to poor diversity, or on a small scale. To this end, we introduce KQA Pro, a large-scale dataset for Complex KBQA. We define a compositional and highly-interpretable formal format, named Program, to represent the reasoning process of complex questions. We propose compositional strategies to generate questions, corresponding SPARQLs, and Programs with a small number of templates, and then paraphrase the generated questions to natural language questions (NLQ) by crowdsourcing, giving rise to around 120K diverse instances. SPARQL and Program depict two complementary solutions to answer complex questions, which can benefit a large spectrum of QA methods. Besides the QA task, KQA Pro can also serves for the semantic parsing task. As far as we know, it is currently the largest corpus of NLQ-to-SPARQL and NLQ-to-Program. We conduct extensive experiments to evaluate whether machines can learn to answer our complex questions in different cases, that is, with only QA supervision or with intermediate SPARQL/Program supervision. We find that state-of-the-art KBQA methods learnt from only QA pairs perform very poor on our dataset, implying our questions are more challenging than previous datasets. However, pretrained models learnt from our NLQ-to-SPARQL and NLQ-to-Program annotations surprisingly achieve about 90\% answering accuracy, which is even close to the human expert performance...
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2007.03875 [cs.CL]
  (or arXiv:2007.03875v2 [cs.CL] for this version)

Submission history

From: Jiaxin Shi [view email]
[v1] Wed, 8 Jul 2020 03:28:04 GMT (658kb,D)
[v2] Tue, 22 Dec 2020 10:15:29 GMT (778kb,D)
[v3] Thu, 10 Mar 2022 14:28:57 GMT (1193kb,D)
[v4] Thu, 23 Jun 2022 09:23:52 GMT (1207kb,D)

Link back to: arXiv, form interface, contact.