WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections

Chen, Mingda; Wiseman, Sam; Gimpel, Kevin

(Submitted on 29 Dec 2020 (v1), last revised 2 Jun 2021 (this version, v2))

Abstract: Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation. In this work, we cast generating Wikipedia sections as a data-to-text generation task and create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. WikiTableT contains millions of instances, covering a broad range of topics, as well as a variety of flavors of generation tasks with different levels of flexibility. We benchmark several training and decoding strategies on WikiTableT. Our qualitative analysis shows that the best approaches can generate fluent and high-quality texts, but they struggle with coherence and factuality, showing the potential for our dataset to inspire future work on long-form generation.

Comments:	Findings of ACL 2021, camera-ready version
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2012.14919 [cs.CL]
	(or arXiv:2012.14919v2 [cs.CL] for this version)

Submission history

From: Mingda Chen
[v1] Tue, 29 Dec 2020 19:35:34 GMT (7758kb,D)
[v2] Wed, 2 Jun 2021 00:42:42 GMT (347kb,D)
