Title: WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Abstract: Web search is an essential way for humans to obtain information, but it is still a great challenge for machines to understand the contents of web pages. In this paper, we introduce the task of structural reading comprehension (SRC) on the web. Given a web page and a question about it, the task is to find the answer from the web page. This task requires a system to understand not only the semantics of the text but also the structure of the web page. Moreover, we propose WebSRC, a novel Web-based Structural Reading Comprehension dataset. WebSRC consists of 400K question-answer pairs collected from 6.4K web pages. Along with the QA pairs, the corresponding HTML source code, screenshots, and metadata are also provided in our dataset. Each question in WebSRC requires a certain structural understanding of a web page to answer, and the answer is either a text span on the web page or yes/no. We evaluate various baselines on our dataset to show the difficulty of our task. We also investigate the usefulness of structural information and visual features. Our dataset and baselines are publicly available at this https URL
Comments: EMNLP 2021
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2101.09465 [cs.CL]
  (or arXiv:2101.09465v2 [cs.CL] for this version)

Submission history

From: Lu Chen
[v1] Sat, 23 Jan 2021 09:43:44 GMT (6615kb,D)
[v2] Mon, 8 Nov 2021 08:31:44 GMT (4827kb,D)