Predicting Document Coverage for Relation Extraction

Singhania, Sneha; Razniewski, Simon; Weikum, Gerhard

(Submitted on 26 Nov 2021)

Abstract: This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity, and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. HERB, the model combining features with BERT, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
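The feature-correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a document's coverage is measured as the number of relational tuples it yields for the entity, and it uses only two of the named features (document length and entity mention frequency). The function names, the exact-string mention matching, and the toy inputs are hypothetical illustrations, not the authors' implementation or data.

```python
import math
import re
from typing import Sequence


def document_length(text: str) -> int:
    """Number of whitespace-delimited tokens in the document."""
    return len(text.split())


def entity_mention_frequency(text: str, entity: str) -> int:
    """Count case-insensitive, whole-phrase mentions of the entity name.

    Assumption (hypothetical simplification): exact name matching only;
    aliases and coreferences (e.g. "she", "the physicist") are not resolved.
    """
    pattern = r"\b" + re.escape(entity) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # r is undefined for a constant sequence; report 0 by convention here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy documents and tuple counts, for illustration only;
    # these are NOT taken from the paper's dataset.
    entity = "Marie Curie"
    docs = [
        "Marie Curie was a physicist and chemist.",
        "Marie Curie was born in Warsaw. Marie Curie won two Nobel Prizes "
        "and married Pierre Curie.",
        "A short note that mentions Marie Curie once.",
    ]
    tuples_per_doc = [2, 4, 0]  # hypothetical coverage labels
    lengths = [document_length(d) for d in docs]
    mentions = [entity_mention_frequency(d, entity) for d in docs]
    print("length vs coverage:", round(pearson(lengths, tuples_per_doc), 3))
    print("mentions vs coverage:", round(pearson(mentions, tuples_per_doc), 3))
```

Pearson's r captures only linear association between a single feature and coverage, so a sketch like this serves as a first diagnostic rather than a predictor; the approach described in the abstract instead combines such features with TF-IDF or BERT representations.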

Comments:	To appear in TACL. The arXiv version is a pre-MIT Press publication version.
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2111.13611 [cs.CL]
	(or arXiv:2111.13611v1 [cs.CL] for this version)

Submission history

From: Sneha Singhania
[v1] Fri, 26 Nov 2021 17:18:18 GMT (414kb,D)
