Title: Few-NERD: A Few-Shot Named Entity Recognition Dataset

Abstract: Recently, a considerable body of literature has grown up around few-shot named entity recognition (NER), but little published benchmark data focuses specifically on this practical and challenging task. Current approaches collect existing supervised NER datasets and reorganize them into the few-shot setting for empirical study. These strategies conventionally aim to recognize coarse-grained entity types with few examples, while in practice most unseen entity types are fine-grained. In this paper, we present Few-NERD, a large-scale human-annotated few-shot NER dataset with a hierarchy of 8 coarse-grained and 66 fine-grained entity types. Few-NERD consists of 188,238 sentences from Wikipedia containing 4,601,160 words, each annotated either as context or as part of a two-level entity type. To the best of our knowledge, this is the first few-shot NER dataset and the largest human-crafted NER dataset. We construct benchmark tasks with different emphases to comprehensively assess the generalization capability of models. Extensive empirical results and analysis show that Few-NERD is challenging and that the problem requires further research. We make Few-NERD public at this https URL
Comments: Accepted by ACL-IJCNLP 2021 (long paper), update
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2105.07464 [cs.CL]
  (or arXiv:2105.07464v6 [cs.CL] for this version)

Submission history

From: Ning Ding
[v1] Sun, 16 May 2021 15:53:17 GMT (555kb,D)
[v2] Wed, 19 May 2021 08:50:14 GMT (556kb,D)
[v3] Mon, 31 May 2021 06:56:03 GMT (5981kb,D)
[v4] Wed, 2 Jun 2021 07:23:06 GMT (5981kb,D)
[v5] Sun, 20 Jun 2021 14:55:18 GMT (5981kb,D)
[v6] Wed, 1 Sep 2021 05:35:32 GMT (5981kb,D)
