Named Entity Recognition in the Legal Domain using a Pointer Generator Network

Skylaki, Stavroula; Oskooei, Ali; Bari, Omar; Herger, Nadja; Kriegman, Zac

Computer Science > Computation and Language

Title: Named Entity Recognition in the Legal Domain using a Pointer Generator Network

Authors: Stavroula Skylaki, Ali Oskooei, Omar Bari, Nadja Herger, Zac Kriegman (Thomson Reuters Labs)

(Submitted on 17 Dec 2020)

Abstract: Named Entity Recognition (NER) is the task of identifying and classifying named entities in unstructured text. In the legal domain, named entities of interest may include the case parties, judges, names of courts, case numbers, references to laws, etc. We study the problem of legal NER with noisy text extracted from PDF files of filed court cases from US courts. The "gold standard" training data for NER systems provide an annotation for each token of the text with the corresponding entity or non-entity label. We work with only partially complete training data, which differ from the gold standard NER data in that the exact location of the entities in the text is unknown and the entities may contain typos and/or OCR mistakes. To overcome the challenges of our noisy training data, e.g. text extraction errors and/or typos and unknown label indices, we formulate the NER task as a text-to-text sequence generation task and train a pointer generator network to generate the entities in the document rather than label them. We show that the pointer generator can be effective for NER in the absence of gold standard data and outperforms the common NER neural network architectures in long legal documents.
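The abstract does not spell out the model internals, so the following is only a minimal sketch of the output step of a pointer generator network in its commonly described form: the final token distribution mixes a distribution over a fixed vocabulary with a copy distribution given by attention over the source tokens. The function name `final_distribution`, the toy sentence, and all probabilities are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy distributions (assumed standard formulation).

    p_gen:         probability of generating from the fixed vocabulary, in [0, 1]
    vocab_dist:    dict mapping vocabulary token -> probability (sums to 1)
    attention:     attention weights, one per source token (sums to 1)
    source_tokens: tokens of the input document, same length as attention
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        mixed[token] += p_gen * prob
    # Copy path: add (1 - p_gen) * attention mass for every source position,
    # so tokens absent from the vocabulary can still be emitted.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Hypothetical toy input: a party name that is not in the fixed vocabulary.
source = ["Plaintiff", "Jane", "Doe", "v.", "Acme", "Corp"]
attention = [0.05, 0.40, 0.40, 0.05, 0.05, 0.05]
vocab = {"plaintiff": 0.5, "judge": 0.3, "<unk>": 0.2}

dist = final_distribution(0.2, vocab, attention, source)
```

In this toy case the copy path gives "Jane" and "Doe" more probability than any vocabulary word, which illustrates why copying from the source suits entity names that are rare or distorted by OCR and therefore unlikely to appear in a fixed vocabulary.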

Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2012.09936 [cs.CL]
	(or arXiv:2012.09936v1 [cs.CL] for this version)

Submission history

From: Stavroula Skylaki [view email]
[v1] Thu, 17 Dec 2020 21:10:34 GMT (481kb,D)
