We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: CDJUR-BR -- A Golden Collection of Legal Document from Brazilian Justice with Fine-Grained Named Entities

Abstract: A basic task for most Legal Artificial Intelligence (Legal AI) applications is Named Entity Recognition (NER). However, texts produced in the context of legal practice make references to entities that are not trivially recognized by the currently available NERs. There is a lack of categorization of legislation, jurisprudence, evidence, penalties, the roles of people in a legal process (judge, lawyer, victim, defendant, witness), types of locations (crime location, defendant's address), etc. In this sense, there is still a need for a robust golden collection, annotated with fine-grained entities of the legal domain, and which covers various documents of a legal process, such as petitions, inquiries, complaints, decisions and sentences. In this article, we describe the development of the Golden Collection of the Brazilian Judiciary (CDJUR-BR) contemplating a set of fine-grained named entities that have been annotated by experts in legal documents. The creation of CDJUR-BR followed its own methodology that aimed to attribute a character of comprehensiveness and robustness. Together with the CDJUR-BR repository we provided a NER based on the BERT model and trained with the CDJUR-BR, whose results indicated the prevalence of the CDJUR-BR.
Comments: 15 pages, in Portuguese language, 3 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2305.18315 [cs.CL]
  (or arXiv:2305.18315v1 [cs.CL] for this version)

Submission history

From: Antonio Mauricio Brito Junior [view email]
[v1] Sat, 20 May 2023 00:48:52 GMT (1443kb,D)

Link back to: arXiv, form interface, contact.