Title: Should I disclose my dataset? Caveats between reproducibility and individual data rights

Abstract: Natural language processing techniques have helped domain experts solve legal problems. The digital availability of court documents expands the possibilities for researchers, who can use them as a source for building datasets, the disclosure of which aligns with good reproducibility practices in computational research. Large, digitized court systems, such as the Brazilian one, lend themselves to this kind of exploration. However, personal data protection laws restrict data exposure and set out principles that researchers should be mindful of. Special caution is needed in cases involving human rights violations, such as gender discrimination, which we discuss as an example of interest. We present legal and ethical considerations on the issue, as well as guidelines for researchers who work with this kind of data and must decide whether to disclose it.
Comments: 10 pages, 2 figures. To be published in the 4th Workshop on Natural Legal Language Processing (NLLP 2022), co-located with the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
Subjects: Computers and Society (cs.CY)
ACM classes: K.4.1; K.5.0
Cite as: arXiv:2211.00498 [cs.CY]
  (or arXiv:2211.00498v1 [cs.CY] for this version)

Submission history

From: Raysa Masson Benatti
[v1] Tue, 1 Nov 2022 14:42:11 GMT (7062kb,D)