Towards Lexical Gender Inference: A Scalable Methodology using Online Databases

Bartl, Marion; Leavy, Susan

doi:10.18653/v1/2022.ltedi-1.7

(Submitted on 28 Jun 2022)

Abstract: This paper presents a new method for automatically detecting words with lexical gender in large-scale language datasets. Currently, the evaluation of gender bias in natural language processing relies on manually compiled lexicons of gendered expressions, such as pronouns ('he', 'she', etc.) and nouns with lexical gender ('mother', 'boyfriend', 'policewoman', etc.). However, manually compiled lists can become static if they are not periodically updated, and they often involve value judgments by individual annotators and researchers. Moreover, terms not included in a list fall outside the scope of analysis. To address these issues, we devised a scalable, dictionary-based method to automatically detect lexical gender that can provide a dynamic, up-to-date analysis with high coverage. Our approach reaches over 80% accuracy in determining the lexical gender of nouns, both on nouns retrieved randomly from a Wikipedia sample and on a list of gendered words used in previous research.

Comments:	12 pages, 4 tables, 2 figures. Article published under different title in Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion at ACL 2022
Subjects:	Computation and Language (cs.CL)
DOI:	10.18653/v1/2022.ltedi-1.7
Cite as:	arXiv:2206.14055 [cs.CL]
	(or arXiv:2206.14055v1 [cs.CL] for this version)

Submission history

From: Marion Bartl
[v1] Tue, 28 Jun 2022 14:57:26 GMT (648kb,D)
