Title: Towards Lexical Gender Inference: A Scalable Methodology using Online Databases

Abstract: This paper presents a new method for automatically detecting words with lexical gender in large-scale language datasets. Currently, the evaluation of gender bias in natural language processing relies on manually compiled lexicons of gendered expressions, such as pronouns ('he', 'she', etc.) and nouns with lexical gender ('mother', 'boyfriend', 'policewoman', etc.). However, manually compiled lists become static if they are not periodically updated, and compiling them often involves value judgments by individual annotators and researchers. Moreover, terms not included in a list fall outside the scope of analysis. To address these issues, we devised a scalable, dictionary-based method to automatically detect lexical gender that can provide a dynamic, up-to-date analysis with high coverage. Our approach reaches over 80% accuracy in determining the lexical gender both of nouns retrieved randomly from a Wikipedia sample and of words on a list of gendered terms used in previous research.
Comments: 12 pages, 4 tables, 2 figures. Article published under a different title in the Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion at ACL 2022
Subjects: Computation and Language (cs.CL)
DOI: 10.18653/v1/2022.ltedi-1.7
Cite as: arXiv:2206.14055 [cs.CL]
  (or arXiv:2206.14055v1 [cs.CL] for this version)

Submission history

From: Marion Bartl
[v1] Tue, 28 Jun 2022 14:57:26 GMT (648kb,D)
