Flexible text generation for counterfactual fairness probing

Fryer, Zee; Axelrod, Vera; Packer, Ben; Beutel, Alex; Chen, Jilin; Webster, Kellie

(Submitted on 28 Jun 2022)

Abstract: A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references, and could miss issues that the wordlist creators had not considered. In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot, comparing the performance of various counterfactual generation methods on the Civil Comments dataset and showing their value in evaluating a toxicity classifier.
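
The wordlist-based baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the `SWAPS` wordlist, the `toy_classifier` scorer, and the `counterfactual_gap` helper are all hypothetical names invented here, and the scorer is deliberately flawed so that the probe has something to detect.

```python
import re

# Hypothetical paired wordlist (an assumption for illustration, not from the paper).
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in SWAPS) + r")\b", re.IGNORECASE
)


def wordlist_counterfactual(text):
    """Swap each listed term for its pair, keeping initial capitalization."""
    def repl(match):
        word = match.group(0)
        new = SWAPS[word.lower()]
        return new.capitalize() if word[0].isupper() else new
    return _PATTERN.sub(repl, text)


def toy_classifier(text):
    """Deliberately flawed stand-in scorer: keys on a single pronoun."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return 0.9 if "she" in tokens else 0.1


def counterfactual_gap(classifier, text):
    """Absolute change in classifier score between a text and its counterfactual."""
    return abs(classifier(text) - classifier(wordlist_counterfactual(text)))


original = "He said his brother is rude."
print(wordlist_counterfactual(original))  # "brother" is left unchanged
print(counterfactual_gap(toy_classifier, original))
print(wordlist_counterfactual("I saw her"))  # yields the ungrammatical "I saw his"
```

The two printed counterfactuals show the shortcomings the abstract names: "brother" is a sensitive attribute reference the wordlist does not cover, and swapping "her" for "his" ignores grammatical role.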

Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as:	arXiv:2206.13757 [cs.CL]
	(or arXiv:2206.13757v1 [cs.CL] for this version)

Submission history

From: Zee Fryer
[v1] Tue, 28 Jun 2022 05:07:20 GMT (241kb,D)
