AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs

Ebrahimi, Sana; Chen, Kaiwen; Asudeh, Abolfazl; Das, Gautam; Koudas, Nick

(Submitted on 1 Mar 2024)

Abstract: Pre-trained Large Language Models (LLMs) have significantly advanced natural language processing capabilities but are susceptible to biases present in their training data, leading to unfair outcomes in various applications. While numerous strategies have been proposed to mitigate bias, they often require extensive computational resources and may compromise model performance. In this work, we introduce AXOLOTL, a novel post-processing framework, which operates agnostically across tasks and models, leveraging public APIs to interact with LLMs without direct access to internal parameters. Through a three-step process resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions, and guides the model to self-debias its outputs. This approach minimizes computational costs and preserves model performance, making AXOLOTL a promising tool for debiasing LLM outputs with broad applicability and ease of use.
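The three-step flow described in the abstract (identify biases, propose resolutions, guide the model to revise its own output) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword table `SUGGESTIONS`, the function names, and the prompt wording are all hypothetical, and `llm` stands in for any black-box text-in/text-out API call, reflecting only the abstract's statement that no access to internal parameters is required.

```python
from typing import Callable, Dict, List

# Hypothetical lookup of flagged terms and suggested neutral alternatives.
# This table is an illustrative assumption, not the paper's bias detector.
SUGGESTIONS: Dict[str, str] = {
    "chairman": "chairperson",
    "policeman": "police officer",
}


def identify_biases(text: str) -> List[str]:
    """Step 1: return flagged terms found in the text (toy keyword match)."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return sorted(t for t in SUGGESTIONS if t in words)


def propose_resolutions(terms: List[str]) -> Dict[str, str]:
    """Step 2: map each flagged term to a suggested replacement."""
    return {t: SUGGESTIONS[t] for t in terms}


def build_rewrite_prompt(text: str, resolutions: Dict[str, str]) -> str:
    """Step 3a: build an instruction asking the model to revise its own output."""
    hints = "; ".join(f"'{old}' -> '{new}'" for old, new in resolutions.items())
    return (
        "Rewrite the following text so it stays faithful to the original "
        f"meaning but avoids biased wording. Suggested changes: {hints}.\n\n"
        f"Text: {text}"
    )


def self_debias(text: str, llm: Callable[[str], str]) -> str:
    """Run the three steps; `llm` is any black-box text-in/text-out callable."""
    terms = identify_biases(text)
    if not terms:
        # Nothing flagged: return the original output unchanged.
        return text
    prompt = build_rewrite_prompt(text, propose_resolutions(terms))
    return llm(prompt)
```

Because the sketch only reads and rewrites generated text, it sits entirely after generation, which is the sense in which the abstract calls the framework post-processing. In practice `llm` would wrap a call to a hosted model, and the keyword match would be replaced by whatever bias-identification method is actually in use.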

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2403.00198 [cs.CL]
	(or arXiv:2403.00198v1 [cs.CL] for this version)

Submission history

From: Sana Ebrahimi
[v1] Fri, 1 Mar 2024 00:02:37 GMT (9088kb,D)
