Title: Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models

Abstract: Machine learning models often base their predictions on biased features such as gender, race, and other social attributes, which poses significant fairness risks, especially in societal applications such as hiring, banking, and criminal justice. Traditional approaches to this problem retrain or fine-tune neural networks with fairness-aware optimization objectives. However, these methods can be impractical because of the significant computational resources they require, the complex industrial tests they entail, and the associated CO2 footprint. Additionally, regular users who want to use fair models often lack access to model parameters. In this paper, we introduce Inference-Time Rule Eraser (Eraser), a novel method that addresses fairness concerns by removing biased decision-making rules during inference, without modifying model weights. We first use Bayesian analysis to establish a theoretical foundation for modifying model outputs so as to eliminate biased rules. We then present a specific implementation of Eraser that involves two stages: (1) querying the model to distill biased rules into a patched model, and (2) excluding these biased rules during inference. Extensive experiments validate the effectiveness of our approach and show its superior performance in addressing fairness concerns in AI systems.
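
The abstract gives no formulas, so the following is only a rough sketch of what "modifying model outputs to eliminate biased rules" could look like, not the authors' implementation. It assumes that a biased rule can be summarized as a bias-only posterior p(y | b), and that it is removed by dividing it out of the model's posterior p(y | x) in log space and renormalizing. The function names (softmax, erase_bias_rule) and all numbers are hypothetical, and stage (1), distilling the biased rule into a patched model, is replaced here by a fixed vector.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def erase_bias_rule(model_probs, bias_probs, eps=1e-12):
    """Divide an assumed bias-only posterior out of the model posterior.

    model_probs: p(y | x) returned by the deployed model (query access only).
    bias_probs:  p(y | b), a hypothetical stand-in for the distilled biased rule.
    The result is proportional to p(y | x) / p(y | b), renormalized to sum to 1.
    """
    adjusted = [
        math.log(p + eps) - math.log(q + eps)
        for p, q in zip(model_probs, bias_probs)
    ]
    return softmax(adjusted)


# Hypothetical values: the deployed model favors class 0, and the assumed
# bias-only rule favors class 0 even more strongly.
model_probs = [0.7, 0.3]
bias_probs = [0.8, 0.2]

debiased = erase_bias_rule(model_probs, bias_probs)
print(debiased)
```

Under these assumptions only the model's output probabilities are needed, which matches the setting where users cannot access or change model weights. A uniform bias estimate leaves the prediction unchanged, while a strongly skewed one can flip it, as it does with the values above.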
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as: arXiv:2404.04814 [cs.LG]
  (or arXiv:2404.04814v2 [cs.LG] for this version)

Submission history

From: Yi Zhang
[v1] Sun, 7 Apr 2024 05:47:41 GMT (3015kb)
[v2] Tue, 30 Apr 2024 04:21:07 GMT (1338kb,D)
