Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models

Zhang, Yi; Sang, Jitao

Full-text links:

Download:

Current browse context:

cs.LG

< prev | next >

new | recent | 2404

Computer Science > Machine Learning

Title: Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models

Authors: Yi Zhang, Jitao Sang

(Submitted on 7 Apr 2024 (this version), latest version 30 Apr 2024 (v2))

Abstract: Fairness is critical for artificial intelligence systems, especially for those deployed in high-stakes applications such as hiring and justice. Existing efforts toward fairness in machine learning fairness require retraining or fine-tuning the neural network weights to meet the fairness criteria. However, this is often not feasible in practice for regular model users due to the inability to access and modify model weights. In this paper, we propose a more flexible fairness paradigm, Inference-Time Rule Eraser, or simply Eraser, which considers the case where model weights can not be accessed and tackles fairness issues from the perspective of biased rules removal at inference-time. We first verified the feasibility of modifying the model output to wipe the biased rule through Bayesian analysis, and deduced Inference-Time Rule Eraser via subtracting the logarithmic value associated with unfair rules (i.e., the model's response to biased features) from the model's logits output as a means of removing biased rules. Moreover, we present a specific implementation of Rule Eraser that involves two stages: (1) limited queries are performed on the model with inaccessible weights to distill its biased rules into an additional patched model, and (2) during inference time, the biased rules already distilled into the patched model are excluded from the output of the original model, guided by the removal strategy outlined in Rule Eraser. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed Rule Eraser in addressing fairness concerns.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as:	arXiv:2404.04814 [cs.LG]
	(or arXiv:2404.04814v1 [cs.LG] for this version)

Submission history

From: Yi Zhang [view email]
[v1] Sun, 7 Apr 2024 05:47:41 GMT (3015kb)
[v2] Tue, 30 Apr 2024 04:21:07 GMT (1338kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2404.04814v1

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Machine Learning

Title: Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models

Submission history