We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Machine Learning

Title: Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models

Abstract: Fairness is critical for artificial intelligence systems, especially for those deployed in high-stakes applications such as hiring and justice. Existing efforts toward fairness in machine learning fairness require retraining or fine-tuning the neural network weights to meet the fairness criteria. However, this is often not feasible in practice for regular model users due to the inability to access and modify model weights. In this paper, we propose a more flexible fairness paradigm, Inference-Time Rule Eraser, or simply Eraser, which considers the case where model weights can not be accessed and tackles fairness issues from the perspective of biased rules removal at inference-time. We first verified the feasibility of modifying the model output to wipe the biased rule through Bayesian analysis, and deduced Inference-Time Rule Eraser via subtracting the logarithmic value associated with unfair rules (i.e., the model's response to biased features) from the model's logits output as a means of removing biased rules. Moreover, we present a specific implementation of Rule Eraser that involves two stages: (1) limited queries are performed on the model with inaccessible weights to distill its biased rules into an additional patched model, and (2) during inference time, the biased rules already distilled into the patched model are excluded from the output of the original model, guided by the removal strategy outlined in Rule Eraser. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed Rule Eraser in addressing fairness concerns.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as: arXiv:2404.04814 [cs.LG]
  (or arXiv:2404.04814v1 [cs.LG] for this version)

Submission history

From: Yi Zhang [view email]
[v1] Sun, 7 Apr 2024 05:47:41 GMT (3015kb)
[v2] Tue, 30 Apr 2024 04:21:07 GMT (1338kb,D)

Link back to: arXiv, form interface, contact.