Prompting Fairness: Integrating Causality to Debias Large Language Models

About

Large language models (LLMs), despite their remarkable capabilities, are susceptible to generating biased and discriminatory responses. As LLMs increasingly influence high-stakes decision-making (e.g., hiring and healthcare), mitigating these biases becomes critical. In this work, we propose a causality-guided debiasing framework to tackle social biases, aiming to reduce the objectionable dependence between LLMs' decisions and the social information in the input. Our framework introduces a novel perspective to identify how social information can affect an LLM's decision through different causal pathways. Leveraging these causal insights, we outline principled prompting strategies that regulate these pathways through selection mechanisms. This framework not only unifies existing prompting-based debiasing techniques, but also opens up new directions for reducing bias by encouraging the model to prioritize fact-based reasoning over reliance on biased social cues. We validate our framework through extensive experiments on real-world datasets across multiple domains, demonstrating its effectiveness in debiasing LLM decisions, even with only black-box access to the model.

Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang, Liu Leqi, Yang Liu • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Evaluation-based Bias Reduction | Bias Reduction Benchmark (Evaluation) | Bias Reduction Performance: 99.8 | 35 |
| Memory-based Bias Reduction | Bias Reduction Benchmark Memory | Bias Reduction Performance: 30.2 | 35 |
| Memory Fidelity Evaluation | Memory-based Experiment Seen Features | P-Diff: 0.156 | 32 |
| Bias Measurement | StereoSet | -- | 25 |
| Occupation classification | Bias-in-Bio lightweight (test) | Overall Accuracy: 77.11 | 16 |
| Bias Evaluation | BBQ averaged across gender, nationality, and religion domains | Accuracy (Ambiguous): 60.38 | 16 |
| Stereotype Bias Evaluation | StereoSet (test) | Gender SS: 66.39 | 8 |
| Natural Language Inference | Bias-NLI | Pe (Bias-NLI): 38.4 | 8 |
