Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation
About
Although debiased large language models (LLMs) excel at handling known or low-bias prompts, they often fail on unfamiliar, high-bias prompts. We demonstrate via out-of-distribution (OOD) detection that these high-bias prompts cause a distribution shift that degrades static model performance. To enable real-time correction, we propose CAP-TTA, a test-time adaptation framework. CAP-TTA triggers context-aware LoRA updates only when a bias-risk score exceeds a set threshold. By utilizing an offline precomputed diagonal preconditioner, it ensures fast and stable optimization. Across multiple benchmarks and human evaluations, CAP-TTA effectively reduces toxicity and bias scores with significantly lower latency than standard optimizers (e.g., AdamW or SGD). Furthermore, it prevents catastrophic forgetting and substantially improves narrative fluency over state-of-the-art baselines without compromising debiasing performance.
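The core mechanism described above — a threshold-gated, diagonally preconditioned parameter update — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the dict-of-arrays parameter layout, and the 0.5 default threshold are all assumptions made here for clarity.

```python
import numpy as np

def preconditioned_tta_step(lora_params, grads, precond_diag, bias_score,
                            threshold=0.5, lr=1e-2):
    """Apply one gated, diagonally preconditioned update to LoRA parameters.

    Skips the update entirely when the prompt's bias-risk score is below
    the threshold, so low-risk prompts incur no adaptation cost.
    (Hypothetical sketch of the CAP-TTA update rule, not the released code.)
    """
    if bias_score < threshold:
        return lora_params  # low-risk prompt: serve the static model as-is
    # Each coordinate's step is rescaled by the precomputed diagonal
    # preconditioner, approximating a second-order update at negligible
    # runtime cost; the 1e-8 term guards against division by zero.
    return {name: lora_params[name] - lr * grads[name] / (precond_diag[name] + 1e-8)
            for name in lora_params}
```

In use, `precond_diag` would be computed offline (e.g., from squared-gradient statistics on a calibration set) and simply loaded at inference time, which is what keeps the per-prompt adaptation fast.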
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Bias mitigation in text generation | BiasBench toxic prompts | Perplexity: 13.119 | 10 |
| Bias evaluation | Human Evaluation Toxic Prompts | Biased Item Count: 2 | 3 |