
Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models

About

Large Language Models (LLMs) excel at human-like language generation but often embed and amplify implicit, intersectional biases, especially under persona-driven contexts. Existing bias audits rely on static, embedding-based tests (CEAT, I-WEAT, I-SEAT) that quantify absolute association strengths; we show that these tests fail to capture the dynamic shifts that occur when models adopt social roles. We address this gap by introducing the Bias Amplification Differential and Explainability Score (BADx): a novel, scalable metric that measures persona-induced bias amplification and integrates local explainability insights. BADx comprises three components - differential bias scores (BAD, based on CEAT, I-WEAT, I-SEAT), the Persona Sensitivity Index (PSI), and Volatility (standard deviation) - augmented by LIME-based analysis for explainability. The study is organized into two tasks: Task 1 establishes static bias baselines, and Task 2 applies six persona frames (marginalized and structurally advantaged) to measure BADx, PSI, and volatility. We evaluate five state-of-the-art LLMs (GPT-4o, DeepSeek-R1, LLaMA-4, Claude 4.0 Sonnet, and Gemma-3n E4B). Results show that persona context significantly modulates bias: GPT-4o exhibits high sensitivity and volatility; DeepSeek-R1 suppresses bias but with erratic volatility; LLaMA-4 maintains low volatility and a stable bias profile with limited amplification; Claude 4.0 Sonnet achieves balanced modulation; and Gemma-3n E4B attains the lowest volatility with moderate amplification. BADx outperforms static methods by revealing context-sensitive biases they overlook, offering a systematic, unified way to detect dynamic implicit intersectional bias across these five popular LLMs.
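The abstract names three BADx components but not their formulas. Below is a minimal Python sketch of how such components could be computed; the specific definitions (BAD as a persona-minus-baseline difference, PSI as mean absolute deviation from the baseline, volatility as the standard deviation across persona frames) and all numbers are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of BADx-style components; the formulas below are
# assumptions for illustration, not the paper's exact definitions.
import statistics

def bias_amplification_differential(persona_score, baseline_score):
    # BAD (assumed form): shift in association strength (e.g. a CEAT
    # effect size) when the model adopts a persona frame.
    return persona_score - baseline_score

def persona_sensitivity_index(persona_scores, baseline_score):
    # PSI (assumed form): mean absolute deviation of persona-conditioned
    # scores from the static (Task 1) baseline.
    return sum(abs(s - baseline_score) for s in persona_scores) / len(persona_scores)

def volatility(persona_scores):
    # Volatility: standard deviation of scores across persona frames.
    return statistics.stdev(persona_scores)

# Toy example: hypothetical CEAT-style effect sizes under six personas.
baseline = 0.30
persona = [0.45, 0.52, 0.28, 0.61, 0.33, 0.40]

bad = [bias_amplification_differential(s, baseline) for s in persona]
psi = persona_sensitivity_index(persona, baseline)
vol = volatility(persona)
```

A high PSI with low volatility would indicate a model whose bias shifts consistently across persona frames, whereas high volatility flags erratic, frame-dependent shifts like those the abstract attributes to DeepSeek-R1.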

Nandini Arimanda, Achyuth Mukund, Sakthi Balan Muthiah, Rajesh Sharma• 2026

Related benchmarks

Task                      Dataset                          Result  Rank
Bias Evaluation           Task 2 Persona A                 -       25
Bias Evaluation           Task 2 Persona B                 -       25
Bias Evaluation           Task 2 Persona C                 -       25
Bias Evaluation           Task 2 Persona D                 -       25
Bias Evaluation           Task 2 Persona E                 -       25
Bias Evaluation           Task 2 Persona F                 -       25
Persona Bias Evaluation   Persona Engineering Persona A    -       5
Persona Bias Evaluation   Persona Engineering Persona B    -       5
Persona Bias Evaluation   Persona Engineering Persona C    -       5
Persona Bias Evaluation   Persona Engineering Persona D    -       5

(10 of 12 rows shown)
