Towards Transparent AI: A Survey on Explainable Large Language Models

About

Large Language Models (LLMs) have played a pivotal role in advancing Artificial Intelligence (AI). Despite their achievements, however, LLMs often cannot explain their decision-making processes, rendering them 'black boxes' and posing a substantial challenge to explainability. This lack of transparency is a significant obstacle to adopting LLMs in high-stakes applications, where interpretability is essential. To overcome these limitations, researchers have developed various explainable artificial intelligence (XAI) methods that provide human-interpretable explanations for LLMs, yet a systematic understanding of these methods remains limited. To address this gap, this survey provides a comprehensive review of explainability techniques, categorizing XAI methods by the underlying transformer architecture of the LLM: encoder-only, decoder-only, and encoder-decoder models. These techniques are then examined in terms of how their explainability is evaluated, and the survey further explores how such explanations are leveraged in practical applications. Finally, it discusses available resources, ongoing research challenges, and future directions, aiming to guide continued efforts toward developing transparent and responsible LLMs.
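To make "human-interpretable explanations" concrete, the sketch below (not taken from the survey) shows gradient × input saliency, one of the simplest feature-attribution XAI techniques: the gradient of the predicted class score with respect to each input feature, multiplied by the feature value, indicates how much that feature contributed to the prediction. A hand-initialized softmax classifier stands in for an LLM; all weights and inputs are hypothetical.

```python
import numpy as np

# Toy stand-in for a model: a linear softmax classifier with random,
# hypothetical weights (an LLM would be far larger, but the attribution
# idea is the same).
rng = np.random.default_rng(0)
n_features, n_classes = 4, 3
W = rng.normal(size=(n_features, n_classes))
b = np.zeros(n_classes)

def predict(x):
    """Return softmax class probabilities for input x."""
    logits = x @ W + b
    exp = np.exp(logits - logits.max())  # shift for numerical stability
    return exp / exp.sum()

x = np.array([0.5, -1.2, 0.3, 2.0])
probs = predict(x)
target = int(np.argmax(probs))

# For a linear model, d(logit_c)/dx is simply W[:, c], so the
# gradient-x-input attribution for the predicted class is:
saliency = W[:, target] * x
print("predicted class:", target)
print("feature attributions:", np.round(saliency, 3))
```

Features with large positive attribution pushed the model toward its prediction; for nonlinear models the gradient is computed by backpropagation instead, but the interpretation of the resulting saliency scores is the same.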

Avash Palikhe, Zhenyu Yu, Zichong Wang, Wenbin Zhang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Faithfulness Evaluation | TellMeWhy | AUC π-Soft-NS: 0.33 | 67 |
| Faithfulness Evaluation | WikiBio | AUC π-Soft-NS: 0.35 | 67 |
| Attribution Alignment | Curated Attribution Dataset (NarrativeQA + SciQ) | DSA (Dependent Sentence Attribution): 3.15 | 40 |
| Attribution Faithfulness | LongRA | Soft-NC Score: 1.54 | 40 |
| Fact Checking | Causal and Downstream Robustness Ablation Suite (averaged over 4 models) | Fact EM Δ: 1.7 | 14 |
| Causal Attribution | Causal and Downstream Robustness Ablation Suite (averaged over LLaMA-3.1 70B, Phi-3 14B, GPT-J 6B, Qwen2.5 3B) | Causal Pass@5: 64 | 14 |
| Decoding Stability | Causal and Downstream Robustness Ablation Suite (averaged over 4 models) | Decoding Δ%: 2.5 | 14 |
| Span Extraction | Causal and Downstream Robustness Ablation Suite | Span F1: 60 | 14 |
| Tool Use | Causal and Downstream Robustness Ablation Suite (averaged over 4 models) | Tool Hit@1 Δ: 1.8 | 14 |
