
How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives

About

A rapidly developing application of LLMs in XAI is to convert quantitative explanations such as SHAP into user-friendly narratives to explain the decisions made by smaller prediction models. Evaluating the narratives without relying on human preference studies or surveys is becoming increasingly important in this field. In this work we propose a framework and explore several automated metrics to evaluate LLM-generated narratives for explanations of tabular classification tasks. We apply our approach to compare several state-of-the-art LLMs across different datasets and prompt types. As a demonstration of their utility, these metrics allow us to identify new challenges related to LLM hallucinations for XAI narratives.

Timour Ichmoukhamedov, James Hinns, David Martens • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| Faithful Narrative Generation | student | RA 92.5 | 16 |
| Faithful Narrative Generation | Credit | RA 93.8 | 16 |
| Faithful Narrative Generation | fifa | RA Score 87.5 | 16 |
| Faithful Narrative Generation | Diabetes | RA 0.838 | 16 |
| Faithful Narrative Generation | stroke | RA 0.925 | 16 |