
JoPA: Explaining Large Language Model's Generation via Joint Prompt Attribution

About

Large Language Models (LLMs) have demonstrated impressive performance in complex text generation tasks. However, the contribution of the input prompt to the generated content remains obscure to humans, underscoring the need to understand the causality between input and output pairs. Existing work on prompt-specific explanation often confines the model output to classification or next-word prediction. The few initial attempts to explain the entire language generation often treat input prompt texts independently, ignoring their combinatorial effects on the subsequent generation. In this study, we introduce a counterfactual explanation framework based on Joint Prompt Attribution, JoPA, which aims to explain how a few prompt texts collaboratively influence the LLM's complete generation. Specifically, we formulate the task of prompt attribution for generation interpretation as a combinatorial optimization problem, and introduce a probabilistic algorithm to search for the causal input combination in the discrete space. We define and utilize multiple metrics to evaluate the produced explanations, demonstrating both the faithfulness and efficiency of our framework.
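
The combinatorial view in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it is a minimal randomized search, under stated assumptions, for the k prompt tokens whose joint removal most reduces a score. The names `toy_score`, `search_joint_mask`, and `_sample_positions` are hypothetical, `toy_score` stands in for an LLM's likelihood of reproducing its original output, and the weight update is a simple heuristic chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple


def _sample_positions(rng: random.Random, weights: List[float], k: int) -> Set[int]:
    """Draw k distinct positions, each draw proportional to the current weights."""
    remaining = list(range(len(weights)))
    chosen: Set[int] = set()
    for _ in range(k):
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        chosen.add(pick)
        remaining.remove(pick)
    return chosen


def search_joint_mask(
    tokens: Sequence[str],
    score: Callable[[Sequence[str]], float],
    k: int,
    iters: int = 500,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Randomized search for k positions whose joint removal lowers `score` most."""
    rng = random.Random(seed)
    weights = [1.0] * len(tokens)  # unnormalised sampling weights per position
    base = score(tokens)           # score of the unmodified prompt
    best_mask: List[int] = []
    best_drop = float("-inf")
    for _ in range(iters):
        mask = _sample_positions(rng, weights, k)
        masked = [t for i, t in enumerate(tokens) if i not in mask]
        drop = base - score(masked)
        if drop > best_drop:
            best_mask, best_drop = sorted(mask), drop
            # Heuristic (an assumption): favour positions from improving masks.
            for i in mask:
                weights[i] += 1.0
    return best_mask, best_drop


def toy_score(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for an LLM score, with a joint effect of two tokens."""
    hits = ("capital" in tokens) + ("France" in tokens)
    return {2: 1.0, 1: 0.6, 0: 0.1}[hits]


prompt = "What is the capital of France ?".split()
mask, drop = search_joint_mask(prompt, toy_score, k=2)
print([prompt[i] for i in mask], round(drop, 3))
```

In this toy setup, removing either keyword alone lowers the score by 0.4, while removing both lowers it by 0.9, which is the kind of joint effect that scoring each token independently would not reveal.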

Yurui Chang, Bochuan Cao, Yujia Wang, Jinghui Chen, Lu Lin • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Faithfulness Measurement | MHC | BLEU | 57.9 | 18
Faithfulness Measurement | Alpaca | BLEU | 0.484 | 12
Faithfulness Measurement | tldr_news | BLEU | 69.2 | 12
Faithfulness Evaluation | Alpaca 800 samples | BLEU | 47.9 | 5
Faithfulness Evaluation | tldr_news 800 samples | BLEU | 68.7 | 5
Explanation Generation | Alpaca avg prompt instance | Inference Time (s) | 15.225 | 2
Explanation Generation | tldr_news avg prompt instance | Latency (s) | 15.397 | 2
Explanation Generation | MHC avg prompt instance | Time (s) | 14.473 | 2
Malicious Prompt Detection | GCG attacks on Llama-2 7B-Chat | Detection Accuracy | 100 | 1
Malicious Prompt Detection | Llama-2 Prompt with Random Search 7B-Chat | Detection Accuracy | 91 | 1
