Natural Language Explanation Generation on DRUID (test)
[Chart: best Faithfulness over time (selectable metrics: Faithfulness, Coverage, Extensibility, LEE); top result 0.102, CLUE-Span+Steering, May 23, 2025]
Evaluation Results
Method              Model            Date     Faithfulness  Coverage  Extensibility  LEE
CLUE-Span+Steering  Qwen2.5-14B      2025.05   0.102        28        20             0.77
CLUE-Span+Steering  OLMo-2-1124-13B  2025.05   0.099        15        70             0.69
CLUE-Span+Steering  Gemma-2-9B-IT    2025.05   0.098        30        47             0.81
CLUE-Span           Qwen2.5-14B      2025.05   0.089        20        38             0.78
CLUE-Span           Gemma-2-9B-IT    2025.05   0.043        23        43             0.76
CLUE-Span           OLMo-2-1124-13B  2025.05   0.014        8         79             0.65
PromptBaseline      Qwen2.5-14B      2025.05  -0.08         -         -              0.6
PromptBaseline      Gemma-2-9B-IT    2025.05  -0.12         -         -              0.57
PromptBaseline      OLMo-2-1124-13B  2025.05  -0.13         -         -              0.53
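The leaderboard is ordered by Faithfulness, but the other columns rank the same runs differently. The following is a minimal Python sketch that re-sorts the rows by any column and computes, per model, how much Faithfulness changes when steering is added to CLUE-Span. It assumes higher is better for every metric; the page confirms this only implicitly for Faithfulness (the sort order) and does not state it for Coverage, Extensibility, or LEE. `Row`, `ROWS`, `rank_by`, and `steering_gain` are hypothetical names used only for illustration. The numbers are transcribed from the leaderboard, and entries shown as "-" are stored as None.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    method: str
    model: str
    faithfulness: float
    coverage: Optional[int]       # None where the leaderboard shows "-"
    extensibility: Optional[int]  # None where the leaderboard shows "-"
    lee: float


# Values transcribed from the DRUID (test) leaderboard; "-" entries become None.
ROWS = [
    Row("CLUE-Span+Steering", "Qwen2.5-14B", 0.102, 28, 20, 0.77),
    Row("CLUE-Span+Steering", "OLMo-2-1124-13B", 0.099, 15, 70, 0.69),
    Row("CLUE-Span+Steering", "Gemma-2-9B-IT", 0.098, 30, 47, 0.81),
    Row("CLUE-Span", "Qwen2.5-14B", 0.089, 20, 38, 0.78),
    Row("CLUE-Span", "Gemma-2-9B-IT", 0.043, 23, 43, 0.76),
    Row("CLUE-Span", "OLMo-2-1124-13B", 0.014, 8, 79, 0.65),
    Row("PromptBaseline", "Qwen2.5-14B", -0.08, None, None, 0.6),
    Row("PromptBaseline", "Gemma-2-9B-IT", -0.12, None, None, 0.57),
    Row("PromptBaseline", "OLMo-2-1124-13B", -0.13, None, None, 0.53),
]


def rank_by(metric: str) -> list[tuple[str, str, float]]:
    """Return (method, model, score) best-first, skipping rows with no score.

    Assumption: a higher score is better for every metric.
    """
    scored = [(r.method, r.model, getattr(r, metric)) for r in ROWS]
    scored = [s for s in scored if s[2] is not None]
    return sorted(scored, key=lambda s: s[2], reverse=True)


def steering_gain(model: str) -> float:
    """Faithfulness of CLUE-Span+Steering minus CLUE-Span for one model."""
    by_method = {r.method: r.faithfulness for r in ROWS if r.model == model}
    return round(by_method["CLUE-Span+Steering"] - by_method["CLUE-Span"], 3)


if __name__ == "__main__":
    # Print the LEE ranking, then the per-model Faithfulness gain from steering.
    for method, model, score in rank_by("lee"):
        print(f"{score:.2f}  {method:<20} {model}")
    for model in ("Qwen2.5-14B", "Gemma-2-9B-IT", "OLMo-2-1124-13B"):
        print(model, steering_gain(model))
```

Sorting by LEE, for example, puts CLUE-Span+Steering with Gemma-2-9B-IT first (0.81), although that run ranks third by Faithfulness. The steering gain in Faithfulness is largest for OLMo-2-1124-13B (0.099 vs. 0.014) and smallest for Qwen2.5-14B (0.102 vs. 0.089).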