Human Evaluation of Explanations on DRUID
[Chart: Helpfulness (MAR) by method; top entry CLUE-Span at 1.917, May 23, 2025. Selectable metrics: Helpfulness (MAR), Consistency (MAR), Non-redundancy (MAR), Coverage (MAR), Overall Quality (MAR).]
Updated 1mo ago
Evaluation Results
Method              Details                    Date     Helpfulness (MAR)  Consistency (MAR)  Non-redundancy (MAR)  Coverage (MAR)  Overall Quality (MAR)
CLUE-Span           Base Model=Qwen2.5-14B...  2025.05  1.917              1.75               1.983                 1.75            1.9
PromptBaseline      Base Model=Qwen2.5-14B...  2025.05  1.9                1.717              1.983                 1.767           1.9
CLUE-Span+Steering  Base Model=Qwen2.5-14B...  2025.05  1.767              1.617              1.683                 1.617           1.817
CLUE                Backbone=Qwen2.5-14B-I...  2025.05  0.688              0.691              0.739                 0.717           0.688
PromptBaseline      Backbone=Qwen2.5-14B-I...  2025.05  0.312              0.309              0.261                 0.283           0.313