Human Evaluation of Explanations on HealthVer
[Chart: Helpfulness (MAR) by method. Highlighted point: PromptBaseline, 2.15, May 23, 2025. Selectable metrics: Helpfulness, Consistency, Non-redundancy, Coverage, Overall Quality (all MAR).]
Evaluation Results
| Method | Configuration | Date | Helpfulness (MAR) | Consistency (MAR) | Non-redundancy (MAR) | Coverage (MAR) | Overall Quality (MAR) |
|---|---|---|---|---|---|---|---|
| PromptBaseline | Base Model=Qwen2.5-14B... | 2025.05 | 2.15 | 2.033 | 2.117 | 2.167 | 2.033 |
| CLUE-Span+Steering | Base Model=Qwen2.5-14B... | 2025.05 | 1.967 | 2.017 | 1.983 | 1.9 | 2.033 |
| CLUE-Span | Base Model=Qwen2.5-14B... | 2025.05 | 1.867 | 1.817 | 1.833 | 1.8 | 1.917 |
| CLUE | Backbone=Qwen2.5-14B-I... | 2025.05 | 0.772 | 0.751 | 0.784 | 0.727 | 0.667 |
| PromptBaseline | Backbone=Qwen2.5-14B-I... | 2025.05 | 0.25 | 0.27 | 0.242 | 0.266 | 0.336 |