Evaluation Criterion Generation on HealthBench
[Figure: Specificity of reported methods over time. Top result: Qworld, Specificity 10 (Mar 6, 2026). The chart can also be switched to Implicitness.]
Evaluation Results
Method                    Date      Specificity   Implicitness
Qworld (Retrieval=Yes)    2026.03   10            89
RocketEval                2026.03   9             76
Qworld (Retrieval=No)     2026.03   9             87
EvalAgent                 2026.03   4             83
TICK                      2026.03   3             73
OpenRubrics               2026.03   2             89