Evaluation Criteria Generation on HealthBench
[Chart: Coverage by method; highest score 90 (Qworld ret.), Mar 6, 2026. Selectable metrics: Coverage, Uniqueness, Insight, Granularity.]
Updated 23d ago
Evaluation Results
| Method | Configuration | Date | Coverage | Uniqueness | Insight | Granularity |
| Qworld ret. | Backbone=GPT-4.1, Retr... | 2026.03 | 90 | 82 | 84 | 83 |
| Qworld | Backbone=GPT-4.1 | 2026.03 | 89 | 79 | 83 | 85 |
| EvalAgent | Backbone=GPT-4.1 | 2026.03 | 83 | 50 | 40 | 65 |
| OpenRubrics | Backbone=GPT-4.1 | 2026.03 | 54 | 37 | 36 | 49 |
| RocketEval | Backbone=GPT-4.1 | 2026.03 | 53 | 26 | 42 | 83 |
| TICK | Backbone=GPT-4.1 | 2026.03 | 46 | 24 | 29 | 79 |