HealthBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Medical Question Answering | HealthBench Overall | Overall Score: 60.1 | 16 |
| Medical Question Answering | HealthBench Hard | Score: 34.7 | 16 |
| Model Selection Evaluation | HealthBench | Actual (per type): 90.5 | 5 |
| Medical Knowledge | HealthBench | Score: 47.45 | 5 |
| Question Answering | HealthBench 500-conversation (out-of-domain) | HealthBench Score: 0.649 | 5 |
| Medical Question Answering | HealthBench normal | Pass@1: 65.2 | 4 |
| Hallucination Detection | HealthBench (test) | AUC: 96.48 | 4 |
| Medical Response Refinement | HealthBench 254 medical queries | Base Score: 59 | 4 |
| Hallucination Suppression | HealthBench Hallu | Refuted Rate: 2.37 | 4 |
| Medical Reasoning | HealthBench | HealthBench Score: 66.2 | 4 |
| Clinical Intent Alignment | HealthBench | CIA: 60.12 | 3 |