
NEJM

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Medical Question Answering | NEJM | Accuracy: 73.13 | 16 |
| Medical Diagnosis | NEJM | Rounds: 17.91 | 9 |
| Medical Long-form Generation | NEJM (test) | SM Score: 76.6 | 8 |
| Medical Reasoning | NEJM (OOD) | Accuracy: 68.3 | 7 |
| Multi-turn Clinical Simulation | NEJM Extension OOD (test) | Accuracy: 33.3 | 6 |
| Multi-turn Clinical Simulation | NEJM OOD (test) | Accuracy: 46.7 | 6 |