MedAgentBench

Benchmarks

Task Name                     Dataset Name             SOTA Result        Trend
Clinical Task Execution       MedAgentBench OOD v2     Accuracy 87.1      35
Clinical Task Execution       MedAgentBench v2 (test)  Accuracy 76.9      35
Clinical Task Execution       MedAgentBench v2 (val)   Accuracy 77        35
Clinical Task Execution       MedAgentBench OOD        Accuracy 80.6      35
Clinical Task Execution       MedAgentBench (test)     Accuracy 88.8      35
Clinical Task Execution       MedAgentBench (val)      Accuracy 86.2      35
Medical Agent Task Execution  MedAgentBench            Success Rate 79.3  24
Multi-agent recommendation    MedAgentBench            Top-1 Acc 100      4
Single-agent tool selection   MedAgentBench            Top-1 Accuracy 99  4
Medical Agentic Reasoning     MedAgentBench            Accuracy 87        3