Medical Diagnosis

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
MIMIC-IV diagnostic evaluation set (test)	GLEAN (N=3)	Accuracy	78.33	54	4mo ago
agent-CMB	Medical-CoT*	Rounds	18.34	25	4mo ago
MedQA agent	MedKGI	Rounds	9.11	25	4mo ago
COVID-19 Radiography Database	score-2	Mean Prediction Set Size	2.058	20	1mo ago
AgentClinic OOD original (test)	Aloe-Beta-70B	Similarity (Sim)	0.684	20	2mo ago
Overall Dirichlet α=0.3 partition	MedLatentDx	Accuracy	70	18	1mo ago
Phenopacket Dirichlet α=0.3 partition	MedLatentDx	Accuracy	72	18	1mo ago
Zenodo Dirichlet α=0.3 partition	MedLatentDx	Accuracy	66	18	1mo ago
RareBench Dirichlet α=0.3 partition	MedLatentDx	Accuracy	68	18	1mo ago
MedEinst Robust 1.0	ECR-Agent (Qwen3-32B)	Robust Accuracy	24.21	18	4mo ago
MedEinst Baseline 1.0	ECR-Agent (Qwen3-32B)	Baseline Accuracy	69.49	18	4mo ago
COVID19-CT	SH-PEFT	F1 Score	83	16	4mo ago
MAU (test)	UMed-LVLM	DL Score	53	13	4mo ago
Dynamic Multi-turn Diagnostic Evaluation Chinese	PACT	Strict DA	55.31	12	1mo ago
PMC-Patients	MedExAgent-8B	Similarity Score	62.6	12	2mo ago
DDxPlus	MedExAgent-8B	Similarity	96.6	12	2mo ago
DDXPlus n=50	BMBE + GPT-5.4-nano	Top-1 Accuracy	78	12	3mo ago
MedQA	MedRoute	Accuracy	88.76	12	1mo ago
Step-CoT (test)	Ours (Teacher)	Accuracy	78.3	10	4mo ago
CXR14 (external)		Precision for Edema	71.26	10	4mo ago
MedAction 300 Hard		Diag. Acc.	82	9	2mo ago
MedR-Bench		Diagnostic Accuracy	81	9	2mo ago
DiagnosisArena (test)	GoS	Match (LLM-as-a-Judge)	31.88	9	4mo ago
MediQ (test)		Average Outcome Reward	74.67	9	4mo ago
NEJM	DDO	Rounds	17.91	9	4mo ago

Showing 25 of 48 rows