Natural Language Inference on MultiMed-X YO
Best accuracy: 68.67 (MED-COREASONER), Jan 13, 2026
Updated 4d ago
Evaluation Results
Method                               Date      Accuracy
MED-COREASONER (backbone=GPT-5.1)    2026.01   68.67
GPT-5.1                              2026.01   63.33
GPT-4o                               2026.01   62.67
GPT-5.2                              2026.01   62.00
Claude-3.5-haiku                     2026.01   54.67