MCQ Diagnostic Accuracy on Stanford Multimodal (test)
[Chart: Accuracy by date. MARCUS: 73.7 (Mar 23, 2026). Updated 25d ago.]
Evaluation Results
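| Method | Sample size | Date | Accuracy | 95% CI | p-value |
| --- | --- | --- | --- | --- | --- |
| MARCUS | N=38 | 2026.03 | 73.7 | 60.5 | - |
| Gemini 2.5 Pro | N=34 | 2026.03 | 29.4 | 14.7 | - |
| GPT-5 | N=40 | 2026.03 | 22.5 | 10 | - |
| MARCUS vs GPT-5 (McNemar p) | N=38 | 2026.03 | - | - | 0.001 |
| MARCUS vs Gemini (McNemar p) | N=32 | 2026.03 | - | - | 0.001 |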
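The two comparison rows report McNemar p-values, which apply to paired predictions on the same questions. As a rough illustration of how such a value can be computed, here is a minimal sketch of the exact two-sided McNemar test from the two discordant counts. The page does not say which variant of the test (exact or chi-square) was used, and it does not publish the discordant counts, so the function name `mcnemar_exact` and the counts in the usage line are hypothetical and are not intended to reproduce the reported 0.001.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value.

    b: questions model A answered correctly and model B incorrectly.
    c: questions model B answered correctly and model A incorrectly.
    Questions where both models agree do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of a difference.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis, the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the smaller tail and cap at 1.
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, chosen only for illustration.
p_value = mcnemar_exact(b=14, c=2)
print(f"p = {p_value:.4f}")
```

Only the discordant pairs matter here, which is why the N in a comparison row can be smaller than the N of either method alone: the test needs questions that both models answered.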