MCQ Diagnostic Accuracy on Stanford Echo (test)
Top result: MARCUS, Accuracy 64 (Mar 23, 2026)
[Chart: MARCUS accuracy over time; selectable metrics: Accuracy, 95% CI (Lower Bound), p-value]
Updated 25d ago
Evaluation Results
| Method | N | Date | Accuracy (%) | 95% CI (Lower Bound) (%) | p-value |
|---|---|---|---|---|---|
| MARCUS | 50 | 2026.03 | 64 | 50 | - |
| GPT-5 | 50 | 2026.03 | 34 | 22 | - |
| Gemini 2.5 Pro | 48 | 2026.03 | 22.9 | 10.4 | - |
| MARCUS vs GPT-5 (McNemar p) | 50 | 2026.03 | - | - | 0.007 |
| MARCUS vs Gemini (McNemar p) | 48 | 2026.03 | - | - | 0.001 |
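The last two rows report McNemar p-values for paired comparisons on the same test items. The page gives neither the paired outcome counts nor the variant of the test used, so the following is only a minimal sketch: it assumes the exact two-sided binomial form of McNemar's test, and the function name `mcnemar_exact` and the counts passed to it are hypothetical, not taken from the benchmark.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: items model A answered correctly and model B answered incorrectly
    c: items model A answered incorrectly and model B answered correctly
    Items both models got right, or both got wrong, do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair favors either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only (not from the leaderboard).
print(mcnemar_exact(b=9, c=1))
```

Only the discordant pairs matter here, which is why two models with a large accuracy gap on about 50 items can still yield a small p-value when nearly all disagreements favor one model.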