Medical Question Answering on MedXpertQA OOD (test)
[Chart: Accuracy over time, Jul 7, 2025 to Mar 9, 2026. Current best: Gemini-3-flash, 69.2 Accuracy.]
Updated 11d ago
Evaluation Results
Method            Details                     Date     Accuracy
Gemini-3-flash    Framework=Framework II...   2026.03  69.2
o3                Model size category=La...   2025.07  67.5
Gemini 2.5 Pro    Model size category=La...   2025.07  58.9
Gemini-3-flash    Framework=Framework II...   2026.03  54.4
Gemini 2.5 Flash  Model size category=La...   2025.07  47.4
Gemini-2.5-pro    Framework=Framework II...   2026.03  46.6
o3                Framework=Framework II...   2026.03  44.1
GPT-5             Framework=Framework II...   2026.03  40.4
Ophiuchus-7B      Framework=Framework II...   2026.03  39.3
Meissa            Framework=Framework II...   2026.03  36
Qwen3-VL-4B       Framework=Framework II...   2026.03  33.3
Gemma 3 27B       Model size category=Sm...   2025.07  29.8
MedGemma 4B       Model size category=Sm...   2025.07  24.4
Qwen3-VL-4B       Framework=Framework II...   2026.03  23.9
Gemma 3 4B        Model size category=Sm...   2025.07  22.3