Medical Question Answering on PubMedQA
Figure: Pass@1 on PubMedQA over time (Sep 2025 to Feb 2026); best result: MedXIAOHE, 86.
Evaluation Results
| Method | Settings | Date | Pass@1 |
|---|---|---|---|
| MedXIAOHE | Mode=Thinking mode, De... | 2026.02 | 86 |
| Gemini 3.0 Pro | Decoding=Greedy | 2026.02 | 80.8 |
| GPT-5.2 Thinking | Mode=Thinking mode, De... | 2026.02 | 79.8 |
| Gemini 2.5 Pro | Decoding=Greedy | 2026.02 | 75.6 |
| Qwen2.5-7B + GRPO w/ VERL. | Base Model=Qwen2.5-7B,... | 2025.09 | 47.8 |
| Qwen2.5-7B + GRPO | Base Model=Qwen2.5-7B,... | 2025.09 | 46.6 |
| Qwen2.5-7B + PPO | Base Model=Qwen2.5-7B,... | 2025.09 | 45.8 |
| Qwen2.5-7B + PPO w/ VERL. | Base Model=Qwen2.5-7B,... | 2025.09 | 45.4 |
| Qwen2.5-7B | Base Model=Qwen2.5-7B | 2025.09 | 43.6 |
| Mathstral-7B-v0.1 + GRPO w/ VERL. | Base Model=Mathstral-7... | 2025.09 | 24.4 |
| Mathstral-7B-v0.1 + PPO | Base Model=Mathstral-7... | 2025.09 | 23.1 |
| Mathstral-7B-v0.1 + GRPO | Base Model=Mathstral-7... | 2025.09 | 22.6 |
| Mathstral-7B-v0.1 | Base Model=Mathstral-7... | 2025.09 | 22.4 |
| Mathstral-7B-v0.1 + PPO w/ VERL. | Base Model=Mathstral-7... | 2025.09 | 22.3 |
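The page does not say how Pass@1 is computed. One common definition, assumed here and not confirmed by the source, is the unbiased pass@k estimator with k = 1, averaged over questions and reported as a percentage. With a single sample per question (for example, greedy decoding), it reduces to the share of questions answered correctly. The function names and the data below are hypothetical and for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct answer.
        return 1.0
    # 1 minus the probability that all k chosen samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over questions, scaled to a percentage (assumed reporting scale)."""
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical data: (samples drawn, samples correct) for each question.
# With one greedy sample per question, pass@1 is plain accuracy.
greedy = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(benchmark_pass_at_1(greedy))  # 75.0
```

When several samples are drawn per question, pass@1 for that question equals c / n, so the estimator averages the per-question accuracy rather than counting only the first sample.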