Medical Knowledge on HealthBench (Pass@1)
[Chart: Pass@1 by method. Highlighted result: GPT-5.2-chat (teacher), 92.82 Pass@1. Date shown: May 8, 2026.]
Updated 23d ago
Evaluation Results
| Method | Thinking Mode | Date | Pass@1 |
|---|---|---|---|
| GPT-5.2-chat (teacher) | | 2026.05 | 92.82 |
| ROPD | Thinking | 2026.05 | 86.87 |
| OVD | Thinking | 2026.05 | 85.98 |
| GAD | Thinking | 2026.05 | 85.7 |
| T-Judge | Thinking | 2026.05 | 85.58 |
| Qwen3-4B (student) | Thinking | 2026.05 | 85.3 |
| ROPD | Non-Thin... | 2026.05 | 84.92 |
| T-Judge | Non-Thin... | 2026.05 | 84.52 |
| OVD | Non-Thin... | 2026.05 | 83.68 |
| GAD | Non-Thin... | 2026.05 | 83.57 |
| Qwen3-4B (student) | Non-Thin... | 2026.05 | 83.32 |