Open-ended Question Answering on OlymBench Phys v1 (test)
[Chart: Problem Level Score by date. Top entry: Qwen3-VL-32B-Thinking at 53.9, plotted at May 13, 2026. Updated 19d ago.]
Evaluation Results
| Method | Method details | Date | Problem Level Score |
| --- | --- | --- | --- |
| Qwen3-VL-32B-Thinking | | 2026.05 | 53.9 |
| Claude Sonnet 4.5 | max_tokens=16384 | 2026.05 | 50.4 |
| Physics-R1 (binary, 3-seed mean ±σ) | max_tokens=16384, rewa... | 2026.05 | 46.2 |
| Physics-R1 (binary, seed 42) | max_tokens=16384, rewa... | 2026.05 | 45.4 |
| Physics-R1 (dense) | max_tokens=16384, rewa... | 2026.05 | 40.5 |
| Qwen3-VL-8B-Thinking (base) | | 2026.05 | 39.3 |
| Gemini 2.5 Pro | | 2026.05 | 37.4 |
| GPT-4o | | 2026.05 | 19.7 |
| InternVL3-8B | | 2026.05 | 10.4 |