Multiple-Choice Question Answering on PhyX 1k (test)
Best MCQ Exact Accuracy: 79.7 (Claude Sonnet 4.5)
[Chart: MCQ Exact Accuracy, with a data point dated May 13, 2026]
Updated 19d ago
Evaluation Results
| Method | Details | Date | MCQ Exact Accuracy |
|---|---|---|---|
| Claude Sonnet 4.5 | max_tokens=16384 | 2026.05 | 79.7 |
| Physics-R1 (dense) | max_tokens=16384, rewa... | 2026.05 | 78.3 |
| Physics-R1 (binary, seed 42) | max_tokens=16384, rewa... | 2026.05 | 78 |
| Physics-R1 (binary, 3-seed mean ±σ) | max_tokens=16384, rewa... | 2026.05 | 77.8 |
| Gemini 2.5 Pro | measured_by=authors | 2026.05 | 75.1 |
| Qwen3-VL-32B-Thinking | | 2026.05 | 73.8 |
| Qwen3-VL-8B-Thinking (base) | | 2026.05 | 73.7 |
| GPT-4o | source=Shen et al. [2025] | 2026.05 | 70.4 |
| InternVL3-8B | | 2026.05 | 46.8 |