Physics Reasoning on QFT Synthetic Hard
[Chart: Accuracy by date. Top result: Claude-Opus-4.5, 44.5 (Apr 21, 2026). Metric tabs: Accuracy, Pass@5.]
Updated 1mo ago
Evaluation Results
Method                        Model Type    Date     Accuracy  Pass@5
Claude-Opus-4.5               Proprietary   2026.04  44.5      62.5
Gemini-2.5-flash              Proprietary   2026.04  30.2      57.5
OSS-20b                       Open-Weight   2026.04  22.2      53.8
Qwen3-4B-Thinking-2507        Open-Weight   2026.04  3.2       7.5
Qwen3-4B-Instruct-2507        Open-Weight   2026.04  2.5       6.2
DeepSeek-R1-Distill-Qwen-7B   Open-Weight   2026.04  0         0