Physics Reasoning on QFT Synthetic Easy
[Chart: Accuracy by model. Top result: Gemini-2.5-flash, Accuracy 89, Apr 21, 2026. Metrics: Accuracy, Pass@5. Updated 1mo ago.]
Evaluation Results
| Method | Model Type | Date | Accuracy | Pass@5 |
| --- | --- | --- | --- | --- |
| Gemini-2.5-flash | Proprietary | 2026.04 | 89 | 96.2 |
| Claude-Opus-4.5 | Proprietary | 2026.04 | 81.5 | 95 |
| OSS-20b | Open-Weight | 2026.04 | 81.5 | 95 |
| Qwen3-4B-Thinking-2507 | Open-Weight | 2026.04 | 69.5 | 83.8 |
| Qwen3-4B-Instruct-2507 | Open-Weight | 2026.04 | 53.5 | 75 |
| DeepSeek-R1-Distill-Qwen-7B | Open-Weight | 2026.04 | 40.2 | 67.5 |