Physics Reasoning on QFT Synthetic Medium
[Chart: Accuracy by model over time (views: Accuracy / Pass@5); top result 88.8 Accuracy, Gemini-2.5-flash, Apr 21, 2026]
Evaluation Results
Method                         Model Type    Date     Accuracy  Pass@5
Gemini-2.5-flash               Proprietary   2026.04  88.8      97.5
Claude-Opus-4.5                Proprietary   2026.04  85.2      93.8
OSS-20b                        Open-Weight   2026.04  76.0      93.8
Qwen3-4B-Thinking-2507         Open-Weight   2026.04  49.5      68.8
Qwen3-4B-Instruct-2507         Open-Weight   2026.04  33.2      46.2
DeepSeek-R1-Distill-Qwen-7B    Open-Weight   2026.04  26.2      50.0
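The page does not say how Accuracy and Pass@5 are computed. A common convention, assumed here rather than confirmed by the benchmark, is that Accuracy is the mean per-sample correctness c/n for each problem, and Pass@k uses the unbiased estimator 1 − C(n−c, k)/C(n, k), where n samples are drawn per problem and c of them are correct. The sketch below implements that assumption; the function names and the per-problem (n, c) input format are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (assumed metric definition).

    n: samples drawn for the problem, c: how many were correct, k: budget (k <= n).
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def score(results: list[tuple[int, int]], k: int = 5) -> tuple[float, float]:
    """Return (accuracy %, pass@k %) averaged over problems.

    results: hypothetical per-problem (n, c) pairs. Accuracy is assumed to be
    the mean per-sample correctness c / n, averaged over problems.
    """
    acc = sum(c / n for n, c in results) / len(results)
    pk = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
    return 100 * acc, 100 * pk


# Hypothetical data: 4 problems, 5 samples each (not taken from this benchmark).
demo = [(5, 5), (5, 2), (5, 0), (5, 1)]
print(tuple(round(x, 1) for x in score(demo)))  # (40.0, 75.0)
```

With k = n = 5, pass@5 reduces to whether any of the five samples for a problem is correct. Under this assumption pass@k can never fall below per-sample accuracy, which is consistent with every row of the table above.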