Physical Reasoning on PHYBench
[Chart: Pass@1 Accuracy by date. Labeled series: AERO, value 5.3. Date shown: Feb 3, 2026.]
Evaluation Results
Method          Configuration               Date      Pass@1 Accuracy
AERO            Backbone Model=Qwen3-8...   2026.02   5.3
AERO            Backbone Model=Qwen3-8...   2026.02   5.1
AERO            Backbone Model=Qwen3-8...   2026.02   4
AERO            Backbone Model=Qwen3-4...   2026.02   3.9
Qwen3-8B-Base   Backbone Model=Qwen3-8...   2026.02   3.8
AERO            Backbone Model=Qwen3-8...   2026.02   3.8
AERO            Backbone Model=Qwen3-4...   2026.02   3.7
AERO            Backbone Model=Qwen3-4...   2026.02   3.4
AERO            Backbone Model=Qwen3-8...   2026.02   3.4
Qwen3-4B-Base   Backbone Model=Qwen3-4...   2026.02   2.7
AERO            Backbone Model=Qwen3-4...   2026.02   2.7
AERO            Backbone Model=Qwen3-4...   2026.02   2.7