Physics on Physics tasks
[Chart: Score and Performance Difference (Δ) over time. Highlighted point: Claude-Sonnet-4.5-Think, Score 52.5, Jan 22, 2026.]
Evaluation Results
| Method | Configuration | Date | Score | Performance Difference (Δ) |
| --- | --- | --- | --- | --- |
| Claude-Sonnet-4.5-Think | Generation Mode=LLM-in... | 2026.01 | 52.5 | 5.8 |
| Claude-Sonnet-4.5-Think | Generation Mode=Standa... | 2026.01 | 46.7 | - |
| GPT-5 | Generation Mode=LLM-in... | 2026.01 | 45.8 | 4.8 |
| DeepSeek-V3.2-Thinking | Generation Mode=LLM-in... | 2026.01 | 45.2 | 7.6 |
| Kimi-K2-Thinking | Generation Mode=LLM-in... | 2026.01 | 42.3 | 1.7 |
| GPT-5 | Generation Mode=Standa... | 2026.01 | 41 | - |
| Kimi-K2-Thinking | Generation Mode=Standa... | 2026.01 | 40.6 | - |
| DeepSeek-V3.2-Thinking | Generation Mode=Standa... | 2026.01 | 37.6 | - |
| MiniMax-M2 | Generation Mode=LLM-in... | 2026.01 | 35.8 | 5.5 |
| MiniMax-M2 | Generation Mode=Standa... | 2026.01 | 30.3 | - |
| Qwen3-Coder-30B-A3B | Generation Mode=LLM-in... | 2026.01 | 28.9 | 5 |
| Qwen3-4B-Instruct-2507 | Generation Mode=LLM-in... | 2026.01 | 25.6 | 2.7 |
| Qwen3-Coder-30B-A3B | Generation Mode=Standa... | 2026.01 | 23.9 | - |
| Qwen3-4B-Instruct-2507 | Generation Mode=Standa... | 2026.01 | 22.9 | - |
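The page does not define Performance Difference (Δ). For every model listed, though, the reported Δ equals that model's score in the "LLM-in..." generation mode minus its score in the "Standa..." mode. The sketch below checks that reading against the scores above. It is an inference from the numbers, not a documented definition; the mode labels are truncated in the source and are kept as-is, and the names `scores` and `delta` are illustrative.

```python
# Scores copied from the results above, grouped by model.
# Mode labels are truncated in the source ("LLM-in...", "Standa...") and are
# deliberately left unexpanded here.
scores = {
    "Claude-Sonnet-4.5-Think": {"LLM-in...": 52.5, "Standa...": 46.7},
    "GPT-5": {"LLM-in...": 45.8, "Standa...": 41.0},
    "DeepSeek-V3.2-Thinking": {"LLM-in...": 45.2, "Standa...": 37.6},
    "Kimi-K2-Thinking": {"LLM-in...": 42.3, "Standa...": 40.6},
    "MiniMax-M2": {"LLM-in...": 35.8, "Standa...": 30.3},
    "Qwen3-Coder-30B-A3B": {"LLM-in...": 28.9, "Standa...": 23.9},
    "Qwen3-4B-Instruct-2507": {"LLM-in...": 25.6, "Standa...": 22.9},
}


def delta(modes):
    """Assumed definition: score in one mode minus score in the other,
    rounded to one decimal place as the page displays it."""
    return round(modes["LLM-in..."] - modes["Standa..."], 1)


deltas = {model: delta(modes) for model, modes in scores.items()}

# List models from largest to smallest difference between the two modes.
for model, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {d:+.1f}")
```

Under this reading, DeepSeek-V3.2-Thinking shows the largest difference between the two modes (7.6) and Kimi-K2-Thinking the smallest (1.7).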