Critique on CriticBench (test)
Chart: Math Score (top result 94.1, Qwen3-32B, May 14, 2026). Selectable metrics: Math Score, Communication Score, Symbolism Score, Algorithm Score, Average Score.
Evaluation Results
| Method | Configuration | Date | Math Score | Communication Score | Symbolism Score | Algorithm Score | Average Score |
|---|---|---|---|---|---|---|---|
| Qwen3-32B | Model=Qwen3-32B | 2026.05 | 94.1 | 67.36 | 94.41 | 56.12 | 78 |
| CIPO | Base Model=Qwen3-4B, T... | 2026.05 | 94 | 61.74 | 90.42 | 55.94 | 75.53 |
| GRPO | Base Model=Qwen3-4B, T... | 2026.05 | 93.87 | 61.55 | 90.51 | 54.79 | 75.18 |
| Initial | Base Model=Qwen3-4B | 2026.05 | 92.82 | 58.22 | 83.68 | 50.72 | 71.36 |
| Qwen2.5-72B-Instruct | Model=Qwen2.5-72B-Inst... | 2026.05 | 89.81 | 64.94 | 91.99 | 63.58 | 77.58 |
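The page does not define Average Score. Every row's value matches the unweighted mean of the four category scores, rounded to two decimals, so that is a plausible reading but remains an inference, not a documented definition. The minimal sketch below copies the table's numbers and assumes that definition. It recomputes each average, flags any mismatch, and ranks the methods by the reported Average Score. The helper names (`mean_of_categories`, `check_averages`, `rank_by_average`) are hypothetical and used only for illustration.

```python
# Scores copied from the Evaluation Results table:
# (Math, Communication, Symbolism, Algorithm, reported Average).
# Assumption (inferred, not documented): Average Score is the unweighted
# mean of the four category scores, rounded to two decimals.
ROWS = {
    "Qwen3-32B": (94.1, 67.36, 94.41, 56.12, 78.0),
    "CIPO": (94.0, 61.74, 90.42, 55.94, 75.53),
    "GRPO": (93.87, 61.55, 90.51, 54.79, 75.18),
    "Initial": (92.82, 58.22, 83.68, 50.72, 71.36),
    "Qwen2.5-72B-Instruct": (89.81, 64.94, 91.99, 63.58, 77.58),
}


def mean_of_categories(scores):
    """Unweighted mean of the Math, Communication, Symbolism and Algorithm scores."""
    return sum(scores[:4]) / 4


def check_averages(rows, tol=0.01):
    """Return the methods whose reported average differs from the recomputed mean by more than tol."""
    return [name for name, s in rows.items() if abs(mean_of_categories(s) - s[4]) > tol]


def rank_by_average(rows):
    """Method names sorted by reported Average Score, highest first."""
    return sorted(rows, key=lambda name: rows[name][4], reverse=True)


if __name__ == "__main__":
    print(check_averages(ROWS))   # [] : every reported average matches the recomputed mean
    print(rank_by_average(ROWS))  # ['Qwen3-32B', 'Qwen2.5-72B-Instruct', 'CIPO', 'GRPO', 'Initial']
```

Under this reading, the ordering by Average Score differs from the ordering by Math Score. Qwen2.5-72B-Instruct has the lowest Math Score in the table but ranks second on average, mainly because of its Algorithm Score.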