Correction on CriticBench (test)
[Chart: Math Score; highlighted point CIPO, 75.38, May 14, 2026. Selectable metrics: Math Score, Communication Score, Symbolism Score, Algorithm Score, Average Score.]
Updated 19d ago
Evaluation Results
| Method | Details | Date | Math Score | Communication Score | Symbolism Score | Algorithm Score | Average Score |
|---|---|---|---|---|---|---|---|
| CIPO | Base Model=Qwen3-4B, T... | 2026.05 | 75.38 | 48.72 | 92.72 | 74.47 | 72.82 |
| Qwen3-32B | Model=Qwen3-32B | 2026.05 | 74.23 | 55 | 97.21 | 76.95 | 75.85 |
| Qwen2.5-72B-Instruct | Model=Qwen2.5-72B-Inst... | 2026.05 | 73.93 | 55.27 | 92.26 | 77.3 | 74.69 |
| GRPO | Base Model=Qwen3-4B, T... | 2026.05 | 70.71 | 48.89 | 89.78 | 72.34 | 70.43 |
| Initial | Base Model=Qwen3-4B | 2026.05 | 67.64 | 49.07 | 86.69 | 71.63 | 68.75 |
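The page does not say how the Average Score is derived. A quick check, assuming it is the unweighted mean of the four category scores, matches every listed row to within rounding. The sketch below copies the scores from the results above; the names `ROWS`, `mean_of_categories`, and `check_averages`, and the 0.01 tolerance, are illustrative choices and not part of the benchmark.

```python
# Scores copied from the results listing:
# (Math, Communication, Symbolism, Algorithm, reported Average)
ROWS = {
    "CIPO": (75.38, 48.72, 92.72, 74.47, 72.82),
    "Qwen3-32B": (74.23, 55.0, 97.21, 76.95, 75.85),
    "Qwen2.5-72B-Instruct": (73.93, 55.27, 92.26, 77.3, 74.69),
    "GRPO": (70.71, 48.89, 89.78, 72.34, 70.43),
    "Initial": (67.64, 49.07, 86.69, 71.63, 68.75),
}


def mean_of_categories(scores):
    """Unweighted mean of the category scores (an assumed formula)."""
    return sum(scores) / len(scores)


def check_averages(rows, tol=0.01):
    """Map each method to (computed mean, reported average, within tolerance)."""
    results = {}
    for name, (*categories, reported) in rows.items():
        computed = mean_of_categories(categories)
        results[name] = (computed, reported, abs(computed - reported) <= tol)
    return results


if __name__ == "__main__":
    for name, (computed, reported, ok) in check_averages(ROWS).items():
        print(f"{name}: computed={computed:.4f} reported={reported} match={ok}")
```

Under this assumption, the Initial row computes to 68.7575 against a reported 68.75, which suggests the site averages unrounded scores or truncates rather than rounds; the other rows agree after rounding to two decimals.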