Model Calibration on MATH, GSM8K, SelfAware, and TruthfulQA combined
[Chart: ECE by date. Plotted point: CARE-GRPO, ECE 0.086, Jan 22, 2026.]
Evaluation Results
Method                 Backbone     Date     ECE
CARE-GRPO              Qwen2.5-7B   2026.01  0.086
RKL-GSPO               Qwen2.5-7B   2026.01  0.088
Base Model             Qwen2.5-7B   2026.01  0.089
RKL-DAPO               Qwen2.5-7B   2026.01  0.09
RKL-GRPO               Qwen2.5-7B   2026.01  0.095
CARE-DAPO              Qwen2.5-7B   2026.01  0.099
CARE-GSPO              Qwen2.5-7B   2026.01  0.101
GRPO (No Constraint)   Qwen2.5-7B   2026.01  0.145
GSPO (No Constraint)   Qwen2.5-7B   2026.01  0.149
DAPO (No Constraint)   Qwen2.5-7B   2026.01  0.151
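
The page does not say how ECE was computed or how many bins were used. For orientation only, here is a minimal sketch of the common equal-width-bin definition of Expected Calibration Error: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The function name, the 10-bin default, and the input format are illustrative assumptions, not the benchmark's evaluation code.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; the leaderboard does not state its binning
) -> float:
    """Equal-width-bin ECE: sum over bins of (bin share) * |mean confidence - accuracy|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Group (confidence, correctness) pairs into equal-width bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

Under this definition lower is better: a model that answers with 0.9 confidence and is right 75% of the time contributes a gap of 0.15, while a perfectly calibrated model scores 0.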