Factuality on SelfAware
[Chart: Score over time. Latest point: Base Model, 0.372, Jan 22, 2026.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Score |
|---|---|---|---|
| Base Model | Backbone=Qwen2.5-3B | 2026.01 | 0.372 |
| CARE-GRPO | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.355 |
| RKL-GRPO | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.351 |
| RKL-DAPO | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.346 |
| RKL-GSPO | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.341 |
| CARE-DAPO | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.334 |
| CARE-GSPO | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.332 |
| GRPO (No Constraint) | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.249 |
| GSPO (No Constraint) | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.243 |
| DAPO (No Constraint) | Backbone=Qwen2.5-3B, A... | 2026.01 | 0.232 |