Safety evaluation (Accuracy) on Beaver
Chart: Accuracy (Beaver Safety) over time; top result 84.65 (MCPO-DAPO), May 25, 2026.
Evaluation Results
| Method | Backbone | Date | Accuracy (Beaver Safety) |
|---|---|---|---|
| MCPO-DAPO | Qwen3-8B | 2026.05 | 84.65 |
| MCPO-GRPO | Qwen3-8B | 2026.05 | 84.26 |
| MCPO-GRPO | Qwen3-4B | 2026.05 | 83.32 |
| MCPO-DAPO | Qwen3-4B | 2026.05 | 82.61 |
| MT-GRPO | Qwen3-8B | 2026.05 | 81.63 |
| DAPO | Qwen3-8B | 2026.05 | 81.15 |
| MGS | Qwen3-4B | 2026.05 | 80.40 |
| MGS | Qwen3-8B | 2026.05 | 80.23 |
| GRPO | Qwen3-8B | 2026.05 | 80.12 |
| MT-GRPO | Qwen3-4B | 2026.05 | 79.93 |
| CLIPO | Qwen3-8B | 2026.05 | 78.86 |
| GRPO | Qwen3-4B | 2026.05 | 78.19 |
| DAPO | Qwen3-4B | 2026.05 | 78.08 |
| CLIPO | Qwen3-4B | 2026.05 | 76.94 |
| Base Model | Qwen3-8B | 2026.05 | 64.27 |
| Base Model | Qwen3-4B | 2026.05 | 58.40 |
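As a quick way to read the results above, the following sketch finds the best method for each backbone and its gain over the base model on the same backbone. It is a minimal illustration that assumes accuracies on the same backbone are directly comparable; the names `Result`, `gain_over_base`, and `best_per_backbone` are hypothetical helpers, not part of any benchmark tooling. The values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    method: str
    backbone: str
    accuracy: float  # Accuracy (Beaver Safety), as listed in the table


# Rows copied from the Evaluation Results table (all dated 2026.05).
RESULTS = [
    Result("MCPO-DAPO", "Qwen3-8B", 84.65),
    Result("MCPO-GRPO", "Qwen3-8B", 84.26),
    Result("MCPO-GRPO", "Qwen3-4B", 83.32),
    Result("MCPO-DAPO", "Qwen3-4B", 82.61),
    Result("MT-GRPO", "Qwen3-8B", 81.63),
    Result("DAPO", "Qwen3-8B", 81.15),
    Result("MGS", "Qwen3-4B", 80.4),
    Result("MGS", "Qwen3-8B", 80.23),
    Result("GRPO", "Qwen3-8B", 80.12),
    Result("MT-GRPO", "Qwen3-4B", 79.93),
    Result("CLIPO", "Qwen3-8B", 78.86),
    Result("GRPO", "Qwen3-4B", 78.19),
    Result("DAPO", "Qwen3-4B", 78.08),
    Result("CLIPO", "Qwen3-4B", 76.94),
    Result("Base Model", "Qwen3-8B", 64.27),
    Result("Base Model", "Qwen3-4B", 58.4),
]


def gain_over_base(results):
    """Map (method, backbone) to accuracy minus the same backbone's base accuracy."""
    base = {r.backbone: r.accuracy for r in results if r.method == "Base Model"}
    return {
        (r.method, r.backbone): round(r.accuracy - base[r.backbone], 2)
        for r in results
        if r.method != "Base Model"
    }


def best_per_backbone(results):
    """Return the highest-accuracy Result for each backbone."""
    best = {}
    for r in results:
        if r.backbone not in best or r.accuracy > best[r.backbone].accuracy:
            best[r.backbone] = r
    return best


if __name__ == "__main__":
    gains = gain_over_base(RESULTS)
    for backbone, r in sorted(best_per_backbone(RESULTS).items()):
        gain = gains[(r.method, backbone)]
        print(f"{backbone}: best {r.method} {r.accuracy:.2f} (+{gain:.2f} over base)")
```

Run as a script, this prints one line per backbone: MCPO-GRPO leads on Qwen3-4B (+24.92 over base) and MCPO-DAPO leads on Qwen3-8B (+20.38 over base).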