Strategic Decision Making on MMG2Skill-Bench Strategy
Top Success Rate: 69.17 (MMG2Skill)
[Chart: Success Rate over time; data point at Jun 1, 2026]
Updated 1d ago
Evaluation Results

| Method | Configuration | Date | Success Rate |
|---|---|---|---|
| MMG2Skill | Backbone=Qwen3.6-Plus,... | 2026.06 | 69.17 |
| MMG2Skill | Backbone=GPT-5.5, Prom... | 2026.06 | 65 |
| MMG2Skill | Backbone=Claude-Sonnet... | 2026.06 | 61.67 |
| MMG2Skill | Backbone=Claude-Opus-4... | 2026.06 | 60.83 |
| MMG2Skill | Backbone=Gemini-3.1-Pr... | 2026.06 | 60.83 |
| Vanilla | Backbone=Gemini-3.1-Pr... | 2026.06 | 54.17 |
| MMG2Skill | Backbone=Kimi-K2.6, Pr... | 2026.06 | 52.5 |
| Raw Guide | Backbone=Gemini-3.1-Pr... | 2026.06 | 52.5 |
| Raw Guide | Backbone=GPT-5.5, Prom... | 2026.06 | 50 |
| Vanilla | Backbone=GPT-5.5, Prom... | 2026.06 | 49.17 |
| Vanilla | Backbone=Qwen3.6-Plus,... | 2026.06 | 48.33 |
| Vanilla | Backbone=Kimi-K2.6, Pr... | 2026.06 | 46.67 |
| Raw Guide | Backbone=Claude-Opus-4... | 2026.06 | 44.17 |
| Raw Guide | Backbone=Qwen3.6-Plus,... | 2026.06 | 44.17 |
| Vanilla | Backbone=Claude-Opus-4... | 2026.06 | 41.67 |
| Raw Guide | Backbone=Kimi-K2.6, Pr... | 2026.06 | 40.83 |
| Vanilla | Backbone=Claude-Sonnet... | 2026.06 | 40 |
| Raw Guide | Backbone=Claude-Sonnet... | 2026.06 | 39.17 |