Cross-Lingual Planning on ACEBench
[Chart: Score (En) and Score (Zh) by date; highlighted point: MagicAgent-32B, Score (En) 78.3, Feb 22, 2026]
Evaluation Results
| Method | Model Category | Date | Score (En) | Score (Zh) |
|---|---|---|---|---|
| MagicAgent-32B | MagicAg... | 2026.02 | 78.3 | 84.1 |
| GLM-4.7 | Ultra-S... | 2026.02 | 78.2 | 87 |
| Qwen3-MAX | Ultra-S... | 2026.02 | 77.6 | 86.6 |
| Kimi-K2-Instruct | Ultra-S... | 2026.02 | 77.4 | 83.3 |
| MagicAgent-30B-A3B | MagicAg... | 2026.02 | 74.9 | 83.5 |
| GPT-5.2 | Ultra-S... | 2026.02 | 70.7 | 80.1 |
| Qwen3-235B-A22B-Instruct | Ultra-S... | 2026.02 | 70.1 | 77.1 |
| DeepSeek-V3.1-nothink | Ultra-S... | 2026.02 | 67.8 | 73.2 |
| Qwen3-30B-A3B-Instruct | Large-S... | 2026.02 | 61.3 | 71.1 |
| Qwen3-32B-nothink | Large-S... | 2026.02 | 57.2 | 70.1 |
| Llama3.3-70B-Instruct | Large-S... | 2026.02 | 56.7 | 61.7 |
| GLM-4.7-Flash | Large-S... | 2026.02 | 53.5 | 62 |
| ERNIE-4.5-21B-A3B-PT | Large-S... | 2026.02 | 46.9 | 51.1 |
| Olmo-3.1-32B-Instruct | Large-S... | 2026.02 | 25.1 | 18.4 |