Multi-agent Reasoning on Reasoning Benchmarks Competitive MAD framework (test)

Average Score: 0.8509

MARSHAL (Generalist, 8B)

[Chart: score axis ticks 0.82386, 0.83088, 0.8379, 0.84492; date Oct 17, 2025]
Updated 4d ago

Evaluation Results

Method   Links
2025.10   0.8509   0.964   0.9659   0.8346   0.8   0.95   0.907    0.5354
2025.10   0.8249   0.95    0.9636   0.8346   0.7   0.9    0.8959   0.5303