Multi-agent Reasoning on Reasoning Benchmarks Competitive MAD framework (test)

Average Score: 0.8509

MARSHAL (Generalist, 8B)

Updated 4mo ago

Evaluation Results

Method	Date	Links	Average Score
MARSHAL (Generalist, 8B)	2025.10		0.8509	0.964	0.9659	0.8346	0.8	0.95	0.907	0.5354
Qwen3-8B	2025.10		0.8249	0.95	0.9636	0.8346	0.7	0.9	0.8959	0.5303
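
In each row, the first score (the Average Score) appears to be the unweighted mean of the seven scores that follow it. The sketch below checks that arithmetic in Python with the values copied from the two rows. The page does not name the seven benchmarks, so the lists are left unlabeled, and `average_score` is a hypothetical helper name, not part of any published evaluation code.

```python
from statistics import fmean

# Per-benchmark scores copied from the two table rows.
# Assumption: the seven values after the first score are the individual
# benchmark results; their names are not given on the page.
scores = {
    "MARSHAL (Generalist, 8B)": [0.964, 0.9659, 0.8346, 0.8, 0.95, 0.907, 0.5354],
    "Qwen3-8B": [0.95, 0.9636, 0.8346, 0.7, 0.9, 0.8959, 0.5303],
}

# Average Score as reported in the first numeric column of each row.
reported = {
    "MARSHAL (Generalist, 8B)": 0.8509,
    "Qwen3-8B": 0.8249,
}


def average_score(values):
    """Unweighted mean of the per-benchmark scores (hypothetical helper)."""
    return fmean(values)


for name, values in scores.items():
    computed = average_score(values)
    gap = abs(computed - reported[name])
    print(f"{name}: computed={computed:.6f} reported={reported[name]} gap={gap:.6f}")
```

For both rows the computed mean agrees with the reported Average Score to within 0.0001, which is consistent with a plain mean reported to four decimal places.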