Multi-agent Reasoning on Reasoning Benchmarks Cooperative AutoGen framework (test)

Overall Accuracy: 83.58

MARSHAL (Generalist, 8B)

[Chart: Overall Accuracy over time; latest data point Oct 17, 2025]
Updated 4d ago

Evaluation Results

Method    Links
Date      Overall Accuracy   Scores
2025.10   83.58              94.4   95      85.04   70   95      90.04   55.56
2025.10   79.68              88.8   95.91   83.07   60   89.19   89.3    51.52