MagicEval-Plan

Benchmarks

Task Name                        Dataset Name                          SOTA Result      Trend
Hierarchical Task Decomposition  MagicEval-Plan Context Inheritance 3  Step Score 97.6  14
Hierarchical Task Decomposition  MagicEval-Plan Condition 3            Step Count 97.5  14
Hierarchical Task Decomposition  MagicEval-Plan Dependency 3           Step Count 16    14
Hierarchical Task Decomposition  MagicEval-Plan General 3              Step Count 35.1  14