MagicEval-Plan

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hierarchical Task Decomposition | MagicEval-Plan Context Inheritance 3 | Step Score: 97.6 | 14 |
| Hierarchical Task Decomposition | MagicEval-Plan Condition 3 | Step Count: 97.5 | 14 |
| Hierarchical Task Decomposition | MagicEval-Plan Dependency 3 | Step Count: 16 | 14 |
| Hierarchical Task Decomposition | MagicEval-Plan General 3 | Step Count: 35.1 | 14 |