
Procedural Planning on Macro Average In-domain

56.3 Macro Accuracy

GPT-4o-mini

[Chart: Macro Accuracy over time (May 19, 2026)]

Evaluation Results

Method   Links   Date      Macro Accuracy
                 2026.05   56.3
                 2026.05   55.5
                 2026.05   46.6
                 2026.05   39.2
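Macro accuracy is usually computed as the unweighted mean of per-group accuracies, so every group counts equally no matter how many examples it contains. This page does not say which groups (domains, tasks, or something else) the in-domain macro average covers, so the grouping below is a hypothetical assumption. The sketch compares macro and micro (pooled) accuracy on a toy split and shows how they diverge when groups differ in size.

```python
from collections import defaultdict


def macro_accuracy(records):
    """Unweighted mean of per-group accuracies, as a percentage.

    records: iterable of (group, correct) pairs. `group` is a hypothetical
    label (e.g. a domain name) and `correct` is a bool.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    if not totals:
        raise ValueError("no records")
    # Each group contributes exactly one accuracy value, regardless of size.
    per_group = [hits[g] / totals[g] for g in totals]
    return 100.0 * sum(per_group) / len(per_group)


def micro_accuracy(records):
    """Pooled accuracy over all examples, as a percentage (for comparison)."""
    records = list(records)
    if not records:
        raise ValueError("no records")
    return 100.0 * sum(int(c) for _, c in records) / len(records)


# Toy data: group "A" has 9/10 correct, group "B" has 1/2 correct.
records = [("A", True)] * 9 + [("A", False)] + [("B", True), ("B", False)]

print(f"macro: {macro_accuracy(records):.1f}")  # macro: 70.0
print(f"micro: {micro_accuracy(records):.1f}")  # micro: 83.3
```

Because the small group "B" weighs as much as "A" in the macro average, the macro score (70.0) is lower than the pooled micro score (83.3) here.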