TODAY

Benchmarks

Task Name             Dataset Name   SOTA Result      Trend
Binary decision       TODAY          Accuracy 75.8    27
Reasoning/Planning    Today          Accuracy 81.7    10
Structural Reasoning  TODAY          Accuracy 68      9