GDPVal

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Task Performance | GDPVal 44 tasks (held-out) | Mean Return: 0.81 | 8 |
| Agentic | GDPval-AA Elo | Elo Score: 1,462 | 7 |
| Alignment ranking evaluation | GDPVal 44 tasks | Mean NDCG@8: 0.8722 | 3 |