GDPVal

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Task Performance | GDPVal 44 tasks (held-out) | Mean Return: 0.81 | 8
Agentic | GDPval-AA Elo | Elo Score: 1,462 | 7
Alignment ranking evaluation | GDPVal 44 tasks | Mean NDCG@8: 0.8722 | 3