Language Model Evaluation on Open LeaderBoard v2
[Chart: AgentSociety, AS = 0.6547, May 25, 2026. Metrics shown: AS, BS, Delta PP, Oracle.]
Updated 7d ago
Evaluation Results
| Method | Configuration | Date | AS | BS | Delta PP | Oracle |
|---|---|---|---|---|---|---|
| AgentSociety | Agents=6, Domains=15 | 2026.05 | 0.6547 | 0.5714 | 8.33 | 0.6565 |
| AgentSociety | Agents=6, Domains=15,... | 2026.05 | 0.6547 | 0.5714 | 8.33 | 0.6565 |
| AgentSociety | Agents=6, Domains=15,... | 2026.05 | 0.651 | 0.5714 | 7.96 | 0.6565 |
| AgentSociety | Agents=6, Domains=15,... | 2026.05 | 0.6393 | 0.5714 | 6.79 | 0.6565 |
| AgentSociety | Agents=6, Domains=15,... | 2026.05 | 0.6172 | 0.5714 | 4.58 | 0.6565 |
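The page does not define the Delta PP column, but the reported values are consistent with the difference between AS and BS expressed in percentage points. The sketch below checks that reading against the five results listed above. The helper name `delta_pp` is hypothetical, and the formula is an inference from the numbers, not a definition taken from the leaderboard.

```python
# Values copied from the results above: (AS, BS, reported Delta PP).
rows = [
    (0.6547, 0.5714, 8.33),
    (0.6547, 0.5714, 8.33),
    (0.6510, 0.5714, 7.96),
    (0.6393, 0.5714, 6.79),
    (0.6172, 0.5714, 4.58),
]


def delta_pp(as_score: float, bs_score: float) -> float:
    """Hypothetical helper: AS minus BS, in percentage points.

    Assumption: Delta PP = (AS - BS) * 100. The leaderboard does not
    state this; it is inferred from the reported numbers.
    """
    return round((as_score - bs_score) * 100, 2)


for as_score, bs_score, reported in rows:
    computed = delta_pp(as_score, bs_score)
    # Allow for the two-decimal rounding used in the reported column.
    assert abs(computed - reported) < 0.005, (as_score, bs_score, reported)
    print(f"AS={as_score:.4f}  BS={bs_score:.4f}  Delta PP={computed:.2f}")
```

All five rows pass the check, so the assumed formula reproduces every reported Delta PP value to two decimal places.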