Language Model Evaluation on MMLU-Pro
[Chart: AS on MMLU-Pro by date. AgentSociety: 77.54 (May 25, 2026).]
Evaluation Results

Method        Details               Date     AS     BS     Delta (pp)  Oracle Score
AgentSociety  Agents=6, Domains=14  2026.05  77.54  75.68  1.86        78.21