Performance Estimation on Open LLM Leaderboard subset-selection

1.4 MAE

Random

1.1322.9414.756.559 Oct 30, 2025
Updated 15d ago

Evaluation Results

Date      MAE
2025.10   1.4
2025.10   1.4
2025.10   1.5
2025.10   1.5
2025.10   1.9
2025.10   2
2025.10   2.1
2025.10   2.3
2025.10   2.4
2025.10   2.5
2025.10   2.7
2025.10   2.7
2025.10   2.8
2025.10   2.8
2025.10   3
2025.10   3.2
2025.10   3.5
2025.10   3.6
2025.10   3.8
2025.10   4.2
2025.10   6.2
2025.10   7.6
2025.10   7.7
2025.10   8.1