Performance Estimation on HLE (Humanity's Last Exam) 2% subset

2.9 MAE

Scales++

[Chart: MAE, axis ticks 2.88, 3.015, 3.15, 3.285; Oct 30, 2025]
Updated 15d ago

Evaluation Results

Method    Links
2025.10    2.9    0.3
2025.10    3.2    0.4
2025.10    3.4    0.2