Machine Learning Engineering on MLE-Bench 51 tasks (held-out)
Top result: MLE-IDEATOR, Avg@3 = 58.5
[Chart: Avg@3 / Best@3 scores over time; entries dated Jan 24, 2026]
Evaluation Results
Method           Implementer      Date      Avg@3   Best@3
MLE-IDEATOR      Claude Son...    2026.01   58.5    60.9
MLE-IDEATOR-RL   Claude Son...    2026.01   58.4    63.1
MLE-IDEATOR      Claude Son...    2026.01   53.2    56.6
AIDE             Sonnet 3.5       2026.01   50.7    53.8
CodeAct          Claude Son...    2026.01   50.6    52.8
AIDE             GPT-4o           2026.01   49.6    50.7
CodeAct          GPT-4o           2026.01   47.9    51.7
MLE-IDEATOR-RL   Qwen3-8B,...     2026.01   29.8    30.1
MLE-IDEATOR      Qwen3-8B,...     2026.01   28.0    28.3
CodeAct          Qwen3-8B         2026.01   25.4    25.9
MLE-IDEATOR      Qwen3-8B,...     2026.01   25.2    25.6
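The page does not define Avg@3 or Best@3. One common reading, assumed here and not confirmed by the source, is that each method is run three times on every task: Avg@3 averages each task's three runs and then averages over tasks, while Best@3 keeps each task's best run before averaging over tasks. The sketch below computes both under that assumption; the functions `avg_at_k` and `best_at_k`, the task names, and the scores are all hypothetical and illustrative only.

```python
from statistics import mean

# ASSUMPTION: scores[task] holds one score per independent run (k runs per task).
# Task names and values below are hypothetical, not taken from the leaderboard.

def avg_at_k(scores: dict[str, list[float]]) -> float:
    """Assumed Avg@k: mean of each task's k runs, then mean over tasks."""
    return mean(mean(runs) for runs in scores.values())

def best_at_k(scores: dict[str, list[float]]) -> float:
    """Assumed Best@k: best of each task's k runs, then mean over tasks."""
    return mean(max(runs) for runs in scores.values())

scores = {
    "task_a": [40.0, 50.0, 60.0],  # per-task mean 50.0, best 60.0
    "task_b": [70.0, 80.0, 75.0],  # per-task mean 75.0, best 80.0
}
print(avg_at_k(scores), best_at_k(scores))  # prints "62.5 70.0"
```

Under this reading Best@3 can never fall below Avg@3, which is consistent with every row of the table above.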