
Autonomous Machine Learning Engineering on MLE-bench (held-in and held-out)

CIFAR-10 Performance: 76.53

AIDE (GPT-5)

Updated 1mo ago

Evaluation Results

Method Links
2025.05
76.5322.158.7777.3831.59.1826.424.510.134.250.1429.690.1120.95
2025.05
73.4318.2512.0714.5614.752.7833.393.0304.010.0429.270.2213.13
2025.05
72.555.3513.0733.2310.254.5430.771.380.310.360.015.780.1411.4
2025.05
71.6422.310.5410.9623.882.4867.076.380.7910.410.3526.490.2519.12
2025.05
61.4612.1512.746645.636.4389.594.3611.26.79023.380.2318.14
2025.05
57.613.016.762.62.1216.3626.680.31.020.80.011.960.0710.93
2025.05
53.5911.139.4462.7284.256.0856.457.350.744.340.0431.920.1311.35
2025.05
33.86.7713.4752.3813.871.4172.891.911.741.760.0112.960.216.4
2025.05
28.963.455.538.834.850.0433.440.050.250.8902.670.136.83
2025.05
16.781.160.137.854.384.2622.3800.130.0400.1305.38
2025.05
11.362.427.527.334.754.330.529.780.380.070.0100.080.77
2025.05
1.370.231.392.12.526.3212.251.230.510.460.063.750.041.43
2025.05
1.0300.12.441.383.991.121.790.260.040.020.0200.1
2025.05
0.12.0411.141.654.752.898.262.370.430.9612.150.5104.38