
Comprehensive Examination on AGIEval (test)

62.3 Accuracy

GPT-4o

[Chart: Accuracy over time, Mar 26, 2024 to Jun 13, 2025]
Updated 15d ago

Evaluation Results

Date     Accuracy  Other columns
2024.10  62.3      -
2024.03  53        -
2024.03  52        -
2024.03  51.2      -
2024.03  50.3      -
2024.03  49.9      -
2024.03  49        -
2024.10  47.7      -
2024.03  47.5      -
2024.03  47.2      -
2024.03  46.5      -
2024.10  46.2      -
2024.03  45.6      -
2024.03  44.2      -
2024.03  41.7      -
2024.03  40.9      -
2024.03  40        -
2024.03  39.9      -
2024.03  39.7      -
2024.10  39.3      -
2024.03  38.7      -
2024.03  38.7      -
2024.10  38.4      -
2024.03  37.4      -
2024.03  35.3      -
2024.03  34.5      -
2024.03  34.5      -
2024.10  33.9      -
2024.03  32.9      -
2024.03  30.9      -
2024.03  28.5      -
2024.03  27.9      -
2025.06  22.07     -
2024.03  21.3      -
2025.06  21.02     -
2025.06  20.89     -
2024.10  19.9      -
2026.01  -         53.3337.6849.5980.3960.864.2627.1264.13952.92
2026.01  -         58.137.6849.1977.7867.8469.3627.1260.9743.554.62
2026.01  -         4.716.98.055.253.83.498.799.047.316.37
2026.01  -         4.187.776.333.464.153.336.087.577.915.64
2026.01  -         11.2515.4110.99.49.799.8619.0919.8515.9113.5
2026.01  -         6.1214.464.572.715.594.2617.7220.0815.9310.16