
General AI Assistant Reasoning on GAIA-Text-103 1.0 (test)

Overall Accuracy: 62.1

Claude-3.7-Sonnet

[Chart: best Overall Accuracy over time, Aug 1, 2025 to Feb 3, 2026]
Updated 1mo ago

Evaluation Results

Date      Overall   Level 1   Level 2   Level 3
2026.02   62.1      76.9      57.7      33.3
2025.08   60.1      69.2      63.4      16.6
2025.08   55.4      -         -         -
2025.08   53.3      69.2      50        16.6
2025.08   53.2      -         -         -
2025.08   51.5      61.5      50        25
2026.02   49.5      61.5      48.1      16.7
2025.08   49.3      61.5      44.2      16.7
2025.08   48.5      56.4      50        16.7
2026.02   45.6      56.4      44.2      16.7
2025.08   44.7      53.8      44.2      16.7
2026.02   44.6      56.4      42.3      16.7
2025.08   43.7      56.4      42.3      8.33
2025.08   41.1      53.8      34.6      16.7
2025.08   40.7      46.1      44.2      8.3
2026.02   38.9      51.2      36.5      8.3
2026.02   38.9      53.3      34.6      8.3
2025.08   37.9      -         -         -
2026.02   35.9      46.2      34.6      8.3
2026.02   34.9      51.2      28.8      8.3
2025.08   34        -         -         -
2025.08   31        41        30.7      0
2025.08   28.2      33.3      25        0
2026.02   20.4      35.9      13.5      0
2025.08   20.4      28.2      19.2      8.3
2025.08   17.5      23.1      17.3      0
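
The four scores in each row read most naturally as an overall accuracy followed by three per-difficulty-level accuracies. As a rough sanity check, the sketch below recomputes the overall figure as a question-weighted mean of the three level scores. The per-level question counts (39, 52, 12) are an assumption, not stated on this page; they were chosen because they sum to 103 and reproduce the top row (62.1, 76.9, 57.7, 33.3). The function name `overall_accuracy` is hypothetical.

```python
# Assumed number of questions per difficulty level (not given on the page).
LEVEL_COUNTS = (39, 52, 12)  # sums to 103


def overall_accuracy(level_accuracies, counts=LEVEL_COUNTS):
    """Return the question-weighted mean of per-level accuracies, in percent."""
    total = sum(counts)
    weighted = sum(acc * n for acc, n in zip(level_accuracies, counts))
    return weighted / total


# Top row of the table: per-level scores 76.9, 57.7, 33.3.
print(round(overall_accuracy((76.9, 57.7, 33.3)), 1))  # 62.1
```

Because the published per-level figures are themselves rounded, a recomputed overall score can differ from the listed one by about 0.1 for some rows.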