General AI Assistant Reasoning on GAIA-Text-103 1.0 (test)
[Chart: L1 Accuracy, L2 Accuracy, L3 Accuracy, and Overall Accuracy by date. Highlighted point: Claude-3.7-Sonnet, 76.9 L1 Accuracy, Feb 3, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | L1 Accuracy (%) | L2 Accuracy (%) | L3 Accuracy (%) | Overall Accuracy (%) |
|---|---|---|---|---|---|---|
| Claude-3.7-Sonnet | Model Type=Proprietary | 2026.02 | 76.9 | 57.7 | 33.3 | 62.1 |
| CSO | Base Model=CK-Pro-8B,... | 2026.02 | 61.5 | 48.1 | 16.7 | 49.5 |
| GPT-4.1 | Model Type=Proprietary | 2026.02 | 56.4 | 44.2 | 16.7 | 45.6 |
| IPR | Base Model=CK-Pro-8B,... | 2026.02 | 56.4 | 42.3 | 16.7 | 44.6 |
| Step-DPO | Base Model=CK-Pro-8B,... | 2026.02 | 53.3 | 34.6 | 8.3 | 38.9 |
| ETO | Base Model=CK-Pro-8B,... | 2026.02 | 51.2 | 36.5 | 8.3 | 38.9 |
| RFT | Base Model=CK-Pro-8B,... | 2026.02 | 51.2 | 28.8 | 8.3 | 34.9 |
| CK-Pro-8B | Mode=SFT | 2026.02 | 46.2 | 34.6 | 8.3 | 35.9 |
| Qwen3-8B | Model Type=Open-Source... | 2026.02 | 35.9 | 13.5 | 0 | 20.4 |