General AI Assistant Reasoning on GAIA-Text-103 1.0 (test)
[Chart: L1, L2, L3, and Overall Accuracy by method. Highlighted point: Claude-3.7-Sonnet, L1 Accuracy 76.9, Feb 3, 2026.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | L1 Accuracy (%) | L2 Accuracy (%) | L3 Accuracy (%) | Overall Accuracy (%) |
|---|---|---|---|---|---|---|
| Claude-3.7-Sonnet | Model Type=Proprietary | 2026.02 | 76.9 | 57.7 | 33.3 | 62.1 |
| CSO | Base Model=CK-Pro-8B,... | 2026.02 | 61.5 | 48.1 | 16.7 | 49.5 |
| GPT-4.1 | Model Type=Proprietary | 2026.02 | 56.4 | 44.2 | 16.7 | 45.6 |
| IPR | Base Model=CK-Pro-8B,... | 2026.02 | 56.4 | 42.3 | 16.7 | 44.6 |
| Step-DPO | Base Model=CK-Pro-8B,... | 2026.02 | 53.3 | 34.6 | 8.3 | 38.9 |
| ETO | Base Model=CK-Pro-8B,... | 2026.02 | 51.2 | 36.5 | 8.3 | 38.9 |
| RFT | Base Model=CK-Pro-8B,... | 2026.02 | 51.2 | 28.8 | 8.3 | 34.9 |
| CK-Pro-8B | Mode=SFT | 2026.02 | 46.2 | 34.6 | 8.3 | 35.9 |
| Qwen3-8B | Model Type=Open-Source... | 2026.02 | 35.9 | 13.5 | 0 | 20.4 |