General AI Assistant Reasoning on GAIA-Text-103 1.0 (test)

62.1 Overall Accuracy

Claude-3.7-Sonnet

Updated 3mo ago

Evaluation Results

Method	Links	Overall Accuracy	Level 1	Level 2	Level 3
Claude-3.7-Sonnet 2026.02		62.1	76.9	57.7	33.3
WebShaper 2025.08		60.1	69.2	63.4	16.6
WebSailor 2025.08		55.4	-	-	-
WebShaper 2025.08		53.3	69.2	50	16.6
WebSailor 2025.08		53.2	-	-	-
WebDancer 2025.08		51.5	61.5	50	25
CSO 2026.02		49.5	61.5	48.1	16.7
Cognitive Kernel-Pro 2025.08		49.3	61.5	44.2	16.7
WebThinker-RL 2025.08		48.5	56.4	50	16.7
GPT-4.1 2026.02		45.6	56.4	44.2	16.7
WebThinker-Base 2025.08		44.7	53.8	44.2	16.7
IPR 2026.02		44.6	56.4	42.3	16.7
Cognitive Kernel-Pro 2025.08		43.7	56.4	42.3	8.33
Cognitive Kernel-Pro 2025.08		41.1	53.8	34.6	16.7
WebDancer 2025.08		40.7	46.1	44.2	8.3
ETO 2026.02		38.9	51.2	36.5	8.3
Step-DPO 2026.02		38.9	53.3	34.6	8.3
WebSailor 2025.08		37.9	-	-	-
CK-Pro-8B 2026.02		35.9	46.2	34.6	8.3
RFT 2026.02		34.9	51.2	28.8	8.3
WebDancer 2025.08		34	-	-	-
WebDancer 2025.08		31	41	30.7	0
Search-o1 2025.08		28.2	33.3	25	0
Qwen3-8B 2026.02		20.4	35.9	13.5	0
R1-Searcher 2025.08		20.4	28.2	19.2	8.3
Search-o1 2025.08		17.5	23.1	17.3	0