Reasoning on GAIA text

69.9 Average Accuracy

WebAggregator

Oct 16, 2025
Updated 1mo ago

Evaluation Results

Method   Date      Average   Level 1   Level 2   Level 3   Links
         2025.10   69.9      79.5      67.3      50        -  -
         2025.10   69.9      79.5      67.3      50        -  -
         2025.10   63.1      74.4      63.5      25        -  -
         2025.10   62.1      82.1      53.8      33.3      -  -
         2025.10   60.2      71.8      57.7      33.3      -  -
         2025.10   60.2      74.4      55.8      33.3      -  -
         2025.10   56.3      69.2      55.8      16.7      -  -
         2025.10   53.2      -         -         -         -  -
         2025.10   52.2      61.5      53.8      16.7      -  -
         2025.10   51.5      66.7      44.2      33.3      -  -
         2025.10   48.5      56.4      50        16.7      -  -
         2025.10   43.7      51.3      44.2      16.7      -  -
         2025.10   43.7      56.4      42.3      8.3       -  -
         2025.10   42.7      61.5      34.6      16.7      -  -
         2025.10   40.8      48.7      40.4      16.7      -  -
         2025.10   40.8      53.8      30.8      16.7      -  -
         2025.10   40.7      46.1      44.2      8.3       -  -
         2025.10   37.9      -         -         -         -  -
         2025.10   31        41        30.7      0         -  -
         2025.10   28.2      46.1      21.2      0         -  -
         2025.10   22.3      35.9      17.3      0         -  -
         2025.10   18.4      33.3      11.5      0         -  -
         2025.10   16.5      23.1      15.4      0         -  -
         2025.10   13.6      20.5      9.6       8.3       -  -
         2025.10   11.7      10.3      13.5      8.3       -  -
         2025.10   8.7       17.9      3.8       0         -  -
         2025.10   6.8       12.8      3.8       0         -  -
         2025.10   6.8       12.8      3.8       0         -  -