
Compositional Reasoning on WebAggregatorQA

Level-1 Score: 70.8

WebAggregator

Oct 16, 2025
Updated 1mo ago

Evaluation Results

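| Date    | Method | Level-1 | Level-2 | Level-3 | Avg. | Links |
|---------|--------|---------|---------|---------|------|-------|
| 2025.10 |        | 70.8    | 22.2    | 19.4    | 28.9 | --    |
| 2025.10 |        | 66.7    | 25.3    | 11.1    | 28.3 | --    |
| 2025.10 |        | 66.7    | 35.4    | 13.9    | 35.2 | --    |
| 2025.10 |        | 62.4    | 22.2    | 11.1    | 25.8 | --    |
| 2025.10 |        | 62.4    | 21.2    | 11.1    | 25.2 | --    |
| 2025.10 |        | 62.4    | 24.2    | 8.3     | 26.4 | --    |
| 2025.10 |        | 58.3    | 24.5    | 22.2    | 28.9 | --    |
| 2025.10 |        | 54.2    | 22.2    | 19.4    | 26.4 | --    |
| 2025.10 |        | 54.2    | 15.2    | 11.1    | 20.1 | --    |
| 2025.10 |        | 54.2    | 11.1    | 5.6     | 16.4 | --    |
| 2025.10 |        | 45.8    | 10.1    | 5.6     | 14.5 | --    |
| 2025.10 |        | 37.5    | 11.1    | 8.3     | 14.5 | --    |
| 2025.10 |        | 30.8    | 5.1     | 5.6     | 9.4  | --    |
| 2025.10 |        | 27.3    | 3.4     | 2.8     | 6.3  | --    |
| 2025.10 |        | 25      | 10.1    | 5.6     | 11.3 | --    |
| 2025.10 |        | 18.5    | 5.1     | 2.8     | 6.8  | --    |
| 2025.10 |        | 15.4    | 4       | 2.8     | 5.6  | --    |
| 2025.10 |        | 8.3     | 1       | 0       | 1.9  | --    |
| 2025.10 |        | 4.2     | 1       | 0       | 1.3  | --    |
| 2025.10 |        | 4.2     | 1       | 0       | 1.3  | --    |
| 2025.10 |        | 4.2     | 1       | 2.8     | 1.9  | --    |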