Agent Performance on HELD-OUT Suite

52.1 HotpotQA Score

GPT-4

Mar 19, 2024
Updated 4d ago

Evaluation Results

Method   Links
2024.03   52.1   36.4   6.28   86.4   94.2   55.1
2024.03   37.4   21.2   4.56   84     92.1   47.8
2024.03   28.5   20     4.68   66     89.1   41.7
2024.03   26.2   6.8    0.25   9.3    40.4   16.6
2024.03   25.4   16.8   2.71   61.8   84.5   38.2
2024.03   22.6   5.9    1.2    27.4   78.7   27.2
2024.03   22.3   13.7   0.74   41.4   80.6   31.7