
FanOutQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-hop Reasoning | FanOutQA | F1 Score: 71.84 | 15 |
| Question Answering | FanOutQA | F1 Score: 44.1 | 9 |
| Evidence Retrieval | FanOutQA | Evidence Coverage Rate: 61 | 6 |
| Multi-hop information aggregation | FanOutQA | Accuracy: 29.8 | 3 |