
HotPot

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Abstention | Hotpot (test) | AUARC 60.9 | 25 |
| Question Answering | ADVHOTPOT | Accuracy 82.4 | 12 |
| Selective Question Answering | HOTPOT | Area under Coverage-F1 92.5 | 12 |
| Retrieval Question Answering | HotPot | MRR 47.7 | 6 |
| Information Retrieval | Hotpot BEIR | nDCG 0.687 | 5 |
| Multi-hop Question Answering | Hotpot Kimi | EM 54.07 | 4 |
| Retrieval Question Answering | HotPot (in-domain) | MRR 63.8 | 4 |
| Error Detection and Recovery | Hotpot Robot Data (test) | Recovery Success Ratio 5 | 3 |
| Human-Robot Interaction | Hotpot | Successes 10 | 2 |