LoCoMo

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long-term memory evaluation | LoCoMo | Overall F1: 58 | 70 |
| Multi-hop Question Answering | LoCoMo | F1: 48.35 | 67 |
| Long-context Question Answering | LoCoMo | Average F1: 57.32 | 64 |
| Long-context Memory Retrieval | LoCoMo | Single-hop: 97.1 | 55 |
| Single-hop Question Answering | LoCoMo | F1: 0.6408 | 53 |
| Open-domain Question Answering | LoCoMo | F1: 0.4013 | 53 |
| Long-context reasoning and retrieval | LoCoMo (test) | Single-Hop F1: 95.12 | 37 |
| Temporal Question Answering | LoCoMo | F1: 0.6634 | 36 |
| Long-form Dialogue | LoCoMo | EM: 37.24 | 32 |
| Memory | LoCoMo | Accuracy: 30.18 | 25 |
| Memory | LoCoMo | Execution Time (min): 21.7 | 25 |
| Long-context Reasoning | LoCoMo | Average F1: 44.94 | 25 |
| Open-Domain Question Answering | LoCoMo Open-Domain (test) | F1 Score: 15.12 | 24 |
| Temporal Question Answering | LoCoMo Temporal (test) | F1 Score: 44.09 | 24 |
| Single-Hop Question Answering | LoCoMo Single-Hop (test) | F1: 37.9 | 24 |
| Multi-Hop Question Answering | LoCoMo Multi-Hop (test) | F1 Score: 26.55 | 24 |
| Long-context Question Answering | LoCoMo | Single-Hop LLJ Score: 97.1 | 24 |
| Question Answering | LoCoMo | Single Hop F1: 67.13 | 22 |
| Long-horizon Question Answering | LoCoMo | Multi-Hop RGE-L: 0.2568 | 20 |
| Long-horizon Question Answering | LoCoMo Overall All Categories 1.0 | EM Rank: 4.63 | 20 |
| Long-horizon Question Answering | LoCoMo Single-Hop 1.0 | EM: 16.77 | 20 |
| Long-horizon Question Answering | LoCoMo Open-Domain 1.0 | EM: 7.29 | 20 |
| Long-horizon Question Answering | LoCoMo Temporal 1.0 | EM: 1,121 | 20 |
| Long-horizon Question Answering | LoCoMo Multi-Hop 1.0 | EM: 426 | 20 |
| Conversational Question Answering | LoCoMo Overall | Avg Rank (F1): 1 | 20 |
Showing 25 of 64 rows