LV-Eval

Benchmarks

Task Name                             Dataset Name    SOTA Result                      Trend
Question Answering                    LV-Eval (test)  EM 14.5                          19
Multi-hop Question Answering          LV-Eval (test)  F1 Score 12.9                    14
Long-context Question Answering       LV-Eval         F1 Score 14.81                   14
Question Answering                    LV-Eval         Average Token Count 51,066.2     7
Multi-hop Question Answering          LV-Eval         Average Running Time (s) 1.31    6
Retrieval                             LV-Eval         Average Running Time (s) 0.41    5
Long-context retrieval and reasoning  LV-Eval         Performance (16k Context) 58.82  5
Long-context language understanding   LV-Eval         CMRC (Mixup) 7.05                4
Multi-Hop QA                          LV-Eval         EM 10.5                          3