LooGLE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long Dependency Question Answering | LooGLE | Retrieval: 40 | 21 |
| Single-hop Question Answering | Loogle SD | Score: 45.1 | 17 |
| Question Answering | LooGLE Long Dependency QA | BLEU-1: 0.0942 | 12 |
| Summarization | LooGLE ArXiv Paper Summarization | BLEU-1: 29.15 | 11 |
| Reasoning | LooGLE | Reasoning Accuracy: 57 | 10 |
| Question Answering | LooGLE | QA Accuracy: 27 | 10 |
| Long-Context Question Answering | LooGLE | EM: 66.3 | 6 |
| Question Answering | LooGLE | Short QA Score: 86.02 | 5 |
| Multi-hop Question Answering | LooGLE CR 16k | Score: 19.78 | 5 |
| Multi-hop Question Answering | LooGLE-MR 16k | Score: 15.1 | 5 |
| Single-hop Question Answering | LooGLE-SD 16k | Score: 45.1 | 5 |
| Long-context question-answering | LooGLE (test) | ShortQA Score: 54.65 | 2 |