InfiniteBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context language understanding | InfiniteBench | En.Sum: 33.01 | 81 |
| Long-context understanding | InfiniteBench v1 (test) | Dialogue: 20 | 31 |
| Long-context understanding | InfiniteBench | Math Score (F): 0.4771 | 22 |
| Long-context language modeling | InfiniteBench (test) | En QA Score: 34.82 | 14 |
| Key-Value Retrieval | InfiniteBench 8k | Accuracy: 96 | 12 |
| Key-Value Retrieval | InfiniteBench 4k | Accuracy: 100 | 12 |
| Key-Value Retrieval | InfiniteBench 16k | Accuracy (%): 87 | 10 |
| Code Debug | InfiniteBench Code Debug | Accuracy: 74.37 | 7 |
| Long-context reasoning | InfiniteBench (test) | Reasoning Pa Score: 87.63 | 6 |
| Long-context understanding | InfiniteBench (test) | En QA F1: 36.7 | 6 |
| Long context understanding | InfiniteBench En.MC | Accuracy: 83.4 | 5 |
| Long-context language understanding | InfiniteBench | InfiniteBench QA (EN) Score: 7.84 | 4 |
| Math Find | InfiniteBench | Performance (8k Context): 37.14 | 3 |
| KV | InfiniteBench | KV Retrieval Score (8k): 6.2 | 3 |
| Long-context Modeling | InfiniteBench | Decoding Speedup: 9 | 1 |