InfiniteBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long-context language understanding | InfiniteBench | En.Sum: 33.01 | 88 |
| Long-context understanding | InfiniteBench v1 (test) | Dialogue: 20 | 31 |
| Long-context code reasoning | InfiniteBench Code-Debug (test) | Accuracy: 86 | 25 |
| Long-context understanding | InfiniteBench | Math Score (F): 0.5 | 25 |
| Code Debugging | InfiniteBench code_debug 40k input cap | Accuracy: 34.26 | 19 |
| Long-context language modeling | InfiniteBench (test) | En QA Score: 34.82 | 14 |
| Long-context Modeling | InfiniteBench | Decoding Speedup: 9 | 13 |
| Long-context reasoning | InfiniteBench (test) | Average Score: 50.18 | 12 |
| Key-Value Retrieval | InfiniteBench 8k | Accuracy: 96 | 12 |
| Key-Value Retrieval | InfiniteBench 4k | Accuracy: 100 | 12 |
| Long-context language modeling | InfiniteBench | En. Sum Accuracy: 18 | 10 |
| Key-Value Retrieval | InfiniteBench 16k | Accuracy (%): 87 | 10 |
| Code Debug | InfiniteBench Code Debug | Accuracy: 74.37 | 7 |
| Long-context understanding | InfiniteBench (test) | En QA F1: 36.7 | 6 |
| Long context understanding | InfiniteBench En.MC | Accuracy: 83.4 | 5 |
| Long-context language understanding | InfiniteBench | InfiniteBench QA (EN) Score: 7.84 | 4 |
| Math Find | InfiniteBench | Performance (8k Context): 37.14 | 3 |
| KV | InfiniteBench | KV Retrieval Score (8k): 6.2 | 3 |
| Long-context retrieval and reasoning | InfiniteBench | Retrieval PassKey: 100 | 2 |