
Babilong

Benchmarks

Task Name              | Dataset Name                 | SOTA Result            | Trend
Question Answering     | Babilong 16k context length  | QA1 Accuracy: 58       | 9
Long-context reasoning | BABILong                     | Err (2k Context): 14.1 | 6
Question Answering     | Babilong 128k context length | QA1 Score: 38          | 5
Question Answering     | Babilong 64k context length  | QA1 Score: 25          | 5