Long-context language modeling on LongBench 4-task average

12.7 Average Accuracy

2d hetero

Apr 20, 2026
Updated 1mo ago

Evaluation Results

Date      Average Accuracy
2026.04   12.7               582
2026.04   12.1               283
2026.04   12                 136
2026.04   11.5               1,175
2026.04   11.5               1,859
2026.04   7.7                448
2026.04   7.5                224
2026.04   7.3                1,175
2026.04   6.9                224
2026.04   6.8                112
2026.04   6.5                582
2026.04   6.1                56
2026.04   6                  448
2026.04   5.9                139
2026.04   5.8                283
2026.04   4.9                56
2026.04   4.1                112