
Long-Context

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Many-shot in-context learning | Long-context benchmarks | ICL Performance (8k Context): 74.2 | 21 |
| Context Management | Long-context (test) | mTokens: 1 | 19 |
| Long Context | Long Context benchmark | Accuracy: 67.59 | 14 |
| Tabular Learning | Long-context 15 datasets v2 (test) | Avg. Normalized RMSE: 0.523 | 9 |
| Average across tasks | Long-context benchmarks | Performance (8k Context): 45.9 | 8 |
| End-to-end LLM Inference Serving | Long-context 1024-token input, 32-token output | TPOT Speedup vs DeepGEMM: 1.48 | 3 |
| Long-Context Training | Long-Context (train) | Metric: - | 0 |