
CL-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Task-solving | CL-bench (test) | Overall Score (%): 25.8 | 16 |
| Context Learning | CL-Bench (test) | Overall Score: 12.85 | 8 |
| Agentic Long-context Reasoning | CL-bench (test) | Solve Rate: 26 | 6 |
| Context Learning Task-Solving | CL-Bench | Overall Score: 15.8 | 5 |
| Long Context & Context Learning | CL-Bench | Pass@1: 15.5 | 3 |