Agentic Long-context Reasoning on CL-bench (test)
[Chart: Solve Rate over time (Rubric Accuracy also selectable). Top entry: RLM + PEEK, Solve Rate 26, May 19, 2026.]
Evaluation Results
| Method | Base LM | Date | Solve Rate | Rubric Accuracy |
|---|---|---|---|---|
| RLM + PEEK | GPT-5-mini-202... | 2026.05 | 26 | 63.4 |
| RLM + Compaction Agent | GPT-5-mini-202... | 2026.05 | 20 | 54.6 |
| RLM + ACE (Online Adaptation) | GPT-5-mini-202... | 2026.05 | 20 | 53.5 |
| RLM | GPT-5-mini-202... | 2026.05 | 14 | 54.5 |
| RLM + RAG | GPT-5-mini-202... | 2026.05 | 14 | 55.6 |
| RLM + Shared Chat | GPT-5-mini-202... | 2026.05 | 12 | 51.3 |