Long-context Classification on OOLONG (test)
[Chart: TREC-Q-coarse Accuracy over time. Highlighted point: RLM + PEEK, 58.1 (May 19, 2026). Selectable metrics: TREC-Q-coarse Accuracy, AGNews Accuracy, Yahoo Accuracy.]
Updated 14d ago
Evaluation Results
| Method | Details | Date | TREC-Q-coarse Accuracy | AGNews Accuracy | Yahoo Accuracy |
|---|---|---|---|---|---|
| RLM + PEEK | Base LM=GPT-5-mini-202... | 2026.05 | 58.1 | 69.4 | 57 |
| RLM + ACE (Online Adaptation) | Base LM=GPT-5-mini-202... | 2026.05 | 48.8 | 61.6 | 42 |
| RLM + Compaction Agent | Base LM=GPT-5-mini-202... | 2026.05 | 42 | 49.5 | 30 |
| RLM + RAG | Base LM=GPT-5-mini-202... | 2026.05 | 36.6 | 63.1 | 29 |
| RLM + Shared Chat | Base LM=GPT-5-mini-202... | 2026.05 | 32 | 49.6 | 23 |
| RLM | Base LM=GPT-5-mini-202... | 2026.05 | 30.3 | 46.5 | 23 |