Long-context retrieval and reasoning on RULER 16k context 3k budget
[Chart: S-1 Score by method (LLMLingua), Mar 20, 2026; the metric is selectable from the scores listed under Evaluation Results.]
Updated 27d ago
Evaluation Results
| Method | Avg Tokens | Date | S-1 Score | S-2 Score | S-3 Score | Key-1 Score | Key-2 Score | Key-3 Score | Validation Score | Query Score | Verification Task Score (VT) | Frequency Score | QA Score (1) | QA Score (2) | Average Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLMLingua | 3,230 | 2026.03 | 100 | 5 | 4 | 7 | 6 | 12 | 6 | 4.8 | 59.4 | 90.7 | 15 | 37 | 28.9 |
| LongLLMLingua | 3,179 | 2026.03 | 100 | 6 | 4 | 9 | 2 | 3 | 7.8 | 7.3 | 60.4 | 91.3 | 19 | 36 | 28.8 |
| BEAVER | 3,198 | 2026.03 | 100 | 100 | 100 | 99 | 88 | 41 | 99.8 | 99.8 | 78.2 | 93.7 | 50 | 55 | 83.7 |
| LLMLingua-2 | 3,132 | 2026.03 | 27 | 86 | 27 | 72 | 11 | 2 | 80.3 | 82.3 | 3.4 | 95.3 | 45 | 43 | 47.9 |
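The page does not say how Average Score is computed. The reported values are consistent with an unweighted mean of the twelve task scores, rounded to one decimal place. The following is a minimal sketch under that assumption; the names `scores`, `reported`, and `mean_score` are illustrative and not part of the benchmark.

```python
# Task scores per method, in column order:
# S-1, S-2, S-3, Key-1, Key-2, Key-3, Validation, Query, VT, Frequency, QA (1), QA (2)
scores = {
    "LLMLingua":     [100, 5, 4, 7, 6, 12, 6, 4.8, 59.4, 90.7, 15, 37],
    "LongLLMLingua": [100, 6, 4, 9, 2, 3, 7.8, 7.3, 60.4, 91.3, 19, 36],
    "BEAVER":        [100, 100, 100, 99, 88, 41, 99.8, 99.8, 78.2, 93.7, 50, 55],
    "LLMLingua-2":   [27, 86, 27, 72, 11, 2, 80.3, 82.3, 3.4, 95.3, 45, 43],
}

# Average Score values as reported on the leaderboard.
reported = {
    "LLMLingua": 28.9,
    "LongLLMLingua": 28.8,
    "BEAVER": 83.7,
    "LLMLingua-2": 47.9,
}


def mean_score(values):
    """Unweighted mean rounded to one decimal (assumed aggregation rule)."""
    return round(sum(values) / len(values), 1)


for method, values in scores.items():
    # Print the recomputed mean next to the reported Average Score.
    print(f"{method}: computed={mean_score(values)} reported={reported[method]}")
```

If a future row fails this check, the average may be weighted or computed over a different set of tasks.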