In-Context Retrieval on RULER
[Chart: MQ Score. Highlighted point: Full, 99.55, Apr 9, 2026. Available metrics: MQ Score, MV Score, QA-1 Score, QA-2 Score, VT Score, FWE Score, Average Score.]
Updated 9d ago
Evaluation Results
| Method | Details | Date | MQ Score | MV Score | QA-1 Score | QA-2 Score | VT Score | FWE Score | Average Score |
|---|---|---|---|---|---|---|---|---|---|
| Full | Backbone=GLM-4.7-Flash | 2026.04 | 99.55 | 99.75 | 53.6 | 66.25 | 100 | 95.2 | 85.73 |
| AsyncTLS | Backbone=GLM-4.7-Flash... | 2026.04 | 95.1 | 98.8 | 48.2 | 59.5 | 99.88 | 87.53 | 81.5 |
| DS | Backbone=GLM-4.7-Flash... | 2026.04 | 92.9 | 99.75 | 49 | 59.4 | 100 | 93.2 | 82.38 |
| Quest | Backbone=GLM-4.7-Flash... | 2026.04 | 57.45 | 42.25 | 30.4 | 45.8 | 60.84 | 83.2 | 53.32 |
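
The page does not say how Average Score is computed. The figures listed above are consistent with an unweighted mean of the six task scores (MQ, MV, QA-1, QA-2, VT, FWE), rounded to two decimals. The sketch below checks that reading. The names `TASK_SCORES`, `REPORTED_AVERAGE`, and `mean_score` are illustrative, and half-up rounding is an assumption that happens to match the reported values.

```python
from decimal import Decimal, ROUND_HALF_UP

# Task scores copied from the results above, in the order
# MQ, MV, QA-1, QA-2, VT, FWE. Kept as strings so Decimal parses them exactly.
TASK_SCORES = {
    "Full": ["99.55", "99.75", "53.6", "66.25", "100", "95.2"],
    "AsyncTLS": ["95.1", "98.8", "48.2", "59.5", "99.88", "87.53"],
    "DS": ["92.9", "99.75", "49", "59.4", "100", "93.2"],
    "Quest": ["57.45", "42.25", "30.4", "45.8", "60.84", "83.2"],
}

# Average Score values as reported on the page.
REPORTED_AVERAGE = {
    "Full": "85.73",
    "AsyncTLS": "81.5",
    "DS": "82.38",
    "Quest": "53.32",
}


def mean_score(scores):
    """Unweighted mean of the task scores, rounded half-up to two decimals.

    Assumption: the leaderboard rounds half-up; the page does not state this.
    """
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for method, scores in TASK_SCORES.items():
        computed = mean_score(scores)
        reported = Decimal(REPORTED_AVERAGE[method])
        status = "matches" if computed == reported else "differs from"
        print(f"{method}: computed {computed} {status} reported {reported}")
```

Decimal arithmetic is used instead of floats because two of the means (Full and DS) fall exactly on a rounding boundary, where binary floating point could round either way.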