Long-context language understanding on RULER 0 shot v1 (test)

94.71 CWE Score

[Chart: CWE Score for the "Vanilla" series; date label Nov 3, 2025]
Updated 1mo ago

Evaluation Results

Method | Links
2025.11
94.7192.111,00057.2255.7259.9
2025.11
94.0191.911,00056.8255.7259.7
2025.11
93.2183.011,00036.2222.4152.8
2025.11
92.4189.011,00057.2255.2259.2
2025.11
92.4186.211,00062.0264.9258.1
2025.11
90.9182.111,00033.8222.9152.2
2025.11
85.3186.311,00062.2264.7257.3