Long-context capability evaluation on RULER 8192 length
Top result: Qwen3-30B A3B-Instruct, Accuracy 93.75 (Feb 9, 2026)
Evaluation Results
Method                   Configuration                Date      Accuracy
Qwen3-30B A3B-Instruct   Budget=Full                  2026.02   93.75
QUOKA                    Model=Qwen3-30B A3B-In...    2026.02   92.77
Qwen3-4B                 Budget=Full                  2026.02   91.68
QUOKA                    Model=Qwen3-4B, Budget...    2026.02   91.35
Smollm3                  Budget=Full                  2026.02   83.46
Qwen2.5-3B               Budget=Full                  2026.02   81.99
QUOKA                    Model=Smollm3, Budget=25%    2026.02   81.45
Llama3.2-3B              Budget=Full                  2026.02   81.33
QUOKA                    Model=GPT-OSS-20B, Bud...    2026.02   80.66
QUOKA                    Model=Qwen2.5-3B, Budg...    2026.02   79.78
QUOKA                    Model=Llama3.2-3B, Bud...    2026.02   79.72
GPT-OSS-20B              Budget=Full                  2026.02   79.32