Long-context capability evaluation on RULER 4096 length
[Chart: Accuracy by date. Highlighted point: Qwen3-30B A3B-Instruct, 94.08 accuracy, Feb 9, 2026.]
Evaluation Results
Method                  Configuration              Date     Accuracy
Qwen3-30B A3B-Instruct  Budget=Full                2026.02  94.08
Qwen3-4B                Budget=Full                2026.02  93.32
QUOKA                   Model=Qwen3-30B A3B-In...  2026.02  93.25
QUOKA                   Model=Qwen3-4B, Budget...  2026.02  92.5
Smollm3                 Budget=Full                2026.02  91.12
QUOKA                   Model=Smollm3, Budget=25%  2026.02  89.6
Qwen2.5-3B              Budget=Full                2026.02  89.56
Llama3.2-3B             Budget=Full                2026.02  87.5
QUOKA                   Model=Llama3.2-3B, Bud...  2026.02  86.94
QUOKA                   Model=Qwen2.5-3B, Budg...  2026.02  86.07
GPT-OSS-20B             Budget=Full                2026.02  79.35
QUOKA                   Model=GPT-OSS-20B, Bud...  2026.02  77.4