Language Understanding on MMLU Redux (test)
[Chart: Accuracy over time. Current best: 66.9, Ours (theory-guided context selection strategy), Feb 9, 2026.]
Evaluation Results
| Method | Model | Date | Accuracy (%) |
|---|---|---|---|
| Ours (theory-guided context selection strategy) | Qwen3-8B | 2026.02 | 66.9 |
| ReMem | Qwen3-8B | 2026.02 | 66.8 |
| ExpRAG | Qwen3-8B | 2026.02 | 66.6 |
| DC | Qwen3-8B | 2026.02 | 66.5 |
| BM25 | Qwen3-8B | 2026.02 | 66.0 |
| Zero | Qwen3-8B | 2026.02 | 65.8 |
| Ours (theory-guided context selection strategy) | Llama-3.1-8B | 2026.02 | 65.0 |
| ReMem | Llama-3.1-8B | 2026.02 | 64.9 |
| ExpRAG | Llama-3.1-8B | 2026.02 | 64.7 |
| DC | Llama-3.1-8B | 2026.02 | 64.6 |
| BM25 | Llama-3.1-8B | 2026.02 | 64.1 |
| Zero | Llama-3.1-8B | 2026.02 | 63.8 |
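The margins between methods are small, so it can help to express each result as a gain over the Zero row for the same backbone model. The sketch below does that arithmetic in percentage points. It assumes that "Zero" is the no-context baseline; the `results` dictionary layout and the helper name `gains_over_baseline` are illustrative choices, not part of the leaderboard.

```python
# Accuracy (%) on MMLU Redux (test), copied from the results listed above.
# Assumption: "Zero" is the baseline that uses no selected context.
results = {
    "Qwen3-8B": {"Ours": 66.9, "ReMem": 66.8, "ExpRAG": 66.6,
                 "DC": 66.5, "BM25": 66.0, "Zero": 65.8},
    "Llama-3.1-8B": {"Ours": 65.0, "ReMem": 64.9, "ExpRAG": 64.7,
                     "DC": 64.6, "BM25": 64.1, "Zero": 63.8},
}


def gains_over_baseline(scores, baseline="Zero"):
    """Return each method's accuracy gain over `baseline`, in percentage points.

    Differences are rounded to one decimal place to remove floating-point noise,
    matching the one-decimal precision of the reported scores.
    """
    base = scores[baseline]
    return {method: round(score - base, 1)
            for method, score in scores.items() if method != baseline}


if __name__ == "__main__":
    for model, scores in results.items():
        print(model, gains_over_baseline(scores))
```

With these numbers, the theory-guided strategy is 1.1 points above Zero on Qwen3-8B and 1.2 points above Zero on Llama-3.1-8B, and 0.1 points above the next-best method, ReMem, on both backbones.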