Multi-subject Knowledge on MMLU
[Chart: Accuracy over time, Dec 10, 2025 to May 12, 2026. Current best: 79.7 (Baseline).]
Updated 21d ago
Evaluation Results

Method                 Setting                      Date      Accuracy
Baseline               Model=Qwen3-8B               2026.05   79.7
ScaleSearch            Model=Qwen3-8B               2026.05   79.4
NVFP4                  Model=Qwen3-8B               2026.05   77.7
TCA-Attention          Base Model=Qwen2.5-7B-...    2025.12   74.26
FlexPrefill            Base Model=Qwen2.5-7B-...    2025.12   74.23
Qwen2.5-7B-Instruct    Attention Method=Full...     2025.12   74.22
XAttention             Base Model=Qwen2.5-7B-...    2025.12   74.2
MInference             Base Model=Qwen2.5-7B-...    2025.12   74.14
LLaMA3.1-8B-Instruct   Attention Method=Full...     2025.12   69.38
XAttention             Base Model=LLaMA3.1-8B...    2025.12   69.21
TCA-Attention          Base Model=LLaMA3.1-8B...    2025.12   69.21
FlexPrefill            Base Model=LLaMA3.1-8B...    2025.12   69.16
MInference             Base Model=LLaMA3.1-8B...    2025.12   69.14
Baseline               Model=DeepSeek-R1-Dist...    2026.05   48
ScaleSearch            Model=DeepSeek-R1-Dist...    2026.05   45.4
NVFP4                  Model=DeepSeek-R1-Dist...    2026.05   45.2