Multi-task Language Understanding on CEval
Best result: 44.7 accuracy (DeepSeek Chat 7B)
[Chart: Accuracy over time, Jan 11, 2024 to Feb 8, 2025]
Updated 3d ago
Evaluation Results
| Method | Settings | Date | Accuracy |
| --- | --- | --- | --- |
| DeepSeek Chat 7B | # Shot=0-shot, Total P... | 2024.01 | 44.7 |
| FRAME | Model Size=3B, Trainin... | 2025.02 | 44 |
| DeepSeekMoE 16B | # Shot=5-shot, # Total... | 2024.01 | 40.6 |
| DeepSeek 67B (Dense) | # Shot=5-shot | 2024.01 | 40.3 |
| DeepSeekMoE Chat 16B | # Shot=0-shot, Total P... | 2024.01 | 40 |
| DeepSeekMoE 145B | # Shot=5-shot | 2024.01 | 37.1 |
| PDPC | Model Size=3B, Trainin... | 2025.02 | 36.1 |
| LLaMA2 SFT 7B | # Shot=0-shot, Total P... | 2024.01 | 35.1 |
| LLaMA2 7B | # Shot=5-shot, # Total... | 2024.01 | 33.9 |
| DeepSeekMoE 142B (Half Activated) | # Shot=5-shot | 2024.01 | 32.8 |
| Random | Model Size=3B, Trainin... | 2025.02 | 27.2 |
| GShard 137B | # Shot=5-shot | 2024.01 | 26.2 |
| Q3 -> Q1 -> Q4 -> Q2 | Model Size=3B, Trainin... | 2025.02 | 25.5 |