General Knowledge Evaluation on C-Eval (val)
[Chart: Accuracy over time, Jul 23, 2024 to Apr 22, 2026. Best result: LLaMA2-13B, 34.32. Updated 1mo ago.]
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| LLaMA2-13B | Model Size=13B, Shots=... | 2024.07 | 34.32 |
| TinyLLaMA-1.1B + DDK | Model Size=1.1B, Shots... | 2024.07 | 27.86 |
| TinyLLaMA-1.1B + TED | Model Size=1.1B, Shots... | 2024.07 | 27.49 |
| TinyLLaMA-1.1B + KD | Model Size=1.1B, Shots... | 2024.07 | 27.12 |
| TinyLLaMA-1.1B + CPT | Model Size=1.1B, Shots... | 2024.07 | 26.79 |
| TinyLLaMA-1.1B + MiniLLM | Model Size=1.1B, Shots... | 2024.07 | 26.74 |
| TinyLLaMA-1.1B + CPT & DoReMi | Model Size=1.1B, Shots... | 2024.07 | 26.37 |
| ML | Filtering Strategy=ML | 2026.04 | 26.15 |
| MKC-e | Filtering Strategy=MKC-e | 2026.04 | 25.56 |
| HQ | Filtering Strategy=HQ | 2026.04 | 24.89 |
| ML (Q3) | Filtering Strategy=ML... | 2026.04 | 24.67 |
| HQ seed | Filtering Strategy=HQ... | 2026.04 | 24.44 |
| TinyLLaMA-1.1B | Model Size=1.1B, Shots... | 2024.07 | 23.92 |
| ML (15%) | Filtering Strategy=ML... | 2026.04 | 23.7 |
| No filtering | Filtering Strategy=No... | 2026.04 | 22.88 |