Factuality Evaluation on MMLU
Top result: GPT-4, 82.4 EM (Jun 7, 2023)
Evaluation Results (EM = exact-match accuracy, %)

Method          Details                    Date     EM
GPT-4           Type=Proprietary, Shot...  2023.06  82.4
ChatGPT         Type=Proprietary, Shot...  2023.06  67.9
ShareGPT 65B    Base Model=LLaMA 65B,...   2023.06  61.3
Human mix. 65B  Base Model=LLaMA 65B,...   2023.06  60.4
TÜLU 65B        Base Model=LLaMA, Trai...  2023.06  59.2
LLaMA 65B       Base Model=LLaMA, Shot...  2023.06  58.7
TÜLU 30B        Base Model=LLaMA, Trai...  2023.06  57.7
LLaMA 30B       Base Model=LLaMA, Shot...  2023.06  54.6
TÜLU-1.1 13B    Base Model=LLaMA-2, Tr...  2023.06  52.3
LLaMA-2 13B     Base Model=LLaMA-2, Sh...  2023.06  52.0
TÜLU 13B        Base Model=LLaMA, Trai...  2023.06  49.3
TÜLU-1.1 7B     Base Model=LLaMA-2, Tr...  2023.06  49.2
TÜLU 7B         Base Model=LLaMA, Trai...  2023.06  44.8
LLaMA 13B       Base Model=LLaMA, Shot...  2023.06  42.3
LLaMA-2 7B      Base Model=LLaMA-2, Sh...  2023.06  41.8
LLaMA 7B        Base Model=LLaMA, Shot...  2023.06  31.5
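EM scores the fraction of MMLU questions for which the model's answer exactly matches the gold answer, reported as a percentage. The sketch below illustrates that computation, assuming each model output has already been reduced to a single answer letter (A to D). The `normalize` rule (trim whitespace, drop a trailing period, uppercase) and the function names are hypothetical choices for illustration, not the evaluation harness that produced the numbers above.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Hypothetical normalization: trim whitespace, drop a trailing period, uppercase."""
    return answer.strip().rstrip(".").strip().upper()


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return EM as a percentage: share of predictions equal to the gold label after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    # Count examples whose normalized prediction matches the normalized gold answer.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


if __name__ == "__main__":
    preds = ["B", " c", "A.", "D"]
    golds = ["B", "C", "A", "A"]
    print(f"EM = {exact_match(preds, golds):.1f}")  # EM = 75.0
```

Because EM gives no partial credit, a leaderboard score such as 82.4 can be read directly as "82.4% of questions answered with exactly the right option" under whatever answer-extraction rule the evaluator used.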