Python Coding on HumanEval (test)
Best result: InternLM2-Chat-20B, 67.7 accuracy (Mar 26, 2024)
Evaluation Results
Method                      Settings                   Date     Accuracy (%)
InternLM2-Chat-20B          Shots=4-shot, Model Si...  2024.03  67.7
InternLM2-Chat-20B-SFT      Shots=4-shot, Model Si...  2024.03  67.1
InternLM2-Chat-7B-SFT       Shots=4-shot, Model Si...  2024.03  61.6
InternLM2-Chat-7B           Shots=4-shot, Model Si...  2024.03  59.2
ChatGLM3-6B                 Shots=4-shot, Model Si...  2024.03  53.1
Qwen-14B-Chat               Shots=4-shot, Model Si...  2024.03  41.5
Qwen-7B-Chat                Shots=4-shot, Model Si...  2024.03  36.0
Mistral-7B-Instruct-v0.2    Shots=4-shot, Model Si...  2024.03  35.4
Mixtral-8x7B-Instruct-v0.1  Shots=4-shot, Model Si...  2024.03  32.3
Baichuan2-13B-Chat          Shots=4-shot, Model Si...  2024.03  19.5
Baichuan2-7B-Chat           Shots=4-shot, Model Si...  2024.03  17.7
Llama2-7B-Chat              Shots=4-shot, Model Si...  2024.03  15.2
Llama2-13B-Chat             Shots=4-shot, Model Si...  2024.03  8.5
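
The reported scores can also be compared programmatically, for example to see how far each model trails the top entry. What follows is a minimal Python sketch and is not part of the original leaderboard. The `SCORES` dict and the `gaps_to_leader` helper are hypothetical names, and the values were copied by hand from the table above.

```python
from __future__ import annotations

# Accuracy scores (4-shot) transcribed by hand from the HumanEval (test)
# leaderboard above. The dict layout is a hypothetical convenience.
SCORES = {
    "InternLM2-Chat-20B": 67.7,
    "InternLM2-Chat-20B-SFT": 67.1,
    "InternLM2-Chat-7B-SFT": 61.6,
    "InternLM2-Chat-7B": 59.2,
    "ChatGLM3-6B": 53.1,
    "Qwen-14B-Chat": 41.5,
    "Qwen-7B-Chat": 36.0,
    "Mistral-7B-Instruct-v0.2": 35.4,
    "Mixtral-8x7B-Instruct-v0.1": 32.3,
    "Baichuan2-13B-Chat": 19.5,
    "Baichuan2-7B-Chat": 17.7,
    "Llama2-7B-Chat": 15.2,
    "Llama2-13B-Chat": 8.5,
}


def gaps_to_leader(scores: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (model, accuracy, points behind the best model), best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    # Round to one decimal to match the precision reported in the table
    # and to hide floating-point noise from the subtraction.
    return [(name, acc, round(best - acc, 1)) for name, acc in ranked]


if __name__ == "__main__":
    for name, acc, gap in gaps_to_leader(SCORES):
        print(f"{name:<28} {acc:5.1f}  {gap:4.1f} pts behind")
```

Running the script prints one line per model, best first. For instance, InternLM2-Chat-7B sits 8.5 points behind InternLM2-Chat-20B.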