Python Coding on HumanEval-X (test)
[Chart: Accuracy over time. Top result: InternLM2-Chat-7B-SFT, 43.9 (Mar 26, 2024)]
Evaluation Results
Method                        Details                      Date      Accuracy
InternLM2-Chat-7B-SFT         Shots=5-shot, Model Si...    2024.03   43.9
InternLM2-Chat-20B-SFT        Shots=5-shot, Model Si...    2024.03   42
InternLM2-Chat-7B             Shots=5-shot, Model Si...    2024.03   41.7
InternLM2-Chat-20B            Shots=5-shot, Model Si...    2024.03   39.8
Mixtral-8x7B-Instruct-v0.1    Shots=5-shot, Model Si...    2024.03   38.3
Qwen-14B-Chat                 Shots=5-shot, Model Si...    2024.03   29.9
Mistral-7B-Instruct-v0.2      Shots=5-shot, Model Si...    2024.03   27.1
Qwen-7B-Chat                  Shots=5-shot, Model Si...    2024.03   24.4
Baichuan2-13B-Chat            Shots=5-shot, Model Si...    2024.03   18.3
ChatGLM3-6B                   Shots=5-shot, Model Si...    2024.03   17.6
Baichuan2-7B-Chat             Shots=5-shot, Model Si...    2024.03   15.4
Llama2-13B-Chat               Shots=5-shot, Model Si...    2024.03   12.9
Llama2-7B-Chat                Shots=5-shot, Model Si...    2024.03   10.6