Python Coding on HumanEval v1 (test)
Top result: InternLM2-7B, 56.1 Pass@1 (Mar 26, 2024)
Updated 4d ago
Evaluation Results

Method               Details                      Links     Pass@1
InternLM2-7B         shot count=4-shot, Par...    2024.03   56.1
InternLM2-20B        shot count=4-shot, Par...    2024.03   48.8
ChatGLM3-6B-Base     shot count=4-shot, Par...    2024.03   45.1
Qwen-7B-Chat         shot count=4-shot, Par...    2024.03   36
Mixtral-8x7B-v0.1    shot count=4-shot, Par...    2024.03   32.3
InternLM2-20B-Base   shot count=4-shot, Par...    2024.03   32.3
InternLM2-7B-Base    shot count=4-shot, Par...    2024.03   31.1
Qwen-14B             shot count=4-shot, Par...    2024.03   30.5
Mistral-7B-v0.1      shot count=4-shot, Par...    2024.03   27.4
Baichuan2-13B-Base   shot count=4-shot, Par...    2024.03   23.2
Baichuan2-7B-Base    shot count=4-shot, Par...    2024.03   22
Llama2-13B           shot count=4-shot, Par...    2024.03   18.9
Llama2-7B            shot count=4-shot, Par...    2024.03   14.6
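
The Pass@1 scores above are percentages of HumanEval problems solved. This page does not say how many samples were drawn per problem or how the scores were aggregated, so the following is only a minimal sketch of the commonly used unbiased pass@k estimator, not the method behind these numbers. The function names `pass_at_k` and `benchmark_pass_at_k` and the `(n, c)` input format are hypothetical, chosen for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which pass the tests.

    Assumption: this is the standard unbiased estimator 1 - C(n-c, k) / C(n, k);
    the leaderboard's actual scoring procedure is not stated in the source.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving a score of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over (n, c) pairs and return a percentage."""
    if not results:
        raise ValueError("results must contain at least one problem")
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)
```

With k=1 the estimator reduces to c/n for each problem. As an illustration only: if a model were scored with a single sample per problem on the 164 HumanEval problems and solved 92 of them, `benchmark_pass_at_k` would return a value that rounds to 56.1.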