Python Coding on HumanEval-X v1 (test)
Best result: InternLM2-20B, 48.2 Pass@1 (Mar 26, 2024)
Updated 4d ago
Evaluation Results
| Method | Settings | Date | Pass@1 |
|---|---|---|---|
| InternLM2-20B | shot count=5-shot, Par... | 2024.03 | 48.2 |
| InternLM2-7B | shot count=5-shot, Par... | 2024.03 | 39.6 |
| ChatGLM3-6B-Base | shot count=5-shot, Par... | 2024.03 | 38.3 |
| Mixtral-8x7B-v0.1 | shot count=5-shot, Par... | 2024.03 | 38.3 |
| InternLM2-20B-Base | shot count=5-shot, Par... | 2024.03 | 31.5 |
| Qwen-14B | shot count=5-shot, Par... | 2024.03 | 31 |
| InternLM2-7B-Base | shot count=5-shot, Par... | 2024.03 | 28.8 |
| Mistral-7B-v0.1 | shot count=5-shot, Par... | 2024.03 | 28.5 |
| Qwen-7B-Chat | shot count=5-shot, Par... | 2024.03 | 24.4 |
| Baichuan2-13B-Base | shot count=5-shot, Par... | 2024.03 | 19.5 |
| Llama2-13B | shot count=5-shot, Par... | 2024.03 | 17.2 |
| Baichuan2-7B-Base | shot count=5-shot, Par... | 2024.03 | 16.1 |
| Llama2-7B | shot count=5-shot, Par... | 2024.03 | 11.2 |