Python Coding on HumanEval-X (test)

Accuracy: 43.9

InternLM2-Chat-7B-SFT

Updated 3mo ago

Evaluation Results

Method	Date	Accuracy
InternLM2-Chat-7B-SFT	2024.03	43.9
InternLM2-Chat-20B-SFT	2024.03	42
InternLM2-Chat-7B	2024.03	41.7
InternLM2-Chat-20B	2024.03	39.8
Mixtral-8x7B-Instruct-v0.1	2024.03	38.3
Qwen-14B-Chat	2024.03	29.9
Mistral-7B-Instruct-v0.2	2024.03	27.1
Qwen-7B-Chat	2024.03	24.4
Baichuan2-13B-Chat	2024.03	18.3
ChatGLM3-6B	2024.03	17.6
Baichuan2-7B-Chat	2024.03	15.4
Llama2-13B-Chat	2024.03	12.9
Llama2-7B-Chat	2024.03	10.6