Code Generation on HumanEval (Speed, Latency, and Resource Usage)

Best accuracy: 89.02 (MetaGPT)

Updated 1mo ago

Evaluation Results

Method	Links	Accuracy	Speed	Latency	Resource Usage
MetaGPT 2026.03		89.02	-	-	-
Evo 8B 2026.02		86.7	-	-	-
AgentVerse 2026.03		85	-	-	-
AgentVerse 2026.03		85	-	-	-
Qwen2.5 7B 2026.02		84.8	-	-	-
Vibe Graphing-Task Specific 2026.03		84.76	-	-	-
Vibe Graphing-ChatDev 2026.03		83.5	-	-	-
BD3-LM 7B 2026.02		82.9	-	-	-
ChatDev 2026.03		82.5	-	-	-
HuggingGPT 2026.03		82.32	-	-	-
ChatDev 2026.03		81.3	-	-	-
HuggingGPT 2026.03		80.49	-	-	-
CAMEL 2026.03		71.85	-	-	-
MDLM 9B 2026.02		68.9	-	-	-
MetaGPT 2026.03		67.07	-	-	-
d2Cache 2025.09		64.02	11.74	87.64	19.14
CAMEL 2026.03		62.2	-	-	-
dLLM-Cache 2025.09		60.97	3.03	338.21	19.79
LLaMA3 8B 2026.02		59.8	-	-	-
Fast dLLM 2025.09		58.53	4.77	214.5	19.16
Vanilla 2025.09		56.71	2.62	393.47	19.06
LLaDA 8B 2026.02		47.6	-	-	-
EGSPO-SA 2026.03		44.5	-	-	-
EGSPO-SA 2026.03		44.5	-	-	-
EGSPO-SA 2026.03		41.5	-	-	-
EGSPO 2026.03		40.2	-	-	-
EGSPO 2026.03		40.2	-	-	-
EGSPO 2026.03		39.6	-	-	-
LLaDA-8B-Instruct 2026.03		37.8	-	-	-
LLaDA-8B-Instruct 2026.03		37.8	-	-	-
d1 2026.03		37.8	-	-	-
d1 2026.03		37.8	-	-	-
LLaDA-8B-Instruct 2026.03		35.3	-	-	-
d1 2026.03		32.9	-	-	-
EGSPO 2026.03		32.3	-	-	-
EGSPO-SA 2026.03		32.3	-	-	-
d1 2026.03		31.1	-	-	-
LLaDA-8B-Instruct 2026.03		27.4	-	-	-
ARD 7B 2026.02		16.5	-	-	-