Coding

Benchmarks

Dataset Name	SOTA Method	Metric	Value		Last Updated
MBPP		Accuracy	98.4	175	1mo ago
HumanEval	CompassMax-V3-Thinking	Pass@1	98.17	168	1mo ago
HumanEval+		Pass@1	95.12	164	1mo ago
MBPP+		Pass@1	97.88	117	1mo ago
HumanEval	DMoA	Accuracy	95.62	84	1mo ago
MBPP	SwiR	Pass@1 Accuracy	95.33	78	2mo ago
LiveCodeBench v6	Fugu-Ultra	Score (%)	92	51	1mo ago
Eval+		Eval+ Score	87.1	47	1mo ago
CodeContest	Qwen3-4B-REVES	Accuracy	37.9	45	1mo ago
LiveCodeBench Aug 24 – Jan 25	Qwen3-4B-REVES	Accuracy	50.9	45	1mo ago
Coding Tasks (test)	SALE	Pass@1	98.3	42	4mo ago
HumanEval (test)		Test Accuracy	74.4	40	1mo ago
LiveCodeBench	RSA	Accuracy	70	40	1mo ago
MBPP		Overall Average Score	81	37	1mo ago
HumanEval, MBPP	D3	HumanEval Score	50.2	35	1mo ago
LiveCodeBench v5	Qwen3-235B-A22B-R-TAP	Accuracy	77.6	33	29d ago
LiveCode (test)		Score	36.8	32	1mo ago
Eval+ (test)		Score	87.1	32	1mo ago
HumanEval	Ministral-3-R	HumanEval Mean Score	0.9695	32	4mo ago
LiveCode	REAP	LiveCode Score	41.2	31	1mo ago
MultiPL-E		Score	87.9	31	2mo ago
Coding Real-data 20% verified		Original Accuracy	63.92	30	1mo ago
LiveCodeBench Jan 25 – May 25	Qwen3-4B-REVES	Accuracy	42	30	1mo ago
LiveCodeBench	EqLen-GRPO	Acc (avg@32)	74.2	29	2mo ago
HumanEval		HumanEval	79.9	28	2mo ago

Showing 25 of 146 rows