Test Generation on TDD-Bench Verified (test)
[Chart: Pass Rate and Avg Execution Time, Jan 29, 2026. Top Pass Rate: 22.75 (SWE-SPOT)]
Updated 1mo ago
Evaluation Results
| Method | Method details | Date | Pass Rate | Avg Execution Time |
|---|---|---|---|---|
| SWE-SPOT | Size=4B, SWE Training... | 2026.01 | 22.75 | 17.12 |
| GPT-4.1-mini | Size=[10, 100]B, SWE T... | 2026.01 | 22.27 | 17.85 |
| CWM (Meta) | Size=32B, SWE Training... | 2026.01 | 17.38 | 15.88 |
| Qwen3-Coder-30B | Size=32B, SWE Training... | 2026.01 | 11.85 | 11.56 |
| Gemini-2.5-Flash-Lite | Size=[0.1, 10]B, SWE T... | 2026.01 | 10.74 | 7.46 |
| Qwen3-4B-Instruct-2507 | Size=4B, SWE Training... | 2026.01 | 5.37 | 3.2 |
| Gemma-3-27b-it | Size=27B, SWE Training... | 2026.01 | 2.37 | 1.34 |
| Mini-Coder-4B | Size=4B, SWE Training... | 2026.01 | 0.63 | 8.7 |
| GPT-5-nano | Size=[0.1, 10]B, SWE T... | 2026.01 | 0 | 2.97 |