Design Verification – Debugging / Bug Fixing on CVDP non-agentic 1.0
Best Pass@1: 22.86 (GPT-o4 Mini, Dec 4, 2025)
Evaluation Results
Method         Mode               Date     Pass@1
GPT-o4 Mini    agentic framework  2025.12  22.86
GPT-o4 Mini    single-shot        2025.12  20
Nemotron-Mini  single-shot        2025.12  2.86
Granite-4      single-shot        2025.12  2.86
Nemotron-Mini  agentic framework  2025.12  0
SmolLM         single-shot        2025.12  0
SmolLM         agentic framework  2025.12  0
DeepSeek-R1    single-shot        2025.12  0
DeepSeek-R1    agentic framework  2025.12  0
Granite-4      agentic framework  2025.12  0
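Pass@1 is the share of problems a model solves with its first sampled answer, reported here as a percentage. The sketch below shows one common way such a score is computed: the unbiased pass@k estimator popularized by the HumanEval/Codex evaluation, averaged over problems. It is an illustration under assumptions, not necessarily the CVDP harness's own code; the function names and the 35-problem example are hypothetical. The problem count is not stated on this page, although the listed scores (22.86, 20, 2.86, 0) all correspond to whole-problem counts out of 35.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass.

    Equals 1 - C(n - c, k) / C(n, k); for k = 1 this reduces to c / n.
    """
    if n - c < k:
        # Every size-k subset of the samples contains at least one pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage rounded to 2 decimals.

    `results` holds one (n_samples, n_passing) pair per problem (hypothetical format).
    """
    per_problem = [pass_at_k(n, c, k) for n, c in results]
    return round(100.0 * sum(per_problem) / len(per_problem), 2)


# Hypothetical example: 35 problems, one sample each, 8 of them solved.
example = [(1, 1)] * 8 + [(1, 0)] * 27
print(benchmark_score(example))  # 22.86
```

With a single sample per problem, pass@1 is simply the solved fraction; the combinatorial form only matters when several samples are drawn per problem and k is smaller than that number.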