Code Modification on CVDP non-agentic 1.0
[Chart: Pass@1 over time, Dec 4, 2025 to May 20, 2026. Highlighted point: GPT-o4 Mini, Pass@1 = 20.]
Updated 12d ago
Evaluation Results
| Method | Configuration | Date | Pass@1 |
|---|---|---|---|
| GPT-o4 Mini | mode=single-shot | 2025.12 | 20 |
| GPT-o4 Mini | mode=agentic framework | 2025.12 | 12.73 |
| Claude Code | | 2026.05 | 3 |
| CVDP agent | Backbone=Claude Opus 4... | 2026.05 | 2 |
| Codex | | 2026.05 | 2 |
| Granite-4 | mode=agentic framework | 2025.12 | 1.82 |
| Nemotron-Mini | mode=single-shot | 2025.12 | 0 |
| Nemotron-Mini | mode=agentic framework | 2025.12 | 0 |
| SmolLM | mode=single-shot | 2025.12 | 0 |
| SmolLM | mode=agentic framework | 2025.12 | 0 |
| DeepSeek-R1 | mode=single-shot | 2025.12 | 0 |
| DeepSeek-R1 | mode=agentic framework | 2025.12 | 0 |
| Granite-4 | mode=single-shot | 2025.12 | 0 |