| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LLMSecEval (150 tasks) | MA-CoT | Number of Vulnerabilities | 0 | 36 | 8d ago |
| CWEval | | pass@1 | 48.2 | 29 | 15d ago |
| CWEval | | Functionality | 92.27 | 22 | 3mo ago |
| CodeGuard+ | Hybrid (CodeGuard + SCS) | Pass@1 | 85.93 | 18 | 2mo ago |
| CyberSecEval SCG | SafeCoder | Safety | 79.06 | 17 | 3mo ago |
| LLMSecEval | gpt-5 | Total Vulnerabilities | 0 | 12 | 8d ago |
| Primary Dataset | gemini-2.5 | Total Vulnerabilities | 8 | 12 | 8d ago |
| Secure Code Average | SecCoderX | Safety Score | 55.36 | 12 | 3mo ago |
| SecHolmesEval | P10 Hybrid Pipeline | Insecure Generation Rate | 1.9 | 8 | 2mo ago |
| SecLLMEval | | Insecure Generation Rate | 2.7 | 8 | 2mo ago |
| Secure Code Generation Scenarios 1.0 (test) | gemini-2.5-pro (Reflex) | Security Success Rate | 0.971 | 8 | 3mo ago |
| Secure Code generation | BEAVER | RDR | 42 | 8 | 3mo ago |
| CVS (test) | Llama3-70b-instruct | C++ Success Rate | 98 | 8 | 3mo ago |
| COBALT Security Prompts (500 prompts per model) | | Vulnerability Rate | 48.4 | 7 | 1mo ago |
| CyberSecEval Instruct | Mistral-7B (fine-tuned) | Secure Code Generation (%) | 86.01 | 2 | 1mo ago |