Code Generation on SQL-Create Context
[Figure: Execution Accuracy over time. Highest result: 97 (No Defense), Jan 23, 2026.]
Evaluation Results

| Method           | Model               | Date    | Execution Accuracy (%) |
|------------------|---------------------|---------|------------------------|
| No Defense       | Llama-3-8B-Instruct | 2026.01 | 97.0                   |
| No Defense       | Qwen2.5-7B-Instruct | 2026.01 | 95.4                   |
| Self-Examination | Qwen2.5-7B-Instruct | 2026.01 | 95.4                   |
| ICD              | Qwen2.5-7B-Instruct | 2026.01 | 94.9                   |
| Self-Reminder    | Qwen2.5-7B-Instruct | 2026.01 | 94.9                   |
| SafeThinker      | Qwen2.5-7B-Instruct | 2026.01 | 94.4                   |
| SafeDecoding     | Qwen2.5-7B-Instruct | 2026.01 | 93.4                   |
| Self-Reminder    | Llama-3-8B-Instruct | 2026.01 | 93.2                   |
| SafeThinker      | Llama-3-8B-Instruct | 2026.01 | 92.9                   |
| SafeDecoding     | Llama-3-8B-Instruct | 2026.01 | 92.3                   |
| Self-Examination | Llama-3-8B-Instruct | 2026.01 | 91.5                   |
| PPL              | Llama-3-8B-Instruct | 2026.01 | 91.1                   |
| PPL              | Qwen2.5-7B-Instruct | 2026.01 | 91.1                   |
| ICD              | Llama-3-8B-Instruct | 2026.01 | 68.5                   |