Coding

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Response correctness and completeness evaluation | Coding | F1 Score: 85 | 32 |
| Prompt Injection Detection | Coding Direct Prompt Injection | FPR: 0 | 7 |
| Code Generation | Coding Gender (test) | Cor (%): 40 | 5 |
| Code Generation | Coding Race (test) | Correctness Rate: 57 | 5 |