Claude Code

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Leakage | Claude Code MiniMax-M2.7 | Exact Match (EM): 60 | 12 |
| Remote Code Execution Attack Success Rate | Claude Code | C-F Success Rate: 43.67 | 3 |
| Coding Tasks | Claude Code Evaluation Set | Number of Samples: 26 | 3 |
| Software Engineering Tasks | Claude Code | Successful Tasks Count: 40 | 3 |
| Agent Skill Execution | Claude Code (test) | Total Tokens: 18.7 | 2 |