Code

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Generation | Code | Accuracy 98.7 | 242 |
| AI-generated text detection | Code | AUC 0.979 | 24 |
| Machine-generated text detection | Code Llama-3-70B-Instruct (test) | AUC 0.951 | 12 |
| AI-generated text detection | Code GPT-3.5 Turbo | AUC 0.906 | 12 |
| Code Generation and Understanding | Code Crux, MultiPL_E, MBPP | Crux Score 60.12 | 12 |
| Code Generation | Code latest (test) | HumanEval 83.9 | 12 |
| Multi-turn conversation performance | Code | Avg Performance 98.3 | 9 |
| ECG Abnormality Detection | CODE 15% | AUC 91.5 | 8 |
| Code Generation | Code | ASR 86.9 | 7 |
| Critique Quality Evaluation | Code | Win Rate 67.5 | 6 |
| Language Modeling | Code 24B tokens | Cross-Entropy Loss 0.6994 | 5 |
| Coding | CODE (test) | Turns 3.62 | 4 |
| Code Generation | CODE | F1 Score 99 | 4 |
| Tokenization | Code | Average Tokens per Sample 1,694.26 | 3 |
| Code Generation | Code | Throughput (token/s) 183.54 | 3 |