
Code

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Code Generation | Code | Accuracy | 98.7 | 242 |
| ECG Classification | CODE-15% (test) | Macro AUROC | 97 | 36 |
| Continual Learning | Code Domain-level stream | AA | 62.09 | 34 |
| AI-generated text detection | Code | AUC | 0.979 | 24 |
| Speculative Decoding | Code | Throughput (tokens/s) | 138.72 | 22 |
| Code Reasoning | Code HumanEval+ LiveCodeBench v5 | HEval+ (Pass@1) | 79.88 | 18 |
| Code Generation | Code Category Average (test) | Accuracy | 77.43 | 18 |
| Incremental BPE Tokenization | Code | End-to-end CPU Time (s) | 1.369 | 15 |
| ECG Interpretation | CODE 15% | AUC | 86.4 | 15 |
| Code Generation | Code | Performance Score | 52.67 | 12 |
| BPE Tokenization | Code | Speedup Factor | 2.88 | 12 |
| Machine-generated text detection | Code Llama-3-70B-Instruct (test) | AUC | 0.951 | 12 |
| AI-generated text detection | Code GPT-3.5 Turbo | AUC | 0.906 | 12 |
| Code Generation and Understanding | Code Crux, MultiPL_E, MBPP | Crux Score | 60.12 | 12 |
| Code Generation | Code latest (test) | HumanEval | 83.9 | 12 |
| Agentic Routing | Code MBPP HumanEval | Accuracy | 76 | 10 |
| Answerability Prediction | CODE n=30 (matched pairs) | AUC | 85 | 9 |
| ECG Interpretation | CODE (test) | AUC | 96.79 | 9 |
| Multi-turn conversation performance | Code | Avg Performance | 98.3 | 9 |
| Speculative Decoding | Code Stack-Edu | Speedup | 2.61 | 8 |
| Code Generation | Code Out-of-Domain | Accuracy | 63.99 | 8 |
| ECG Abnormality Detection | CODE 15% | AUC | 91.5 | 8 |
| Code Generation | Code | ASR | 86.9 | 7 |
| Speculative Decoding | Code | Speedup | 2.38 | 6 |
| Code Generation | Code (test) | Accuracy | 36.5 | 6 |
Showing 25 of 37 rows