
DeepSeek

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak Attack | DeepSeek | NR Score: 0 | 20 |
| Jailbreak attack | DeepSeek-7b five finetuned variants | Average ASR: 3.8 | 16 |
| Jailbreak Attack | deepseek-7b v1 (pretrained) | ASR (%): 100 | 13 |
| Constrained LLM Decoding | DeepSeek-V2-Lite-Chat 15.7B | Inference Time (ms): 49.91 | 10 |
| Jailbreaking | DeepSeek V3.2 | Attack Success Rate: 78.5 | 9 |
| Detection of paraphrased text | DeepSeek Paraphrased V3 | ROC AUC (1% FPR): 0.4178 | 8 |
| Watermarking Detection | DeepSeek-7B | AUC: 100 | 7 |
| Watermark Detection | DeepSeek | Detection Rate: 99.3 | 7 |
| Contribution and Evidence Generation | DeepSeek-V4-Pro generated SFT targets | Entity Fidelity: 0.977 | 6 |
| Output conformance to revised specification | DeepSeek-V3 primary grid (1,008 balanced runs) | Quality Score: 3.79 | 5 |
| Policy Corruption Evaluation | DeepSeek V3 | Compliance: 4.12 | 5 |
| Training Throughput | DeepSeek-V2-Lite workload | Training Throughput (tokens/s): 114,600 | 3 |
| CPU Inference Performance Evaluation | DeepSeek Lite V2 | Memory Usage (GB): 8.8 | 3 |
| Weight Reconstruction Fidelity | DeepSeek-V3 Weights | Weight ΔW L2 Distance: 0 | 3 |
| Optimizer state memory measurement | DeepSeek-V2-Lite (16B) (train) | Average Optimizer State Memory (MB): 55.3 | 2 |