
DeepSeek-R1

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Annotation Accuracy | DeepSeek-R1 Experiment 1 | F1 Score (Ga): 100 | 40 |
| Safety Control | DeepSeek-R1-Distill-Qwen-1.5B | P_safeguarded (Safety-Quality Score): 89.8 | 17 |
| Style Manipulation Attack | DeepSeek-R1-Distill | Score: 1.997 | 6 |
| LLM Attack Effectiveness | DeepSeek-R1-Distill-Llama-8B serving environment | TTFT (s): 0.08 | 6 |
| Text Naturalness Evaluation | DeepSeek-R1 Experiment 2 | BERT Score: 0.99 | 5 |
| End-to-end single-step decoding | DeepSeek-R1-Distill-LLaMA-8B 64K Context | Latency (ms): 24.1 | 4 |