DeepSeek-R1

Benchmarks

Task Name	Dataset Name	SOTA Result
Annotation Accuracy	DeepSeek-R1 Experiment 1	F1 Score (Ga): 100	40
Safety Control	DeepSeek-R1-Distill-Qwen-1.5B	P_safeguarded (Safety-Quality Score): 89.8	17
Efficiency Benchmarking	DeepSeek-R1-Distill-Llama-8B 16K Generation Length	Max Batch Size: 8	10
Efficiency Benchmarking	DeepSeek-R1-Distill-Llama-8B 8K Generation Length	Max Batch Size: 8	10
Style Manipulation Attack	DeepSeek-R1-Distill	Score: 1.997	6
LLM Attack Effectiveness	DeepSeek-R1-Distill-Llama-8B serving environment	TTFT (s): 0.08	6
Text Naturalness Evaluation	DeepSeek-R1 Experiment 2	BERT Score: 0.99	5
End-to-end single-step decoding	DeepSeek-R1-Distill-LLaMA-8B 64K Context	Latency (ms): 24.1	4
