| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Annotation Accuracy | DeepSeek-R1 Experiment 1 | F1 Score (Ga): 100 | 40 |
| Safety Control | DeepSeek-R1-Distill-Qwen-1.5B | P_safeguarded (Safety-Quality Score): 89.8 | 17 |
| Style Manipulation Attack | DeepSeek-R1-Distill | Score: 1.997 | 6 |
| LLM Attack Effectiveness | DeepSeek-R1-Distill-Llama-8B serving environment | TTFT (s): 0.08 | 6 |
| Text Naturalness Evaluation | DeepSeek-R1 Experiment 2 | BERT Score: 0.99 | 5 |
| End-to-end single-step decoding | DeepSeek-R1-Distill-LLaMA-8B 64K Context | Latency (ms): 24.1 | 4 |