| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LLM Evaluation | Qwen3-1.7B Evaluation Suite (avg) | Average Performance: 58.64 | 38 |
| Language Model Evaluation | Qwen3-0.6B Evaluation Suite average | Average Performance: 47.8 | 24 |
| Pre-verbalization preference stabilization | Qwen Evaluation Suite Prompt shift Qwen3 | Accuracy: 100 | 2 |
| Pre-verbalization preference stabilization | Qwen3 Evaluation Suite Verbalizer shift | Accuracy: 100 | 1 |
| Pre-verbalization preference stabilization | Qwen3 Evaluation Suite Canonical | Accuracy: 100 | 1 |