| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Model Discovery | Qwen-3B model tree (Extended Discovery) | Rank | 233.8 | 48 |
| Jailbreak Defense | Qwen2-VL | ASR | 0 | 36 |
| Toxicity Defense | Qwen2-VL | Toxicity Score | 0.05 | 36 |
| Inference Throughput | Qwen3 Query Projection Module (NVIDIA A40) | Throughput (k tokens/sec) | 80.63 | 30 |
| Attention Operator Throughput | Qwen2.5-72B (64 Q-heads / 8 KV-heads / 128 head dim) | Attention Throughput (TFLOPS) | 222.5 | 29 |
| Training Throughput Analysis | Qwen2.5-7B | Training Throughput (tokens/s) | 1,847 | 28 |
| Function Module Discovery | Qwen2.5-7B-Instruct | L(F) | 64.6 | 24 |
| Function Module Discovery | Qwen2.5-3B-Instruct | L(F) | 56.9 | 24 |
| Function Module Discovery | Qwen2.5-1.5B-Instruct | L(F) | 31.4 | 24 |
| Model Retrieval | Qwen-7B model tree (test) | Rank | 1 | 21 |
| Model Retrieval | Qwen-3B model tree (test) | Rank | 1 | 21 |
| Jailbreak Attack | Qwen2.5-7B | Normalized Rate (NR) | 0.02 | 20 |
| LLM Training Optimization | Qwen3-1.7B | Time Reduction | 0.149 | 18 |
| Fingerprint Similarity | Qwen2.5-7B | Fingerprint Similarity Score | 0.9979 | 18 |
| Hallucination Tracing | Qwen | Recall@k | 83.31 | 15 |
| Large Language Model Evaluation | Qwen-32B | MMLU | 80.81 | 13 |
| Long-Context Generation | Qwen3 (context length 50K) | Throughput Speedup (α) | 6.02 | 12 |
| Long-Context Generation | Qwen3 (context length 10K) | Throughput Speedup (α) | 2.76 | 12 |
| LLM Fingerprinting | Qwen2.5-14B | AUC | 100 | 10 |
| LLM Fingerprinting | Qwen2.5-7B | AUC | 100 | 10 |
| Jailbreak Attack | Qwen2-VL | ASR | 96.4 | 10 |
| Jailbreak Attack | Qwen3-VL-235B | ASR | 2.32 | 9 |
| Jailbreak Attack | Qwen2.5-VL-32B | ASR | 10.8 | 9 |
| Jailbreak Attack | Qwen2.5-VL-7B | ASR | 98 | 9 |
| Inference Efficiency | Qwen2.5-7B | Throughput (tokens/s) | 1,480.2 | 9 |