Qwen

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Efficiency Benchmarking | Qwen3-8B Single-layer forward+backward setup | Time (ms): 36.6 | 57 |
| Model Discovery | Qwen-3B model tree Extended Discovery | Rank: 233.8 | 48 |
| Jailbreak Defense | Qwen2-VL | ASR: 0 | 36 |
| Toxicity Defense | Qwen2-VL | Toxicity Score: 0.05 | 36 |
| LLM Inference | Qwen3-8B 2k prompts Decode-heavy workload | Throughput (tok/s): 46,782 | 30 |
| Inference Throughput | Qwen3 Query Projection Module NVIDIA A40 | Throughput (k tokens/sec): 80.63 | 30 |
| Attention Operator Throughput | Qwen2.5 72B (64 Q-heads/8 KV-heads/128 Head-dimension) | Attention Throughput (TFLOPS): 222.5 | 29 |
| LLM Inference | Qwen3-8B 2k prompts Balanced workload | Throughput (tok/s): 61,106 | 28 |
| Training Throughput Analysis | Qwen 7B 2.5 | Training Throughput (tokens/s): 1,847 | 28 |
| LLM Inference | Qwen3-8B 2k prompts Prefill-heavy workload | Throughput (tok/s): 106,324 | 26 |
| Multimodal Language Understanding | Qwen3 1.7B | Average Performance: 58.65 | 24 |
| Function Module Discovery | Qwen 7B-Instruct 2.5 | L(F): 64.6 | 24 |
| Function Module Discovery | Qwen 3B Instruct 2.5 | L(F): 56.9 | 24 |
| Function Module Discovery | Qwen2.5-1.5B-Instruct | L(F): 31.4 | 24 |
| Attacker Detection | Qwen3-1.7B target τpool=1.509 (test) | Tau Multiplier (×τ): 53.9 | 22 |
| Model Retrieval | Qwen-7B model tree (test) | Rank: 1 | 21 |
| Model Retrieval | Qwen-3B model tree (test) | Rank: 1 | 21 |
| Jailbreak Attack | Qwen2.5-7B | Normalized Rate (NR): 0.02 | 20 |
| Jailbreak Attack Transferability | Qwen Target Transferability set 27B | ASR: 38 | 19 |
| Language Modeling | QWEN3-like (train) | Loss: 2.398 | 19 |
| Persona Discovery | Qwen3-80B Large Target | Similarity Score: 99 | 18 |
| Persona Discovery | Qwen3-30B Large Target | Similarity Score: 99 | 18 |
| Persona Discovery | Qwen3-1.7B Small Target | Similarity Score: 98 | 18 |
| Multilingual Language Understanding | Qwen Multi-task Evaluation Suite 2.5 (test) | MC Score: 59.5 | 18 |
| LLM Training Optimization | Qwen 3 1.7B | Time Reduction: 0.149 | 18 |
Showing 25 of 144 rows