| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Model Discovery | Qwen-3B model tree (Extended Discovery) | Rank | 233.8 | 48 |
| Jailbreak Defense | Qwen2-VL | ASR | 0 | 36 |
| Toxicity Defense | Qwen2-VL | Toxicity Score | 0.05 | 36 |
| Inference Throughput | Qwen3 Query Projection Module (NVIDIA A40) | Throughput (k tokens/sec) | 80.63 | 30 |
| Attention Operator Throughput | Qwen2.5-72B (64 Q-heads / 8 KV-heads / 128 head dim) | Attention Throughput (TFLOPS) | 222.5 | 29 |
| Training Throughput Analysis | Qwen2.5-7B | Training Throughput (tokens/s) | 1,847 | 28 |
| Function Module Discovery | Qwen2.5-7B-Instruct | L(F) | 64.6 | 24 |
| Function Module Discovery | Qwen2.5-3B-Instruct | L(F) | 56.9 | 24 |
| Function Module Discovery | Qwen2.5-1.5B-Instruct | L(F) | 31.4 | 24 |
| Model Retrieval | Qwen-7B model tree (test) | Rank | 1 | 21 |
| Model Retrieval | Qwen-3B model tree (test) | Rank | 1 | 21 |
| Jailbreak Attack | Qwen2.5-7B | Normalized Rate (NR) | 0.02 | 20 |
| LLM Training Optimization | Qwen3-1.7B | Time Reduction | 0.149 | 18 |
| Fingerprint Similarity | Qwen2.5-7B | Fingerprint Similarity Score | 0.9979 | 18 |
| Hallucination Tracing | Qwen | Recall@k | 83.31 | 15 |
| Large Language Model Evaluation | Qwen-32B | MMLU | 80.81 | 13 |
| Long-Context Generation | Qwen3 (context length 50K) | Throughput Speedup (α) | 6.02 | 12 |
| Long-Context Generation | Qwen3 (context length 10K) | Throughput Speedup (α) | 2.76 | 12 |
| LLM Fingerprinting | Qwen2.5-14B | AUC | 100 | 10 |
| LLM Fingerprinting | Qwen2.5-7B | AUC | 100 | 10 |
| Jailbreak Attack | Qwen2-VL | ASR | 96.4 | 10 |
| Jailbreak Attack | Qwen3-VL-235B | ASR | 2.32 | 9 |
| Jailbreak Attack | Qwen2.5-VL-32B | ASR | 10.8 | 9 |
| Jailbreak Attack | Qwen2.5-VL-7B | ASR | 98 | 9 |
| Inference Efficiency | Qwen2.5-7B | Throughput (tokens/s) | 1,480.2 | 9 |