Qwen3 Evaluation Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Language Understanding | Qwen3-0.6B Zero-shot Evaluation Suite (AE, AC, SciQ, MM, MM-P, HS, OBQA, PIQA, RACE, WG, CSQA, AGI) (test) | Accuracy (AE): 68.64 | 15