
Commonsense Reasoning and Knowledge Understanding on ARC, HellaSwag, LAMBADA, PIQA, WinoGrande, and MMLU

ARC-e Accuracy: 83.42

Qwen3-8B


Evaluation Results

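| Method | Date | ARC-e | ARC-c | HellaSwag | LAMBADA | PIQA | WinoGrande | MMLU | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| | 2025.10 | 83.42 | 56.74 | 74.9 | 61.11 | 76.61 | 68.11 | 74.93 | 70.83 |
| | 2025.10 | 83.33 | 58.45 | 80.22 | 61.6 | 80.2 | 75.53 | 60.27 | 71.37 |
| | 2025.10 | 82.74 | 55.97 | 76.49 | 64.54 | 78.73 | 71.03 | 68.8 | 71.19 |
| | 2025.10 | 80.85 | 54.01 | 81.08 | 69.49 | 80.9 | 73.64 | 62.5 | 71.78 |
| | 2025.10 | 80.76 | 52.47 | 78.93 | 67.26 | 79.71 | 72.85 | 60.21 | 70.31 |
| | 2025.10 | 80.43 | 51.45 | 78.93 | 64.7 | 78.78 | 73.09 | 74.18 | 71.65 |
| | 2025.10 | 80.3 | 53.07 | 79.14 | 68.89 | 79.6 | 72.77 | 65.34 | 71.3 |
| | 2025.10 | 80.13 | 56.23 | 81.52 | 59.09 | 79.05 | 77.19 | 67.27 | 71.5 |
| | 2025.10 | 78.62 | 53.75 | 79.11 | 60.78 | 80.03 | 70.32 | 54.07 | 68.1 |
| | 2025.10 | 77.61 | 46.84 | 77.93 | 65.53 | 79.87 | 71.74 | 33.19 | 64.67 |
| | 2025.10 | 75.4 | 47.9 | 78.6 | - | 81 | 72.6 | 39.3 | - |
| | 2025.10 | 73.61 | 43.86 | 75.17 | 69.26 | 78.35 | 67.72 | 43.39 | 64.48 |
| | 2025.10 | 71.72 | 45.56 | 80.81 | 64.84 | 80.41 | 72.14 | 57.68 | 67.59 |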
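The last value in each evaluation row appears to be the unweighted mean of the seven benchmark scores before it; this is inferred from the numbers, not stated on the page. The column order used below (ARC-e, ARC-c, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU) is likewise an assumption based on the benchmark list in the title. The sketch recomputes each average and flags rows where it disagrees with the reported value; the helper names `row_average` and `check_rows` are hypothetical.

```python
from statistics import mean
from typing import Optional

# Scores per row in the assumed column order:
# ARC-e, ARC-c, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU.
# The second tuple element is the reported average (None where the page shows "-").
# None inside a score list marks a benchmark score missing on the page.
ROWS = [
    ([83.42, 56.74, 74.9, 61.11, 76.61, 68.11, 74.93], 70.83),
    ([83.33, 58.45, 80.22, 61.6, 80.2, 75.53, 60.27], 71.37),
    ([82.74, 55.97, 76.49, 64.54, 78.73, 71.03, 68.8], 71.19),
    ([80.85, 54.01, 81.08, 69.49, 80.9, 73.64, 62.5], 71.78),
    ([80.76, 52.47, 78.93, 67.26, 79.71, 72.85, 60.21], 70.31),
    ([80.43, 51.45, 78.93, 64.7, 78.78, 73.09, 74.18], 71.65),
    ([80.3, 53.07, 79.14, 68.89, 79.6, 72.77, 65.34], 71.3),
    ([80.13, 56.23, 81.52, 59.09, 79.05, 77.19, 67.27], 71.5),
    ([78.62, 53.75, 79.11, 60.78, 80.03, 70.32, 54.07], 68.1),
    ([77.61, 46.84, 77.93, 65.53, 79.87, 71.74, 33.19], 64.67),
    ([75.4, 47.9, 78.6, None, 81, 72.6, 39.3], None),
    ([73.61, 43.86, 75.17, 69.26, 78.35, 67.72, 43.39], 64.48),
    ([71.72, 45.56, 80.81, 64.84, 80.41, 72.14, 57.68], 67.59),
]


def row_average(scores: list[Optional[float]]) -> Optional[float]:
    """Unweighted mean of the scores rounded to 2 decimals; None if any score is missing."""
    if any(s is None for s in scores):
        return None
    return round(mean(scores), 2)


def check_rows(rows) -> list[int]:
    """Return indices of rows whose recomputed average differs from the reported one."""
    mismatches = []
    for i, (scores, reported) in enumerate(rows):
        computed = row_average(scores)
        if computed is None or reported is None:
            # A missing value is only consistent if both sides are missing.
            if computed != reported:
                mismatches.append(i)
            continue
        # Allow for floating-point noise after rounding to 2 decimals.
        if abs(computed - reported) > 0.005:
            mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    print("mismatching rows:", check_rows(ROWS))
    print("best ARC-e:", max(scores[0] for scores, _ in ROWS))
```

Run as-is, it prints an empty mismatch list, which supports reading the last column as a plain seven-way mean, and a best ARC-e score of 83.42, matching the headline figure. The row with a missing LAMBADA score has no reported average, consistent with the mean requiring all seven scores.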