
Multiple Choice QA and Reasoning on MMLU, SciQ, OQA, CQA, SIQA, PIQA, HellaSwag, WinoGrande, ARC-c, ARC-e

74.25 MMLU Accuracy

Qwen2.5-7B

Sep 28, 2025
Updated 7d ago

Evaluation Results

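| Method | Date | MMLU | SciQ | OQA | CQA | SIQA | PIQA | HellaSwag | WinoGrande | ARC-c | ARC-e | (unlabeled) | (unlabeled) | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2025.09 | 74.25 | 97 | 52.8 | 84.52 | 58.44 | 81.72 | 80.24 | 77.27 | 63.82 | 87.21 | 75.73 | 100 | |
| | 2025.09 | 74.05 | 96.6 | 52.4 | 84.03 | 58.4 | 81.61 | 80.18 | 76.71 | 63.28 | 87.16 | 75.44 | 45.57 | |
| | 2025.09 | 73.31 | 96.6 | 52 | 84.28 | 58.19 | 81.39 | 79.76 | 76.64 | 62.79 | 86.36 | 75.13 | 45.14 | |
| | 2025.09 | 73.07 | 96.7 | 52.2 | 84.19 | 58.25 | 81.41 | 80.1 | 76.48 | 62.94 | 86.11 | 75.15 | 50.2 | |
| | 2025.09 | 72.51 | 96.6 | 52.4 | 84.68 | 58.34 | 81.77 | 80.08 | 76.87 | 63.65 | 87.21 | 75.41 | 45.47 | |
| | 2025.09 | 65.3 | 97.6 | 48 | 74.28 | 54.04 | 83.19 | 81.76 | 80.03 | 57.85 | 84.55 | 72.66 | 100 | |
| | 2025.09 | 64.78 | 97.3 | 48.4 | 74.45 | 54.76 | 82.75 | 81.84 | 79.01 | 57.68 | 84.55 | 72.55 | 45.04 | |
| | 2025.09 | 64.32 | 97.4 | 47.4 | 74.1 | 54.25 | 83.03 | 81.68 | 79.01 | 57.93 | 84.09 | 72.32 | 50.32 | |
| | 2025.09 | 62.57 | 97.7 | 47.2 | 74.37 | 54.09 | 82.75 | 81.66 | 79.72 | 57.34 | 84.8 | 72.49 | 45.72 | |
| | 2025.09 | 61.19 | 97.2 | 48 | 73.79 | 53.94 | 81.56 | 80.14 | 78.22 | 56.91 | 83.59 | 72.45 | 45.3 | |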