
Zero-shot Natural Language Understanding on ARC, BoolQ, HellaSwag, LAMBADA, PIQA, RACE, SciQ, Record, OBQA

ARC Challenge: 46.8

Llama 7B Baseline

[Chart: ARC Challenge score over time, Aug 18, 2025 to Feb 5, 2026]
Updated 4d ago

Evaluation Results

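| Method | Date | ARC Challenge | ARC Easy | BoolQ | HellaSwag | LAMBADA | PIQA | RACE | SciQ | Record | OBQA | Average | (unlabeled) | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 2026.02 | 46.8 | 73.7 | - | 76.9 | - | 79.9 | - | - | - | 43 | 65.1 | 70 |  |
|  | 2026.02 | 46.4 | 73.8 | - | 76.4 | - | 79.3 | - | - | - | 46 | 65.2 | 69.1 |  |
|  | 2026.02 | 32.5 | 60.8 | - | 56.2 | - | 73 | - | - | - | 34.6 | 52.7 | 59 |  |
|  | 2026.02 | 32.3 | 60.2 | - | 56.3 | - | 74.1 | - | - | - | 36.2 | 52.7 | 57.1 |  |
|  | 2025.08 | 20.9 | 43.22 | 61.71 | 32.51 | 21.66 | 67.41 | 28.42 | 69.9 | 63.61 | 18.4 | 42.77 | - |  |
|  | 2025.08 | 20.73 | 44.49 | 62.23 | 32.85 | 23.27 | 67.41 | 28.52 | 72.5 | 64 | 18.4 | 43.44 | - |  |
|  | 2025.08 | 19.88 | 44.28 | 60.55 | 32.23 | 21.93 | 66.97 | 27.94 | 70.9 | 62.57 | 17.6 | 42.49 | - |  |
|  | 2025.08 | 19.8 | 44.36 | 59.94 | 32.54 | 21.52 | 67.03 | 28.23 | 68.7 | 62.84 | 16 | 42.1 | - |  |
|  | 2025.08 | 19.37 | 42 | 61.74 | 32.1 | 21.19 | 66.16 | 27.18 | 68.4 | 62.26 | 17.6 | 41.8 | - |  |
|  | 2025.08 | 19.28 | 44.07 | 61.16 | 32.03 | 21.35 | 67.14 | 27.08 | 67.9 | 61.55 | 16 | 41.76 | - |  |
|  | 2025.08 | 18.86 | 44.49 | 61.9 | 31.74 | 21.54 | 66.38 | 28.52 | 69.4 | 62.08 | 16.2 | 42.11 | - |  |
|  | 2025.08 | 18.69 | 40.19 | 57.06 | 28.91 | 16.28 | 63.71 | 25.65 | 64.2 | 56.05 | 15 | 38.57 | - |  |
|  | 2025.08 | 18.34 | 43.73 | 57.61 | 30.96 | 19.7 | 65.18 | 27.37 | 68.3 | 60.06 | 16.2 | 40.75 | - |  |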