
Zero-shot NLP Reasoning Suite (BoolQ, HellaSwag, ARC, PIQA, OBQA)

68.59 Average Accuracy (Zero-Shot Suite)

LLaMA-7B

[Chart: average accuracy over time, May 19, 2023 to Apr 14, 2026]
Updated 3d ago

Evaluation Results

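| Date | Average | BoolQ | PIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | (unlabeled) |
|---|---|---|---|---|---|---|---|---|---|
| 2023.05 | 68.59 | 76.5 | 79.8 | 76.1 | 70.1 | 72.8 | 47.6 | 57.2 | - |
| 2024.10 | 66.07 | - | - | - | - | - | - | - | - |
| 2023.05 | 64.97 | 68.47 | 78.89 | 76.24 | 70.09 | 74.58 | 44.54 | 42 | - |
| 2023.05 | 64.02 | 70.31 | 77.91 | 75.16 | 67.88 | 71.09 | 42.41 | 43.4 | - |
| 2023.05 | 63.25 | 73.18 | 78.35 | 72.99 | 67.01 | 67.45 | 41.38 | 42.4 | - |
| 2024.10 | 62.94 | - | - | - | - | - | - | - | - |
| 2023.05 | 62.12 | 73.24 | 76.77 | 71.86 | 64.64 | 67.59 | 39.93 | 40.8 | - |
| 2023.05 | 61.79 | 67.68 | 77.15 | 73.41 | 65.11 | 68.35 | 38.4 | 42.4 | - |
| 2023.05 | 60.24 | 69.2 | 76.55 | 68.89 | 66.38 | 62.08 | 38.99 | 39.6 | - |
| 2023.05 | 60.07 | 69.54 | 76.44 | 68.11 | 65.11 | 63.43 | 37.88 | 40 | - |
| 2023.05 | 59.78 | 64.19 | 76.06 | 68.89 | 63.3 | 66.88 | 38.31 | 40.8 | - |
| 2024.10 | 59.57 | - | - | - | - | - | - | - | - |
| 2023.05 | 59.23 | 64.62 | 77.2 | 68.8 | 63.14 | 64.31 | 36.77 | 39.8 | - |
| 2023.05 | 57.23 | 65.75 | 74.7 | 64.52 | 59.35 | 60.65 | 36.26 | 39.4 | - |
| 2023.05 | 57.09 | 63.33 | 73.18 | 63.54 | 60.85 | 64.44 | 36.26 | 38 | - |
| 2023.05 | 56.82 | 59.39 | 75.57 | 65.34 | 61.33 | 59.18 | 37.12 | 39.8 | - |
| 2023.05 | 56.69 | 57.06 | 75.68 | 66.8 | 59.83 | 60.94 | 36.52 | 40 | - |
| 2024.10 | 56.22 | - | - | - | - | - | - | - | - |
| 2023.05 | 55.57 | 59.08 | 73.39 | 64.02 | 60.54 | 57.95 | 35.58 | 38.4 | - |
| 2026.04 | 53.88 | - | - | - | - | - | - | - | - |
| 2023.05 | 53.25 | 61.44 | 71.71 | 57.27 | 54.22 | 55.77 | 33.96 | 38.4 | - |
| 2023.05 | 52.69 | 61.83 | 71.33 | 56.26 | 54.46 | 57.07 | 32.85 | 35 | - |
| 2023.05 | 51.08 | 61.5 | 67.57 | 52.9 | 57.54 | 50.13 | 31.14 | 36.8 | - |
| 2023.05 | 50.29 | 62.39 | 66.87 | 49.17 | 58.96 | 49.62 | 31.83 | 33.2 | - |
| 2024.10 | 48.91 | - | - | - | - | - | - | - | - |
| 2024.10 | 48.4 | - | - | - | - | - | - | - | - |
| 2026.04 | 47.85 | - | - | - | - | - | - | - | 61.9 |
| 2026.04 | 47.7 | - | - | - | - | - | - | - | - |
| 2026.04 | 47.64 | - | - | - | - | - | - | - | 38.1 |
| 2026.04 | 47.56 | - | - | - | - | - | - | - | 47.62 |
| 2026.04 | 47.4 | - | - | - | - | - | - | - | 38.1 |
| 2026.04 | 47.4 | - | - | - | - | - | - | - | 42.86 |
| 2026.04 | 46.16 | - | - | - | - | - | - | - | 0 |
| 2023.05 | 45.38 | 62.75 | 62.73 | 41.4 | 51.07 | 41.38 | 27.9 | 30.4 | - |
| 2026.04 | 43.03 | - | - | - | - | - | - | - | 71.43 |
| 2026.04 | 43.02 | - | - | - | - | - | - | - | 76.19 |
| 2026.04 | 42.84 | - | - | - | - | - | - | - | 57.14 |
| 2023.05 | 42.65 | 59.66 | 58 | 37.04 | 52.41 | 33.12 | 28.58 | 29.8 | - |
| 2026.04 | 42.56 | - | - | - | - | - | - | - | 42.86 |
| 2026.04 | 42.55 | - | - | - | - | - | - | - | 47.62 |
| 2026.04 | 42.51 | - | - | - | - | - | - | - | - |
| 2026.04 | 40.8 | - | - | - | - | - | - | - | 14.29 |
| 2026.04 | 38.79 | - | - | - | - | - | - | - | 85.71 |
| 2026.04 | 38.68 | - | - | - | - | - | - | - | 90.48 |
| 2026.04 | 38.57 | - | - | - | - | - | - | - | 80.95 |
| 2026.04 | 38.43 | - | - | - | - | - | - | - | 80.95 |
| 2026.04 | 38.33 | - | - | - | - | - | - | - | 80.95 |
| 2026.04 | 37.37 | - | - | - | - | - | - | - | - |
| 2026.04 | 35.66 | - | - | - | - | - | - | - | 4.76 |
| 2024.10 | 35.03 | - | - | - | - | - | - | - | - |
| 2024.10 | 34.86 | - | - | - | - | - | - | - | - |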
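
In the fully reported 2023.05 rows, the leading average appears to be the unweighted mean of the seven per-task accuracies that follow it. The page does not state this rule; it is inferred from the numbers. A minimal sketch that checks it against the top two 2023.05 rows, where the helper name `suite_average` is hypothetical:

```python
def suite_average(task_scores):
    """Unweighted mean of per-task accuracies, rounded to two decimals.

    Assumption: every task counts equally. The leaderboard does not
    document its averaging rule; this is inferred from the reported rows.
    """
    return round(sum(task_scores) / len(task_scores), 2)


# Seven per-task accuracies from the top 2023.05 row (reported average: 68.59).
top_row = [76.5, 79.8, 76.1, 70.1, 72.8, 47.6, 57.2]

# Seven per-task accuracies from the next 2023.05 row (reported average: 64.97).
second_row = [68.47, 78.89, 76.24, 70.09, 74.58, 44.54, 42]

print(suite_average(top_row))     # 68.59
print(suite_average(second_row))  # 64.97
```

Rows that report only an average cannot be checked this way, so their averages may have been computed over a different task set.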