
Zero-shot NLP Reasoning Suite (BoolQ, HellaSwag, ARC, PIQA, OBQA)

Average Accuracy (Zero-Shot Suite): 68.59

LLaMA-7B

[Chart: Average Accuracy (Zero-Shot Suite) over time, May 19, 2023 to Oct 28, 2024]
Updated 4d ago

Evaluation Results

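| Date | Average Accuracy | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 | Task 7 |
|---|---|---|---|---|---|---|---|---|
| 2023.05 | 68.59 | 76.5 | 79.8 | 76.1 | 70.1 | 72.8 | 47.6 | 57.2 |
| 2024.10 | 66.07 | - | - | - | - | - | - | - |
| 2023.05 | 64.97 | 68.47 | 78.89 | 76.24 | 70.09 | 74.58 | 44.54 | 42 |
| 2023.05 | 64.02 | 70.31 | 77.91 | 75.16 | 67.88 | 71.09 | 42.41 | 43.4 |
| 2023.05 | 63.25 | 73.18 | 78.35 | 72.99 | 67.01 | 67.45 | 41.38 | 42.4 |
| 2024.10 | 62.94 | - | - | - | - | - | - | - |
| 2023.05 | 62.12 | 73.24 | 76.77 | 71.86 | 64.64 | 67.59 | 39.93 | 40.8 |
| 2023.05 | 61.79 | 67.68 | 77.15 | 73.41 | 65.11 | 68.35 | 38.4 | 42.4 |
| 2023.05 | 60.24 | 69.2 | 76.55 | 68.89 | 66.38 | 62.08 | 38.99 | 39.6 |
| 2023.05 | 60.07 | 69.54 | 76.44 | 68.11 | 65.11 | 63.43 | 37.88 | 40 |
| 2023.05 | 59.78 | 64.19 | 76.06 | 68.89 | 63.3 | 66.88 | 38.31 | 40.8 |
| 2024.10 | 59.57 | - | - | - | - | - | - | - |
| 2023.05 | 59.23 | 64.62 | 77.2 | 68.8 | 63.14 | 64.31 | 36.77 | 39.8 |
| 2023.05 | 57.23 | 65.75 | 74.7 | 64.52 | 59.35 | 60.65 | 36.26 | 39.4 |
| 2023.05 | 57.09 | 63.33 | 73.18 | 63.54 | 60.85 | 64.44 | 36.26 | 38 |
| 2023.05 | 56.82 | 59.39 | 75.57 | 65.34 | 61.33 | 59.18 | 37.12 | 39.8 |
| 2023.05 | 56.69 | 57.06 | 75.68 | 66.8 | 59.83 | 60.94 | 36.52 | 40 |
| 2024.10 | 56.22 | - | - | - | - | - | - | - |
| 2023.05 | 55.57 | 59.08 | 73.39 | 64.02 | 60.54 | 57.95 | 35.58 | 38.4 |
| 2023.05 | 53.25 | 61.44 | 71.71 | 57.27 | 54.22 | 55.77 | 33.96 | 38.4 |
| 2023.05 | 52.69 | 61.83 | 71.33 | 56.26 | 54.46 | 57.07 | 32.85 | 35 |
| 2023.05 | 51.08 | 61.5 | 67.57 | 52.9 | 57.54 | 50.13 | 31.14 | 36.8 |
| 2023.05 | 50.29 | 62.39 | 66.87 | 49.17 | 58.96 | 49.62 | 31.83 | 33.2 |
| 2024.10 | 48.91 | - | - | - | - | - | - | - |
| 2024.10 | 48.4 | - | - | - | - | - | - | - |
| 2023.05 | 45.38 | 62.75 | 62.73 | 41.4 | 51.07 | 41.38 | 27.9 | 30.4 |
| 2023.05 | 42.65 | 59.66 | 58 | 37.04 | 52.41 | 33.12 | 28.58 | 29.8 |
| 2024.10 | 35.03 | - | - | - | - | - | - | - |
| 2024.10 | 34.86 | - | - | - | - | - | - | - |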
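
The page does not say how the Average Accuracy figure is derived. The Python sketch below assumes it is the unweighted mean of the seven per-task scores in a row. That assumption is consistent with the top entry, whose per-task values are 76.5, 79.8, 76.1, 70.1, 72.8, 47.6, and 57.2. The names `top_row_scores` and `suite_average` are illustrative and do not come from the leaderboard.

```python
from statistics import mean

# Per-task accuracies of the top 2023.05 entry (reported average: 68.59).
top_row_scores = [76.5, 79.8, 76.1, 70.1, 72.8, 47.6, 57.2]


def suite_average(scores):
    """Return the unweighted mean of per-task accuracies, rounded to 2 decimals.

    Assumption: the leaderboard's "Average Accuracy" is a plain mean over
    the per-task columns; the page itself does not document this.
    """
    return round(mean(scores), 2)


print(suite_average(top_row_scores))  # prints 68.59
```

Entries dated 2024.10 report only the average, so this check applies only to rows that list all seven per-task scores.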