Zero-shot Evaluation on Reasoning Suite (BoolQ, PIQA, Hellaswag, Winogrande, ARC-e, ARC-c, OBQA)
[Chart: Accuracy (BoolQ) by date. Data point shown: LLaMA2, 77.7, Aug 9, 2025.]
Updated 15d ago
Evaluation Results
All task columns report accuracy. "Average" is Average Accuracy (Zero-Shot Suite). A "-" marks a value that is not reported.

Method         Configuration              Date     BoolQ  PIQA  Hellaswag  Winogrande  ARC-Easy  ARC-Challenge  OBQA  Average
LLaMA2         Size=7B, Bit=16, Zero-...  2025.08  77.7   79.1  76         69.1        74.6      46.2           44.2  66.7
BinaryLLM      Size=7B, Bit=1.01, Zer...  2025.08  65.5   73.2  59.5       66.4        60.3      37.2           34.5  56.7
MoS            Size=7B, Bit=1.01, Zer...  2025.08  65     71.6  59.4       56.2        41.8      30             -     -
OneBit-LLaMA2  Size=7B, Bit=1.01, Zer...  2025.08  63.1   68.1  52.6       58.4        41.6      29.6           -     -
FBI-LLM        Size=7B, Bit=1.01, Zer...  2025.08  61.5   72.6  57.7       58.9        53        29.9           36.8  52.9
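
For the three methods that report all seven tasks (LLaMA2, BinaryLLM, FBI-LLM), the Average Accuracy (Zero-Shot Suite) values are consistent with an unweighted mean of the seven per-task accuracies, rounded to one decimal place. The page does not state how the average is computed, so the sketch below is only a check under that assumption. The names `scores`, `reported`, and `suite_average` are illustrative and do not come from the leaderboard.

```python
from statistics import mean

# Per-task accuracies in the order:
# BoolQ, PIQA, Hellaswag, Winogrande, ARC-Easy, ARC-Challenge, OBQA.
# Only methods with all seven tasks reported are included.
scores = {
    "LLaMA2":    [77.7, 79.1, 76.0, 69.1, 74.6, 46.2, 44.2],
    "BinaryLLM": [65.5, 73.2, 59.5, 66.4, 60.3, 37.2, 34.5],
    "FBI-LLM":   [61.5, 72.6, 57.7, 58.9, 53.0, 29.9, 36.8],
}

# Average Accuracy (Zero-Shot Suite) as listed on the page.
reported = {"LLaMA2": 66.7, "BinaryLLM": 56.7, "FBI-LLM": 52.9}


def suite_average(task_scores):
    """Unweighted mean over tasks, rounded to one decimal (assumed convention)."""
    return round(mean(task_scores), 1)


if __name__ == "__main__":
    for name, task_scores in scores.items():
        computed = suite_average(task_scores)
        print(f"{name}: computed={computed}, reported={reported[name]}")
```

MoS and OneBit-LLaMA2 are left out because their OBQA and average entries are not reported. Averaging only their six available tasks would not be comparable to the seven-task averages of the other rows.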