
Zero-shot Evaluation on Winogrande, OBQA, HellaSwag, BoolQ, ARC (c/e), and RTE

BTC-LLM: 76.07 Winogrande Accuracy

[Chart: Winogrande Accuracy over time, with a data point on May 24, 2025]
Updated 9d ago

Evaluation Results

Date     Winogrande Accuracy  Other reported scores
2025.05  76.07                45     76.07  71.71  73.99  45.39  66.06  64.48
2025.05  75.77                36     63.37  82.69  80.3   52.9   67.15  67.4
2025.05  72.69                33.2   59.91  77.89  77.4   46.42  70.4   63.8
2025.05  72.22                35.2   60.03  80.55  79.42  48.38  65.34  65
2025.05  71.59                41     69.85  77.37  71.55  41.3   48.01  60.1
2025.05  70.8                 41.6   72.48  74.86  67.8   42.24  55.96  60.82
2025.05  69.46                71.53  72.63  71.53  70.75  42.75  64.62  61.91
2025.05  65.98                36.2   63.67  65.38  68.86  34.04  56.68  55.83
2025.05  63.93                37     57.76  71.53  60.56  31.99  54.15  53.85