
Commonsense Reasoning (LM Evaluation Harness) Zero-Shot

LAMBADA Perplexity: 11.86

PKDA

[Chart axis ticks: 10.504, 19.657, 28.81, 37.963; date: Apr 22, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
Date     LAMBADA Perplexity                                                                 Avg
2026.04  11.86               17.82  57.24  26.96  49.05  48.52  70.62  55.49  58.8   86.4   56.64
2026.04  13.09               18.01  56.66  26.96  49.66  46.49  70.73  54.54  59.9   86.1   56.38
2026.04  13.46               18.05  56.1   27.3   48.91  46.42  70.08  54.22  55.8   84.7   55.44
2026.04  14.52               17.95  54.55  26.88  48.95  45.16  70.29  54.22  57.6   85.8   55.43
2026.04  25.33               25.98  46.97  22.95  37.01  37.22  65.68  51.46  59.1   78.7   49.89
2026.04  28.43               26.86  47.81  24.21  36.19  35.55  64.83  52.97  60.1   78     49.96
2026.04  31.37               26.18  45.45  22.7   36.06  34.04  66     52.25  57     79.5   49.12
2026.04  31.39               27.08  44.49  24.32  35.96  34.5   65.83  51.3   56.3   78.2   48.86
2026.04  43.05               29.05  44.7   23.52  33.83  30.48  65.14  52.41  58.5   75.1   47.96
2026.04  45.76               29.04  44.02  23.55  34.08  29.58  65.07  50.91  56.8   75.3   47.41