Question Answering on TruthfulQA Semantic-level split o=3

98.1 Accuracy

W/O Decontamination

[Chart: x-axis label Jan 27, 2026]
Updated 4d ago

Evaluation Results

Date      Accuracy   (unlabeled metric)
2026.01   98.1       0.514
2026.01   97.1       0.722
2026.01   97.1       0.722
2026.01   95.7       0.49
2026.01   95.2       0.421
2026.01   95.1       0.485
2026.01   94.7       0.416
2026.01   91.4       0.665
2026.01   90         0.369
2026.01   85.2       0.385
2026.01   63.4       0.385
2026.01   63.2       0.101
2026.01   62.2       0.091
2026.01   60.8       0.141
2026.01   57.9       0.33
2026.01   55.1       0.302
2026.01   53.1       -
2026.01   52.6       0.005
2026.01   51.9       0.053
2026.01   46.7       -
2026.01   24.9       -