
General Reasoning on TruthfulQA

55.6 Accuracy

Full-step decoding

[Chart: Accuracy by date, Aug 27, 2025]
Updated 1mo ago

Evaluation Results

Method    Links
2025.08   55.6   -      -
2025.08   53.2   -2.4   1.83
2025.08   46.1   11.7   2.31
2025.08   34.4   -      -