
TruthQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Correctness Prediction | TruthQA | Accuracy 69.12 | 18 |
| Memory & Reasoning | TruthQA multi-round | Accuracy 69.2 | 6 |