TruthQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Correctness Prediction | TruthQA | Accuracy 69.12 | 18
Memory & Reasoning | TruthQA multi-round | Accuracy 69.2 | 6