
NQ, TruthfulQA, WoW, HotpotQA, ELI5

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Knowledge Integration Quality | NQ, TruthfulQA, WoW, HotpotQA, ELI5 Aggregate | Average Performance: 76.7 | 32 |
| Question Answering | NQ, TruthfulQA, WoW, HotpotQA, ELI5 | Avg Score (All Datasets): 82.9 | 20 |