MISBENCH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Misinformation Detection | MISBENCH (Multi-hop based Misinformation) 1.0 (test) | Factual Memory Success Rate: 96.88 | 12 |
| Misinformation Detection | MISBENCH One-hop based Misinformation 1.0 (test) | Factual Memory Success Rate: 91.44 | 12 |