
ProverQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Deductive logical reasoning | ProverQA hard (test) | Error Rate: 0 | 12
Logical Reasoning | ProverQA hard split | Accuracy: 0.686 | 8
Deductive logical reasoning | ProverQA OOD hard subset 500 records (test) | Error Rate: - | 0