
FOLIO

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | FOLIO | Accuracy 89.2 | 123 |
| Logical Reasoning | FOLIO (test) | Accuracy 95.6 | 58 |
| Natural Language Inference | FOLIO | Accuracy 0.61 | 26 |
| NL-to-FOL Syntax Correctness | FOLIO (test) | Syntax Correctness Rate 99 | 26 |
| First-order logic formalization | FOLIO | Accuracy 31.53 | 24 |
| Mathematical Reasoning | FOLIO to GSM8K | Accuracy 95.1 | 18 |
| First-Order Logic Reasoning | FOLIO | Pass@1 Success Rate 84.7 | 18 |
| Binary Classification | FOLIO | Accuracy 81 | 18 |
| Logical Reasoning | FOLIO-wiki-curated (test) | Accuracy 98.04 | 17 |
| Explanation Refinement | FOLIO | Initial Score 85.25 | 15 |
| Deductive logical reasoning | FOLIO 203 (dev) | Exclusion Rate 6.4 | 12 |
| Logical Reasoning | FOLIO full expert-curated | Accuracy 79.9 | 8 |
| Adding Mistake | FOLIO | AOC 0.714 | 7 |
| Truncated CoT Answering | FOLIO | AOC 0.35 | 7 |
| First-Order Logic translation | FOLIO (test) | BLEU 66 | 7 |
| Logical Reasoning | FOLIO (val) | Accuracy 69.12 | 5 |
| Logical Reasoning | FOLIO FOL fields (val) | Accuracy 68.1 | 4 |
| Logical reasoning | FOLIO | Optimization-phase Token Usage 453 | 3 |
| Logical Reasoning | FOLIO | Accuracy 48 | 2 |