FOLIO

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | FOLIO | Accuracy | 89.2 | 126 |
| Logical Reasoning | FOLIO (test) | Accuracy | 95.6 | 58 |
| Natural Language Inference | FOLIO | Accuracy | 0.61 | 26 |
| NL-to-FOL Syntax Correctness | FOLIO (test) | Syntax Correctness Rate | 99 | 26 |
| First-order logic formalization | FOLIO | Accuracy | 31.53 | 24 |
| Mathematical Reasoning | FOLIO to GSM8K | Accuracy | 95.1 | 18 |
| First-Order Logic Reasoning | FOLIO | Pass@1 Success Rate | 84.7 | 18 |
| Binary Classification | FOLIO | Accuracy | 81 | 18 |
| Logical Reasoning | FOLIO-wiki-curated (test) | Accuracy | 98.04 | 17 |
| Explanation Refinement | FOLIO | Initial Score | 85.25 | 15 |
| Abductive Reasoning | FOLIO | Accuracy (FOLIO) | 88 | 14 |
| Deductive logical reasoning | FOLIO 203 (dev) | Exclusion Rate | 6.4 | 12 |
| Logical Reasoning | FOLIO full expert-curated | Accuracy | 79.9 | 8 |
| Adding Mistake | FOLIO | AOC | 0.714 | 7 |
| Truncated CoT Answering | FOLIO | AOC | 0.35 | 7 |
| First-Order Logic translation | FOLIO (test) | BLEU | 66 | 7 |
| Logical Reasoning | FOLIO (val) | Accuracy | 69.12 | 5 |
| Logical Reasoning | FOLIO FOL fields (val) | Accuracy | 68.1 | 4 |
| Logical Reasoning | FOLIO n=203 (held-out) | Per-Class Loss | 0.7195 | 3 |
| Logical reasoning | FOLIO | Optimization-phase Token Usage | 453 | 3 |
| Logical Reasoning | FOLIO | Pass@1 | 69.24 | 2 |
| Logical Reasoning | FOLIO | Accuracy | 48 | 2 |