
ProntoQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | ProntoQA (test) | Accuracy | 99.72 | 57 |
| Deductive Reasoning | ProntoQA | Pass@1 | 99.2 | 38 |
| Logical Reasoning | PrOntoQA 5-hop | Accuracy | 90.2 | 20 |
| Veracity Inference | PRONTOQA (1,000 examples) | Mean Hamming Similarity | 96.4 | 20 |
| Logical Reasoning | PrOntoQA | Calibrated Accuracy | 91.4 | 17 |
| Correctness prediction | ProntoQA | AUROC | 79.9 | 15 |
| Explanation Refinement | PrOntoQA | Initial Score | 0.98 | 15 |
| Reasoning accuracy | PRONTOQA 5-hop | Accuracy | 97.8 | 14 |
| Reasoning | ProntoQA | Accuracy | 95 | 14 |
| Deductive logical reasoning | ProntoQA (test) | Error Rate | 2.8 | 12 |
| Reasoning | PrOntoQA | PrOntoQA Score | 97.88 | 10 |
| Multi-hop Reasoning | PrOntoQA | Accuracy (1 Hop) | 97.3 | 8 |
| Reasoning accuracy | PRONTOQA 4-hop | Accuracy | 85 | 6 |
| Reasoning accuracy | PRONTOQA 3-hop | Accuracy | 87 | 6 |
| Logical Reasoning | ProntoQA (val) | Accuracy | 98.01 | 4 |
| Veracity Inference | PRONTOQA 5-hop (test) | Hamming Similarity | 0.955 | 4 |
| Veracity Inference | PRONTOQA 4-hop (test) | Hamming Similarity | 96.7 | 4 |
| Veracity Inference | PRONTOQA 3-hop (test) | Hamming Similarity | 95.6 | 4 |
| Logical Reasoning | PrOntoQA | Accuracy | 100 | 3 |
| Logical Reasoning | ProntoQA Enhanced | OA | 99.8 | 1 |
| Deductive logical reasoning | ProntoQA OOD 500 records (test) | ExcRate | - | 0 |