ProntoQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | ProntoQA (test) | Accuracy: 99.72 | 36 |
| Veracity Inference | PRONTOQA (1,000 examples) | Mean Hamming Similarity: 96.4 | 20 |
| Deductive Reasoning | ProntoQA | Pass@1: 0.964 | 18 |
| Explanation Refinement | PrOntoQA | Initial Score: 0.98 | 15 |
| Reasoning accuracy | PRONTOQA 5-hop | Accuracy: 97.8 | 14 |
| Reasoning | ProntoQA | Acc: 95 | 14 |
| Deductive logical reasoning | ProntoQA (test) | Error Rate: 2.8 | 12 |
| Reasoning | PrOntoQA | PrOntoQA Score: 97.88 | 10 |
| Logical Reasoning | PrOntoQA | Calibrated Accuracy: 63.8 | 8 |
| Reasoning accuracy | PRONTOQA 4-hop | Accuracy: 85 | 6 |
| Reasoning accuracy | PRONTOQA 3-hop | Accuracy: 87 | 6 |
| Logical Reasoning | ProntoQA (val) | Accuracy: 98.01 | 4 |
| Veracity Inference | PRONTOQA 5-hop (test) | Hamming Similarity: 0.955 | 4 |
| Veracity Inference | PRONTOQA 4-hop (test) | Hamming Similarity: 96.7 | 4 |
| Veracity Inference | PRONTOQA 3-hop (test) | Hamming Similarity: 95.6 | 4 |
| Logical Reasoning | PrOntoQA | Accuracy: 100 | 3 |
| Logical Reasoning | ProntoQA Enhanced | OA: 99.8 | 1 |
| Deductive logical reasoning | ProntoQA OOD 500 records (test) | ExcRate: - | 0 |