| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | ProntoQA (test) | Accuracy | 99.72 | 36 |
| Veracity Inference | PRONTOQA (1,000 examples) | Mean Hamming Similarity | 96.4 | 20 |
| Deductive Reasoning | ProntoQA | Pass@1 | 0.964 | 18 |
| Explanation Refinement | PrOntoQA | Initial Score | 0.98 | 15 |
| Reasoning | ProntoQA | Acc | 95 | 14 |
| Deductive logical reasoning | ProntoQA (test) | Error Rate | 2.8 | 12 |
| Reasoning | PrOntoQA | PrOntoQA Score | 97.88 | 10 |
| Logical Reasoning | PrOntoQA | Calibrated Accuracy | 63.8 | 8 |
| Reasoning accuracy | PRONTOQA 5-hop | Accuracy | 81 | 6 |
| Reasoning accuracy | PRONTOQA 4-hop | Accuracy | 85 | 6 |
| Reasoning accuracy | PRONTOQA 3-hop | Accuracy | 87 | 6 |
| Veracity Inference | PRONTOQA 5-hop (test) | Hamming Similarity | 0.955 | 4 |
| Veracity Inference | PRONTOQA 4-hop (test) | Hamming Similarity | 96.7 | 4 |
| Veracity Inference | PRONTOQA 3-hop (test) | Hamming Similarity | 95.6 | 4 |
| Logical Reasoning | PrOntoQA | Accuracy | 100 | 3 |
| Logical Reasoning | ProntoQA Enhanced | OA | 99.8 | 1 |
| Deductive logical reasoning | ProntoQA OOD 500 records (test) | ExcRate | - | 0 |