| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | LogiQA | LogiQA Accuracy 78.9 | 251 |
| Logical Reasoning | LogiQA (test) | Accuracy 86 | 151 |
| Logical Reasoning | LogiQA-2 | Accuracy 83.8 | 116 |
| Logical Reasoning | LogiQA | Accuracy 80.4 | 100 |
| Logical Reasoning | LogiQA | Accuracy 50.23 | 98 |
| Logical Reasoning | LogiQA (val) | Accuracy 58.37 | 50 |
| Logical Reasoning | LogiQA (dev) | Accuracy 47.3 | 40 |
| Logical Reasoning | LogiQA | Accuracy 60.22 | 34 |
| Logical Inference | LogiQA | Task Success Rate (TSR) 76.75 | 30 |
| Logical Reasoning | LogiQA original (test) | Accuracy 43.16 | 22 |
| Confidence Alignment | LogiQA | ECE 0.039 | 21 |
| Commonsense Reasoning | LogiQA | Accuracy 29.8 | 21 |
| Logical Reasoning | LogiQA | Accuracy 50.94 | 20 |
| Logical Reasoning | LogiQA | Acc@t1 46.6 | 20 |
| Confidence Calibration | LogiQA (out-of-distribution) | ECE 8 | 18 |
| Logical Reasoning | LogiQA | Pass@1 Accuracy 0.88 | 18 |
| Correctness Prediction | LogiQA | Accuracy 67.75 | 18 |
| Question Answering | LogiQA | Accuracy 44.29 | 17 |
| Logical Reasoning | LogiQA | Pass@1 Accuracy 48.61 | 14 |
| Logical Reasoning | LogiQA | Accuracy (LogiQA) 68.9 | 12 |
| Question Answering | LogiQA (test) | Accuracy 85.75 | 12 |
| Logical Reasoning | LogiQA | Accuracy 74.1 | 11 |
| Logical Reasoning | LogiQA 1.0 (test) | Accuracy 86 | 11 |
| Logical Reasoning | LogiQA Chinese | Pass@1 Accuracy 52.4 | 10 |
| Logical Reasoning | LogiQA English | Pass@1 Accuracy 53 | 10 |