| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | G-bench CS | Accuracy | 73.9 | 11 |
| Question Answering | G-bench Medical | Accuracy | 73.3 | 11 |
| Question Answering | G-bench Novel | Accuracy | 58.9 | 11 |
| Evidence Retrieval | G-bench Medical | Recall | 93.8 | 10 |
| Evidence Retrieval | G-bench Novel | Recall | 87.7 | 10 |
| Graph Reasoning | G-bench CS | Inference Time (s) | 0.2 | 9 |
| Reasoning Explanation Generation | G-bench CS (dev) | Average R | 60.2 | 7 |