| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| AMR Similarity Consistency | BAMBOO (test) | Main - STS-B | 66.94 | 17 |
| Question Answering | Bamboo | Accuracy | 50.4 | 14 |
| Long-context Reasoning | BAMBOO 16k | AltQA Score | 41.5 | 13 |
| Expected Calibration Error | Bamboo | Expected Calibration Error | 34.01 | 10 |