| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Evaluation | AbstainQA (test) | Accuracy | 14.7 | 11 |
| Safety Evaluation | AbstainQA (val) | Accuracy | 26 | 11 |
| Selective Question Answering | AbstainQA (test) | Accuracy | 13 | 11 |
| Selective Question Answering | AbstainQA (val) | Accuracy | 21 | 11 |