| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Generation | OR-ShARC (test) | F1 (BLEU-1) | 59.3 | 7 |
| Decision Making | OR-ShARC (test) | Micro Aggregation Score | 0.785 | 7 |
| Question Generation | OR-ShARC (dev) | F1 (BLEU-1) | 65.5 | 7 |
| Decision Making | OR-ShARC (dev) | Micro Avg | 83.4 | 7 |
| Open-retrieval | OR-ShARC (test) | Top-1 Accuracy | 79.8 | 4 |
| Open-retrieval | OR-ShARC (dev) | Top-1 Accuracy | 66.3 | 4 |
| Question Generation | OR-ShARC unseen (test) | F1 BLEU-1 | 34.9 | 3 |
| Question Generation | OR-ShARC (test seen) | F1-BLEU-1 | 83.4 | 3 |