| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | CB | Accuracy | 98.2 | 110 |
| Classification | CB | Accuracy | 91.1 | 46 |
| Natural Language Inference | CB SuperGLUE (test) | Accuracy | 91.43 | 33 |
| Natural Language Inference | CB | Average Accuracy | 91 | 29 |
| Natural Language Inference | CB val (test) | Accuracy | 94.6 | 19 |
| CommitmentBank | CB | Accuracy | 84.99 | 16 |
| Natural language inference | CB (test) | Accuracy | 89.3 | 13 |
| Text Classification | CB (test) | Macro-F1 | 64.6 | 10 |
| Natural Language Inference | CB | Total Communication Time (10^3 s) | 5.43 | 9 |
| Natural Language Inference | CB SuperGLUE (test dev) | Accuracy | 84 | 8 |
| Natural Language Inference | CB | Accuracy | 87.5 | 8 |
| Four-class classification | CB (evaluation set) | Precision | 59.08 | 8 |
| Natural Language Inference | CB | F1 | 61.51 | 7 |
| Natural language inference | CB | Macro F1 Score | 0.537 | 6 |
| Natural Language Inference | CB | Acc (0-shot) | 82.1 | 6 |
| Natural Language Inference | CB (dev) | Accuracy | 0.84 | 6 |
| Natural Language Inference | CB 32 samples | F1 Score | 86.5 | 6 |
| Tabular Classification | CB (test) | AUROC | 88 | 4 |