| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Inference | HANS (test) | Accuracy 78.65 | 54 |
| Classification | HANS (test) | Accuracy 71.2 | 32 |
| Natural Language Inference | HANS (val) | Accuracy 84.3 | 28 |
| Natural Language Inference | HANS | Accuracy 100 | 23 |
| Structural Bias Evaluation | HANS | Accuracy 99.6 | 14 |
| Natural Language Inference | HANS Unknown Bias (out-of-distribution) | Accuracy 70.7 | 13 |
| Natural Language Inference | HANS Syntactic Bias (out-of-distribution) | Accuracy 70.7 | 13 |
| Natural Language Inference | HANS | HANS Accuracy 99.4 | 4 |
| Natural Language Inference | HANS (OOD) | Accuracy 65.82 | 4 |