| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Writing flaw detection | BenchMarker Writing - Out of Domain, Human | Accuracy 92.6 | 27 |
| Writing flaw detection | BenchMarker Writing - In Domain, NLP | Accuracy 81.5 | 27 |
| Shortcut detection | BenchMarker Shortcuts | Accuracy 81.6 | 26 |
| Contamination detection | BenchMarker | Accuracy 71.2 | 11 |