| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| Terminal-Bench 1.0 | | Accuracy | 51 | 12 | 1mo ago |
| Terminal-Bench 2.0 (full) | SkillsVote | Overall avg@5 Accuracy | 58.9 | 6 | 14d ago |
| Terminal-Bench 1.0 (test) | SkillFlow-specific | Avg Pass Rate | 34.9 | 6 | 2mo ago |
| TerminalBench 2.0 (test) | CRO | Test Accuracy | 35.2 | 4 | 21d ago |
| Terminal Bench medium-difficulty 2 | Jupiter-N | Accuracy | 52.7 | 2 | 1mo ago |