| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Red-teaming | GPT-OSS 20B | Coverage: 63.2 | 5 |
| Language Modeling | GPT-OSS 20B held-out (test) | Perplexity: 34.56 | 5 |
| Retrieval-Augmented Generation | openai/gpt-oss-20b Long prompt | TTFT (s): 7.72 | 3 |
| Retrieval-Augmented Generation | openai/gpt-oss-20b Medium prompt | Time To First Byte (s): 2.45 | 3 |
| Retrieval-Augmented Generation | openai/gpt-oss-20b Short prompt | TTFT (s): 1.39 | 3 |
| Training Throughput | GPT-OSS-20B workload | Throughput (tokens/s): 140,900 | 2 |
| Language Modeling | Mini-GPT-OSS (val) | Validation Loss: 2.94 | 2 |