| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Next Token Prediction | WildChat | Next Token Accuracy: 51 | 32 |
| Fingerprint Detection | WildChat Fr | FSR: 1 | 18 |
| Proactive next utterance prediction | WildChat (test) | LLM-Judge: 52.16 | 17 |
| Safety Evaluation | WildChat (test) | WildChat Score: 69.85 | 13 |
| Synthetic Text Generation | WildChat | Mean Embedding Similarity: 0.31 | 10 |
| Safety Evaluation | WildChat unsafe prompts | Not-Unsafe Rate: 99.82 | 9 |
| Next Token Prediction | WildChat | BERT-Small Next Token Accuracy (eps=inf): 28.78 | 5 |
| Safety | WildChat | Safe Response Rate: 94.22 | 2 |