| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Watermark Detection | dolly_cw | Accuracy | 100 | 48 |
| Hallucination Detection | Dolly AC (test) | AUC | 81.59 | 33 |
| Detection Accuracy | dolly_cw | Accuracy | 99.27 | 24 |
| Instruction Following | Dolly | SBERT Similarity | 71.4 | 24 |
| Instruction Tuning | Dolly-15K alpha=5.0 | ROUGE-L | 35.79 | 22 |
| Instruction Tuning | Dolly-15K alpha=0.5 | ROUGE-L | 35.48 | 22 |
| Hallucination Detection | Dolly Llama2-13B (test) | Accuracy | 75.76 | 21 |
| Hallucination Detection | Dolly Llama2-7B (test) | Accuracy | 77.78 | 21 |
| Hallucination Detection | Dolly AC LLaMA3-8B | Recall | 83.92 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-13B | Recall | 0.9741 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-7B | Recall | 87.28 | 19 |
| Instruction Following | Dolly Eval | A Win Count | 62 | 19 |
| Instruction Following | Dolly | Score | 71.3 | 18 |
| Machine Unlearning | Dolly-15k Mistral-7B variant (Seen) | Seen ASR | 86 | 14 |
| Fine-tuning Robustness | Dolly Dataset | FSR | 92 | 10 |
| Federated Learning | Dolly-15K | Speedup | 18.86 | 10 |
| Machine Unlearning | Dolly-15k OOD triggers 1.0 (test) | OOD ASR | 47.2 | 7 |
| Machine Unlearning | Dolly-15k Mistral-7B variant (OOD) | OOD ASR | 27.7 | 7 |
| Machine Unlearning | Dolly-15k Clean Mistral-7B variant (val) | Clean PPL | 19.7 | 7 |
| Open-ended Instruction Following | Dolly Eval | A Win Rate | 54 | 7 |
| Watermark Spoofing | Dolly CW | TPR @ FPR=10% | 67 | 6 |
| LLM-rated Generation Quality | Dolly | Correctness | 4.1 | 6 |
| Instruction Following | Dolly | ROUGE-L | 25.2 | 6 |
| Hallucination Detection | Dolly-15k Qwen2.5-7B (test) | Precision | 84.21 | 6 |
| Hallucination Detection | Dolly-15k Qwen2.5-3B (test) | Precision | 80.55 | 6 |