| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Watermark Detection | dolly_cw | Accuracy 100 | 48 |
| Instruction Following | Dolly Eval (test) | ROUGE-L 29.69 | 42 |
| Question Answering | Dolly Closed QA | ASR 100 | 36 |
| Hallucination detection | Dolly AC (test) | AUC 81.59 | 33 |
| Instruction Following | Dolly | Rouge-L 27.47 | 32 |
| Detection Accuracy | dolly_cw | Accuracy 99.27 | 24 |
| Instruction Following | Dolly | SBERT Similarity 71.4 | 24 |
| Instruction Tuning | Dolly-15K alpha=5.0 | Rouge-L 35.79 | 22 |
| Instruction Tuning | Dolly-15K alpha=0.5 | Rouge-L 35.48 | 22 |
| Instruction-tuning | Dolly | RougeL 35.34 | 21 |
| Hallucination Detection | Dolly Llama2-13B (test) | Accuracy 75.76 | 21 |
| Hallucination Detection | Dolly Llama2-7B (test) | Acc 77.78 | 21 |
| Scrubbing Attack | Dolly | AUC 80 | 20 |
| Hallucination Detection | Dolly AC LLaMA3-8B | Recall 83.92 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-13B | Recall 0.9741 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-7B | Recall 87.28 | 19 |
| Instruction Following | Dolly Eval | A Win Count 62 | 19 |
| Spoofing Attack Detection | Dolly CW | WCS 8.88 | 18 |
| Instruction Following | Dolly | Score 71.3 | 18 |
| Instruction Following Evaluation | Dolly Out-of-Distribution | GPT-4o Score 49.9 | 17 |
| Language Generation | Dolly databricks 15k (test) | ROUGE-L 29.7 | 14 |
| Machine Unlearning | Dolly-15k Mistral-7B variant (Seen) | Seen ASR 86 | 14 |
| Data Extraction | Dolly D2 | Mean Match Ratio 49.2 | 11 |
| Fine-tuning Robustness | Dolly Dataset | FSR 92 | 10 |
| Federated Learning | Dolly-15K | Speedup 18.86 | 10 |