| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination Detection | Legal | AUROC | 97 | 24 |
| Multi-hop Question Answering | Legal | F1 Score | 71.93 | 14 |
| Retrieval | Legal | Legal Score | 51.16 | 10 |
| Summarization | Legal (OOV_RS) | R-LCS | 24.86 | 8 |
| Summarization | Legal (OOV_SD) | R-LCS | 25.16 | 8 |
| Summarization | Legal (Random subset) | FrSrSD | 1.06 | 6 |
| Misaligned Task Learning | Legal In-domain | Misalignment | 0.87 | 6 |
| Emergent Misalignment Measurement | Legal | Misalignment | 0.58 | 6 |
| Classification | Legal | Coverage Loss | 1 | 5 |
| Grammar Checking | Legal (in-house) | Precision | 95.2 | 5 |
| Private Information Tagging | Legal (test) | Precision | 78.72 | 4 |
| Cross-domain generalization | Legal (test) | Accuracy | 100 | 3 |
| Legal Prediction | Legal | BS | 0.228 | 3 |
| Summarization | Legal (Random) | R-LCS | 25.42 | 2 |