| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Domain-specific Reasoning | LegalBench | Accuracy 85.26 | 33 |
| Legal Reasoning | LegalBench Hearsay | Accuracy 86.46 | 16 |
| Retrieval | LegalBench CorporateLobbying | nDCG@10 93.56 | 12 |
| Retrieval | LegalBench RAG | Hit Rate@1996 | 11 |
| Legal Reasoning | LegalBench Learned Hands Courts | Accuracy 75.5 | 10 |
| Legal Reasoning | LegalBench | Balanced Accuracy 79.3 | 10 |
| Question-Type Diversity Alignment | Legalbench Taxonomy | Jensen-Shannon Divergence 0.036 | 8 |
| In-Context Learning | LegalBench | Accuracy 79.5 | 6 |
| Legal Reasoning | LegalBench CUAD Cardlytics Buffalo Wild Wings PF Hospitality 2023 | Accuracy (Cardl) 82.7 | 6 |
| Retrieval | LegalBench EN | nDCG@10 63.42 | 5 |
| Generation | LegalBench Rule-Application | Exact Match 59 | 4 |
| Classification | LegalBench Interpretation | Accuracy 69.7 | 4 |
| Cross-lingual Question Answering | LEGALBENCH RuleQA English (test) | ROUGE-1 20.25 | 3 |
| TG task | Legalbench | Warranty Duration (CUAD) 61 | 3 |