| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| CaseHOLD (test) | AutoAdapt | Test Accuracy | 89.22 | 22 | 1mo ago |
| LegalBench Hearsay | SciDC (Qwen3-14B) | Accuracy | 86.46 | 16 | 9d ago |
| LexEval | Kimi-K2-thinking | Memoization | 63.78 | 15 | 1mo ago |
| LegalArg | Qwen3-4B-Instruct | Accuracy | 65.42 | 14 | 1mo ago |
| Law | SeqTopK | Score | 26.52 | 13 | 20d ago |
| BarExam MBE (test) | | Accuracy | 82.1 | 12 | 26d ago |
| LexEval | Qwen2.5-7B + PPO w/ VERL. | Pass@1 | 57.1 | 10 | 4d ago |
| LEXam | Qwen2.5-7B + PPO w/ VERL. | Pass@1 Accuracy | 23.4 | 10 | 4d ago |
| LegalBench Learned Hands Courts | MAD | Accuracy | 75.5 | 10 | 1mo ago |
| LegalBench | Llama3.1-70B | Balanced Accuracy | 79.3 | 10 | 1mo ago |
| LawBench | MCE | Micro-F1 | 70 | 10 | 1mo ago |
| CaseHold | AutoAdapt | Cumulative Score (CS) | 96 | 8 | 1mo ago |
| LegalBench CUAD Cardlytics Buffalo Wild Wings PF Hospitality 2023 | Agentic Adversarial QA | Accuracy (Cardl) | 82.7 | 6 | 1mo ago |
| Law (test) | SeqTopK | Score | 45.29 | 5 | 20d ago |
| LSAT (test) | RADAR | Hypervolume | 0.9188 | 4 | 1mo ago |