Robustness Evaluation on Perturbation Dataset
Chart: Change Accuracy by method (top result: L4L, 62.56, Nov 26, 2025)
Evaluation Results
Method            Date      Change Accuracy
L4L               2025.11   62.56
GPT-5.2           2025.11   59.63
DeepSeek v3       2025.11   55.93
Claude 4 Sonnet   2025.11   51.5
GPT-4o            2025.11   50.67
GPT o4-mini       2025.11   46.33
DISC-LawLLM       2025.11   23.17
LexiLaw           2025.11   0