Law

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Law reasoning | Law | Accuracy: 70.25 | 27 |
| Speculative Decoding | Law | Throughput (tokens/s): 132.69 | 22 |
| Counterfactual Explanations | Law | Validity: 100 | 18 |
| Legal Reasoning | Law | LLM-as-judge Score: 34.4 | 13 |
| Legal Reasoning | Law | Score: 26.52 | 13 |
| Machine Translation | Law (test) | BLEU: 61.55 | 9 |
| Multiple Choice Question Answering | Law | Accuracy: 45.4 | 8 |
| Multi-class Classification | Law | Accuracy: 69.1 | 8 |
| Machine Translation | Law (Ko-En) (test) | BLEU: 53.8 | 8 |
| Machine Translation | Law De-En (test) | BLEU: 75.44 | 8 |
| Machine Translation | Law All-domain datastore (test) | BLEU: 61.22 | 6 |
| Legal Reasoning | Law (test) | Score: 45.29 | 5 |
| Machine Translation | Law En-De out-of-domain WMT14 (test) | BLEU Score: 41.5 | 5 |
| Within-distribution detection | Law Unsafe vs. Safe v3/v4 (holdout) | AUROC: 0.971 | 2 |
| Machine Translation | Law multi-domain (test) | Decoding Speed (Tok/Sec): 3,690.85 | 2 |