
Legal

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Hallucination Detection | Legal | AUROC 97 | 24 |
| Multi-hop Question Answering | Legal | F1 Score 71.93 | 14 |
| Retrieval | Legal | Legal Score 51.16 | 10 |
| Summarization | Legal (OOV_RS) | R-LCS 24.86 | 8 |
| Summarization | Legal (OOV_SD) | R-LCS 25.16 | 8 |
| Summarization | Legal (Random subset) | FrSrSD 1.06 | 6 |
| Misaligned Task Learning | Legal In-domain | Misalignment 0.87 | 6 |
| Emergent Misalignment Measurement | Legal | Misalignment 0.58 | 6 |
| Classification | Legal | Coverage Loss 1 | 5 |
| Grammar Checking | Legal (in-house) | Precision 95.2 | 5 |
| Private Information Tagging | Legal (test) | Precision 78.72 | 4 |
| Cross-domain generalization | Legal (test) | Accuracy 100 | 3 |
| Legal Prediction | Legal | BS 0.228 | 3 |
| Summarization | Legal (Random) | R-LCS 25.42 | 2 |