
Legal

Benchmarks

Task Name                         | Dataset Name     | SOTA Result        | Trend
----------------------------------+------------------+--------------------+------
Multi-hop Question Answering      | Legal            | F1 Score: 71.93    | 14
Retrieval                         | Legal            | Legal Score: 51.16 | 10
Misaligned Task Learning          | Legal In-domain  | Misalignment: 0.87 | 6
Emergent Misalignment Measurement | Legal            | Misalignment: 0.58 | 6
Grammar Checking                  | Legal (in-house) | Precision: 95.2    | 5
Private Information Tagging       | Legal (test)     | Precision: 78.72   | 4
Cross-domain generalization       | Legal (test)     | Accuracy: 100      | 3
Legal Prediction                  | Legal            | BS: 0.228          | 3