Principle-based evaluation

Benchmarks

Task Name   Dataset Name                         SOTA Result      Trend
Overall     Principle-based evaluation dataset   Average: 8.41    12
Steering    Principle-based evaluation dataset   G Score: 8.68    12
Judgment    Principle-based evaluation dataset   G Score: 8.45    12