Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

HOWTOBENCH

Benchmarks

Task NameDataset NameSOTA ResultTrend
CompletionHOWTOBENCH
Pearson Correlation (ρ)0.91
26
OverallHOWTOBENCH
Pearson Correlation (ρ)0.93
13
GuideHOWTOBENCH
Pearson Correlation (ρ)0.85
13
Pearson Correlation with Human EvaluationHOWTOBENCH English
Completion Correlation0.85
5
Showing 4 of 4 rows