
Three runs

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning Quality Assessment | Three runs 1-5 scale (blind evaluation) | Recursion Depth: 4.6 | 2