
Ultra-Problem

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Judge Accuracy | Ultra-Problem (Target) | Accuracy: 61.5 | 2 |
| Judge Accuracy | Ultra-Problem (Bench) | Accuracy: 73 | 2 |