
Frontier

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Open-ended Computer Science Problem Solving | Frontier-CS | Mean Score: 61.33 | 5
Large Model Performance Prediction | Frontier Top-20 pattern shift | RMSE: 9.71 | 3
Algorithm Engineering | Frontier-CS | Median Score: 75.5 | 2