
CS-Bench

Benchmarks

Task Name                    Dataset Name  SOTA Result     Trend
General and STEM reasoning   CS-Bench      Pass@1: 73.96   20
Chart spatial understanding  CS-Bench      R@0.3: 45.3     8
Autonomous LLM Fine-tuning   CS-Bench      Accuracy: 85.3  4