CRUX

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code | CRUX | Accuracy @5: 55.08 | 27 |
| Code Reasoning | CRUX | Accuracy: 87.37 | 26 |
| Code Reasoning | CRUX official (test) | Pass@1 Accuracy: 51.1 | 20 |
| Code Generation | CRUX | Score (%): 57.2 | 18 |
| Nugget-based retrieval | CRUX Multi-News | Precision@10: 38 | 14 |
| Nugget-based retrieval | CRUX DUC04 | Precision@10: 73.8 | 14 |