
NUMINA

Benchmarks

Task Name                       Dataset Name  SOTA Result              Trend
Mathematical Reasoning          Numina        Accuracy: 45.5           36
Mathematical Reasoning          Numina        Accuracy (Numina): 48.1  12
Response Similarity Evaluation  NUMINA        C-Score: 1.848           9