Large Language Model Evaluation on Math Specialized Target (test)
[Chart: Weighted Average Score over time. Best result: 49.7 (CAMEL), Mar 9, 2026.]
Evaluation Results
Method               Configuration              Date     Weighted Average Score
CAMEL                Sampling Strategy=Hour...  2026.03  49.7
SODM                 Sampling Strategy=Rect...  2026.03  49.4
DML                  Sampling Strategy=Rect...  2026.03  47.9
Model-size agnostic  Sampling Strategy=Rect...  2026.03  44.8