Large Language Model Evaluation on Code Specialized Target (test)
Top result: CAMEL, Weighted Average Score 52.8
[Chart: Weighted Average Score over time; date axis label: Mar 9, 2026]
Updated 1mo ago
Evaluation Results
Method                                                    Weighted Average Score
CAMEL (Sampling Strategy=Hour..., 2026.03)                52.8
SODM (Sampling Strategy=Rect..., 2026.03)                 52
DML (Sampling Strategy=Rect..., 2026.03)                  50
Model-size agnostic (Sampling Strategy=Rect..., 2026.03)  49.7