Large Language Model Evaluation on Knowledge Specialized Target (test)
Top Weighted Average Score: 56.5 (CAMEL, Mar 9, 2026)
Updated 1mo ago
Evaluation Results
Method               Configuration                Date     Weighted Average Score
CAMEL                Sampling Strategy=Hour...    2026.03  56.5
SODM                 Sampling Strategy=Rect...    2026.03  55.9
Model-size agnostic  Sampling Strategy=Rect...    2026.03  55.1
DML                  Sampling Strategy=Rect...    2026.03  55