Problem list generation on Problem List
Best Composite Score: 35.61 (SLERP Merge)
[Chart: Composite Score by date (Apr 2, 2026)]
Updated 16d ago
Evaluation Results
Method                  Evaluation protocol   Date      Composite Score
SLERP Merge             SFT                   2026.04   35.61
GatorTronLlama_SFT      SFT                   2026.04   33.85
GatorTronLlama          SFT                   2026.04   33.07
Llama-3.1-8B-Instruct   SFT                   2026.04   33.05
Linear Merge            SFT                   2026.04   31.47
SLERP Merge             0-...                 2026.04   23.4
GatorTronLlama_SFT      0-...                 2026.04   21.53
Linear Merge            0-...                 2026.04   21.17
GatorTronLlama          0-...                 2026.04   20.84
Llama-3.1-8B-Instruct   0-...                 2026.04   19.05