General Language Capability on K&R, IFEval-PT, HumanEval
Top Average Score: 53.64 (Tucano2-qwen-3.7B-Instruct), Mar 3, 2026. Updated 3mo ago.
Evaluation Results
| Method | Details | Date | Average Score |
|---|---|---|---|
| Tucano2-qwen-3.7B-Instruct | Variant=Instruct, Para... | 2026.03 | 53.64 |
| Jurema-7B | Variant=Instruct, Para... | 2026.03 | 53.03 |
| Qwen2.5-3B-Instruct | Variant=Instruct, Para... | 2026.03 | 51.71 |
| Qwen3-4B | Variant=Instruct, Para... | 2026.03 | 51.36 |
| Gemma-3-Gaia-PT-BR-4b-it | Variant=Instruct, Para... | 2026.03 | 49.93 |
| SmolLM3-3B | Variant=Instruct, Para... | 2026.03 | 49.54 |
| Llama-3.2-3B-Instruct | Variant=Instruct, Para... | 2026.03 | 45.82 |
| Qwen2.5-1.5B-Instruct | Variant=Instruct, Para... | 2026.03 | 41.39 |
| Tucano2-qwen-1.5B-Instruct | Variant=Instruct, Para... | 2026.03 | 37.54 |
| Qwen3-1.7B | Variant=Instruct, Para... | 2026.03 | 36.3 |
| Tucano2-qwen-0.5B-Instruct | Variant=Instruct, Para... | 2026.03 | 26.08 |
| Qwen3-0.6B | Variant=Instruct, Para... | 2026.03 | 22.21 |
| Llama-3.2-1B-Instruct | Variant=Instruct, Para... | 2026.03 | 20.14 |
| Qwen2.5-0.5B-Instruct | Variant=Instruct, Para... | 2026.03 | 17.8 |