Reasoning and Knowledge Assessment on ARC bo
Top result: Ours-MoE-SFT, 53.67 accuracy (Jul 12, 2025)
Evaluation Results
Method                         Date      Accuracy
Ours-MoE-SFT                   2025.07   53.67
Ours-SFT                       2025.07   48.39
Ours-MoE-Base-8k               2025.07   45.79
Ours-MoE-Base                  2025.07   44.96
Ours-Base-32k                  2025.07   44.62
Ours-Base                      2025.07   44.32
DeepSeek-R1-Distill-Llama-8B   2025.07   30.38
Qwen2.5-7B-base                2025.07   29.6
LLaMA3.1-8B-Instruct           2025.07   29.31
Qwen3-8B                       2025.07   29.16
Qwen2.5-7B-Instruct            2025.07   28.47