Reasoning and Knowledge Assessment on XStoryCloze bo
[Chart: Accuracy. Best: Ours-MoE-SFT, 72.96 (Jul 12, 2025). Updated 20d ago.]
Evaluation Results
Method                        Date     Accuracy
Ours-MoE-SFT                  2025.07  72.96
Ours-SFT                      2025.07  61.86
Ours-MoE-Base-8k              2025.07  60.86
Ours-Base                     2025.07  60.8
Ours-MoE-Base                 2025.07  60.74
Ours-Base-32k                 2025.07  60.6
LLaMA3.1-8B-Instruct          2025.07  51.69
Qwen2.5-7B-Instruct           2025.07  50.43
Qwen3-8B                      2025.07  50.37
Qwen2.5-7B-base               2025.07  49.97
DeepSeek-R1-Distill-Llama-8B  2025.07  48.5