Multitask Language Understanding on MMMLU Korean 1.0 (test)
[Chart: Accuracy over time on this benchmark. Best result: 41.94 (CLO) as of May 20, 2025. Updated 4d ago.]
Evaluation Results

| Method  | Details                   | Date    | Accuracy |
|---------|---------------------------|---------|----------|
| CLO     | Base Model=Qwen2.5-3B,... | 2025.05 | 41.94    |
| CLO     | Base Model=Llama-2-13B... | 2025.05 | 39.7     |
| SFT-tgt | Base Model=Llama-2-13B... | 2025.05 | 36.8     |
| SFT     | Base Model=Qwen2.5-3B,... | 2025.05 | 35.9     |
| SFT     | Base Model=Llama-2-13B... | 2025.05 | 34.39    |
| CLO     | Base Model=Llama-3-8B,... | 2025.05 | 32.73    |
| SFT-tgt | Base Model=Llama-3-8B,... | 2025.05 | 29.61    |
| CLO     | Base Model=Llama-2-7B,... | 2025.05 | 29.09    |
| CLO     | Base Model=Mistral-7B,... | 2025.05 | 28.31    |
| SFT+DPO | Base Model=Llama-2-7B,... | 2025.05 | 28       |
| SFT-tgt | Base Model=Mistral-7B,... | 2025.05 | 27.65    |
| SFT+DPO | Base Model=Llama-3-8B,... | 2025.05 | 27.48    |
| SFT+DPO | Base Model=Llama-2-13B... | 2025.05 | 26.79    |
| SFT+DPO | Base Model=Mistral-7B,... | 2025.05 | 26.77    |
| SFT     | Base Model=Mistral-7B,... | 2025.05 | 25.94    |
| SFT-tgt | Base Model=Llama-2-7B,... | 2025.05 | 25.31    |
| SFT     | Base Model=Llama-3-8B,... | 2025.05 | 25.31    |
| SFT     | Base Model=Llama-2-7B,... | 2025.05 | 23.47    |
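To compare the methods across base models, the accuracy scores above can be aggregated per method. The following sketch transcribes the leaderboard values and computes a mean accuracy for each method (the grouping and rounding choices are ours, not part of the benchmark):

```python
from collections import defaultdict

# Accuracy scores transcribed from the leaderboard table above:
# (method, base model, accuracy on MMMLU Korean 1.0 test)
results = [
    ("CLO", "Qwen2.5-3B", 41.94),
    ("CLO", "Llama-2-13B", 39.7),
    ("SFT-tgt", "Llama-2-13B", 36.8),
    ("SFT", "Qwen2.5-3B", 35.9),
    ("SFT", "Llama-2-13B", 34.39),
    ("CLO", "Llama-3-8B", 32.73),
    ("SFT-tgt", "Llama-3-8B", 29.61),
    ("CLO", "Llama-2-7B", 29.09),
    ("CLO", "Mistral-7B", 28.31),
    ("SFT+DPO", "Llama-2-7B", 28.0),
    ("SFT-tgt", "Mistral-7B", 27.65),
    ("SFT+DPO", "Llama-3-8B", 27.48),
    ("SFT+DPO", "Llama-2-13B", 26.79),
    ("SFT+DPO", "Mistral-7B", 26.77),
    ("SFT", "Mistral-7B", 25.94),
    ("SFT-tgt", "Llama-2-7B", 25.31),
    ("SFT", "Llama-3-8B", 25.31),
    ("SFT", "Llama-2-7B", 23.47),
]

# Group scores by method and compute a simple mean per method
by_method = defaultdict(list)
for method, _base, acc in results:
    by_method[method].append(acc)

mean_acc = {m: round(sum(v) / len(v), 2) for m, v in by_method.items()}

# Print methods from highest to lowest mean accuracy
for method, acc in sorted(mean_acc.items(), key=lambda kv: -kv[1]):
    print(f"{method:8s} {acc:.2f}")
```

On these numbers, CLO has the highest mean accuracy across the listed base models, followed by SFT-tgt, SFT, and SFT+DPO; note this averages over different base models and is only a rough summary, not an official benchmark statistic.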