Multitask Language Understanding on MMMLU Korean 1.0 (test)
[Chart: best-model accuracy over time on MMMLU Korean 1.0 (test); current state of the art 41.94 accuracy (CLO), as of May 20, 2025. Updated 1mo ago.]
Evaluation Results
| Method  | Details                    | Date    | Accuracy |
|---------|----------------------------|---------|----------|
| CLO     | Base Model=Qwen2.5-3B,...  | 2025.05 | 41.94    |
| CLO     | Base Model=Llama-2-13B...  | 2025.05 | 39.7     |
| SFT-tgt | Base Model=Llama-2-13B...  | 2025.05 | 36.8     |
| SFT     | Base Model=Qwen2.5-3B,...  | 2025.05 | 35.9     |
| SFT     | Base Model=Llama-2-13B...  | 2025.05 | 34.39    |
| CLO     | Base Model=Llama-3-8B,...  | 2025.05 | 32.73    |
| SFT-tgt | Base Model=Llama-3-8B,...  | 2025.05 | 29.61    |
| CLO     | Base Model=Llama-2-7B,...  | 2025.05 | 29.09    |
| CLO     | Base Model=Mistral-7B,...  | 2025.05 | 28.31    |
| SFT+DPO | Base Model=Llama-2-7B,...  | 2025.05 | 28       |
| SFT-tgt | Base Model=Mistral-7B,...  | 2025.05 | 27.65    |
| SFT+DPO | Base Model=Llama-3-8B,...  | 2025.05 | 27.48    |
| SFT+DPO | Base Model=Llama-2-13B...  | 2025.05 | 26.79    |
| SFT+DPO | Base Model=Mistral-7B,...  | 2025.05 | 26.77    |
| SFT     | Base Model=Mistral-7B,...  | 2025.05 | 25.94    |
| SFT-tgt | Base Model=Llama-2-7B,...  | 2025.05 | 25.31    |
| SFT     | Base Model=Llama-3-8B,...  | 2025.05 | 25.31    |
| SFT     | Base Model=Llama-2-7B,...  | 2025.05 | 23.47    |
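The evaluation results above can be handled as plain records. The following is a minimal sketch (not official leaderboard tooling) that transcribes the method, base model, and accuracy from the table, re-sorts by accuracy to recover the ranking shown, and tallies each training method's best score across base models. Only the scores listed above are used; nothing is assumed beyond them.

```python
# Leaderboard rows transcribed from the table above: (method, base model, accuracy).
results = [
    ("CLO",     "Qwen2.5-3B",  41.94),
    ("CLO",     "Llama-2-13B", 39.7),
    ("SFT-tgt", "Llama-2-13B", 36.8),
    ("SFT",     "Qwen2.5-3B",  35.9),
    ("SFT",     "Llama-2-13B", 34.39),
    ("CLO",     "Llama-3-8B",  32.73),
    ("SFT-tgt", "Llama-3-8B",  29.61),
    ("CLO",     "Llama-2-7B",  29.09),
    ("CLO",     "Mistral-7B",  28.31),
    ("SFT+DPO", "Llama-2-7B",  28.0),
    ("SFT-tgt", "Mistral-7B",  27.65),
    ("SFT+DPO", "Llama-3-8B",  27.48),
    ("SFT+DPO", "Llama-2-13B", 26.79),
    ("SFT+DPO", "Mistral-7B",  26.77),
    ("SFT",     "Mistral-7B",  25.94),
    ("SFT-tgt", "Llama-2-7B",  25.31),
    ("SFT",     "Llama-3-8B",  25.31),
    ("SFT",     "Llama-2-7B",  23.47),
]

# Rank by accuracy, highest first (sorted() is stable, so the tied 25.31
# entries keep their table order).
ranked = sorted(results, key=lambda row: row[2], reverse=True)
method, base_model, accuracy = ranked[0]
print(f"Best: {method} on {base_model} at {accuracy}")

# Best score per training method, to compare approaches across base models.
best_by_method = {}
for m, _, acc in results:
    best_by_method[m] = max(best_by_method.get(m, 0.0), acc)
```

This makes the table's main pattern easy to check programmatically: CLO's best run (41.94 on Qwen2.5-3B) leads every SFT, SFT-tgt, and SFT+DPO configuration on this benchmark.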