LLM-as-a-judge evaluation on 100 Romanian synthetic prompts (test)

Fluency: 4.71

DeepSeek (Instruct)

Updated 5mo ago

Evaluation Results

Method	Links	Fluency	Score 2 (unlabeled in source)	Score 3 (unlabeled in source)
DeepSeek (Instruct) 2026.01		4.71	4.66	4.68
LLaMA-3 8B Instruct 2026.01		4.70	4.72	4.71
GPT-4.1-mini 2026.01		4.63	4.77	4.70
EuroLLM 9B Instruct 2026.01		4.58	4.66	4.62
Gemma 2 9B Instruct 2026.01		4.46	4.58	4.52
TF3 Transformer 2026.01		4.28	4.09	4.19
TF3 Distilled Student 2026.01		3.96	4.03	4.00
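
Only the Fluency column can be identified from this page; the last two score columns carry no label. The numbers suggest, though the source does not say so, that the last column is the mean of the first two. The following is a minimal Python sketch for checking that guess against the rows above. The names `ROWS` and `mean_matches_third` are illustrative, and the 0.01 tolerance is an assumption chosen to absorb two-decimal rounding.

```python
# Scores copied from the results table: (method, fluency, second score, third score).
# Only the first score column is labeled (Fluency) in the source; the other two are unlabeled.
ROWS = [
    ("DeepSeek (Instruct)", 4.71, 4.66, 4.68),
    ("LLaMA-3 8B Instruct", 4.70, 4.72, 4.71),
    ("GPT-4.1-mini", 4.63, 4.77, 4.70),
    ("EuroLLM 9B Instruct", 4.58, 4.66, 4.62),
    ("Gemma 2 9B Instruct", 4.46, 4.58, 4.52),
    ("TF3 Transformer", 4.28, 4.09, 4.19),
    ("TF3 Distilled Student", 3.96, 4.03, 4.00),
]


def mean_matches_third(rows, tol=0.01):
    """For each method, report whether the third score is within `tol`
    of the mean of the first two scores (an assumed relationship)."""
    return {name: abs((a + b) / 2 - c) < tol for name, a, b, c in rows}


if __name__ == "__main__":
    for name, ok in mean_matches_third(ROWS).items():
        print(f"{name}: {'consistent' if ok else 'inconsistent'} with a simple mean")
```

Passing this check only shows the values are compatible with a simple average of the displayed scores; it does not confirm how the leaderboard actually computes the column.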