LLM-as-a-Judge on KD-DTI (test)

53.41 EM Change

GPT-4o-Mini

Updated 3mo ago

Evaluation Results

Method	Date	EM Change	(unlabeled)
GPT-4o-Mini	2025.06	53.41	1.81
Qwen-2.5-7B-Instruct	2025.06	49.98	1.82
Deepseek-R1-Qwen-7B	2025.06	42.45	2.51
Gemini-Flash	2025.06	40.68	1.98
Claude-3-Haiku	2025.06	40.27	1.83
LLaMA-3.1-8B-Instruct	2025.06	36.73	2.1
Phi-3.5-Mini-3.8B-Instruct	2025.06	35.55	2.11
Deepseek-R1-LLaMA-8B	2025.06	33.48	3.25