LLM-as-a-Judge on DDI (test)

59.03 EM (Δ)

GPT-4o-Mini

Updated 3mo ago

Evaluation Results

Method	Links	EM (Δ)	
GPT-4o-Mini 2025.06		59.03	1.84
Gemini-Flash 2025.06		47.12	2.11
Qwen-2.5-7B-Instruct 2025.06		46.6	2.15
Phi-3.5-Mini-3.8B-Instruct 2025.06		43.06	2.19
Deepseek-R1-Qwen-7B 2025.06		42.67	3.07
Deepseek-R1-LLaMA-8B 2025.06		42.15	4.16
Claude-3-Haiku 2025.06		31.15	2.7
LLaMA-3.1-8B-Instruct 2025.06		29.32	2.95