LLM-as-a-Judge on DDI (test)

59.03 EM (Δ)

GPT-4o-Mini

[Chart: EM (Δ) over time; x-axis tick: Jun 1, 2025]
Updated 4d ago

Evaluation Results

Date      EM (Δ)
2025.06   59.03 (1.84)
2025.06   47.12 (2.11)
2025.06   46.6 (2.15)
2025.06   43.06 (2.19)
2025.06   42.67 (3.07)
2025.06   42.15 (4.16)
2025.06   31.15 (2.7)
2025.06   29.32 (2.95)