LLM-as-a-Judge on KD-DTI (test)
[Chart: EM Change by date. Highlighted point: GPT-4o-Mini, 53.41, Jun 1, 2025. Selectable metrics: EM Change, RMSE Change.]
Updated 4d ago
Evaluation Results
Method                      Links                                EM Change  RMSE Change
GPT-4o-Mini                 LLM-Generator Response... (2025.06)  53.41      1.81
Qwen-2.5-7B-Instruct        LLM-Generator Response... (2025.06)  49.98      1.82
Deepseek-R1-Qwen-7B         LLM-Generator Response... (2025.06)  42.45      2.51
Gemini-Flash                LLM-Generator Response... (2025.06)  40.68      1.98
Claude-3-Haiku              LLM-Generator Response... (2025.06)  40.27      1.83
LLaMA-3.1-8B-Instruct       LLM-Generator Response... (2025.06)  36.73      2.1
Phi-3.5-Mini-3.8B-Instruct  LLM-Generator Response... (2025.06)  35.55      2.11
Deepseek-R1-LLaMA-8B        LLM-Generator Response... (2025.06)  33.48      3.25