LLM-as-a-Judge on BC5CDR (test)
Chart: EM on BC5CDR (test) over time (metric toggle: EM / RMSE). Top result: GPT-4o-Mini, 48.35 EM (Jun 1, 2025).
Evaluation Results
| Method | Paper | Date | EM (↑) | RMSE (↓) |
|---|---|---|---|---|
| GPT-4o-Mini | LLM-Generator Response... | 2025.06 | 48.35 | 2.33 |
| Qwen-2.5-7B-Instruct | LLM-Generator Response... | 2025.06 | 45.25 | 2.42 |
| Gemini-Flash | LLM-Generator Response... | 2025.06 | 42.55 | 2.09 |
| Phi-3.5-Mini-3.8B-Instruct | LLM-Generator Response... | 2025.06 | 33.8 | 2.4 |
| Deepseek-R1-Qwen-7B | LLM-Generator Response... | 2025.06 | 30.6 | 2.76 |
| Deepseek-R1-LLaMA-8B | LLM-Generator Response... | 2025.06 | 30.5 | 3.37 |
| Claude-3-Haiku | LLM-Generator Response... | 2025.06 | 29.5 | 2.26 |
| LLaMA-3.1-8B-Instruct | LLM-Generator Response... | 2025.06 | 29.45 | 2.4 |
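The page does not define its two metrics. A common reading in LLM-as-a-judge setups is that the judge model assigns a numeric score to each generated response, and those scores are compared against reference scores: EM as the percentage of items where the judge's score exactly matches the reference, and RMSE as the root mean squared error between the two score lists. The sketch below assumes that reading. The function names, the sample scores, and the 1-10 scale are hypothetical and not taken from this benchmark.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], ref: Sequence[float]) -> None:
    # Both score lists must be non-empty and aligned item by item.
    if not ref or len(pred) != len(ref):
        raise ValueError("pred and ref must be non-empty and of equal length")


def exact_match(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Assumed EM: percentage of items where judge score == reference score."""
    _check(pred, ref)
    hits = sum(p == r for p, r in zip(pred, ref))
    return 100.0 * hits / len(ref)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Assumed RMSE: root mean squared error between judge and reference scores."""
    _check(pred, ref)
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    return math.sqrt(mse)


# Hypothetical judge scores vs. reference scores on an assumed 1-10 scale.
judge = [7, 5, 9, 3]
reference = [7, 6, 9, 1]
print(exact_match(judge, reference))    # 50.0
print(round(rmse(judge, reference), 3)) # 1.118
```

Under this reading the two columns capture different things: EM rewards only exact agreement, while RMSE also credits near-misses, which is why a model can rank lower on EM yet have a smaller RMSE (as Gemini-Flash does in the table).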