LLM-as-a-judge evaluation on FB Bench (Feedback Bench)

Pearson's r: 0.932

Mistral-7B-Instruct (RAFT on GPT-4)

[Chart: Pearson's r by date; date shown: Mar 6, 2025]
Updated 4d ago

Evaluation Results

Date      Pearson's r   (unlabeled)   Method   Links
2025.03   0.932         0.93
2025.03   0.931         0.93
2025.03   0.92          0.918
2025.03   0.92          0.917
2025.03   0.919         0.917
2025.03   0.89          0.891
2025.03   0.879         0.88
2025.03   0.873         0.873
2025.03   0.872         0.872
2025.03   0.857         0.857
2025.03   0.845         0.847
2025.03   0.835         0.834
2025.03   0.683         0.689
2025.03   0.674         0.684
2025.03   0.381         0.376
2025.03   0.197         0.175
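
The results above are reported as Pearson's r, the linear correlation between a judge model's scores and a set of reference scores. The page does not describe how FB Bench computes it, so the following is only a minimal sketch of the standard formula; the function name `pearson_r` and both score lists are hypothetical and are not taken from the benchmark.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: scores from a judge model and matching reference scores.
judge_scores = [1, 2, 3, 4, 5, 3]
reference_scores = [1, 3, 3, 4, 5, 2]

r = pearson_r(judge_scores, reference_scores)
print(f"Pearson's r = {r:.3f}")
```

A value near 1 means the judge's scores rise and fall with the reference scores; a value near 0 means little linear relationship.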