
LLM-as-a-judge evaluation on Vicuna Bench

Pearson Correlation (r): 0.605

TRACT

[Chart: Pearson Correlation (r) over time; data point dated Mar 6, 2025]
Updated 4d ago

Evaluation Results

Date      Pearson r   (unlabeled)
2025.03   0.605       0.65
2025.03   0.593       0.552
2025.03   0.567       0.519
2025.03   0.562       0.526
2025.03   0.528       0.513
2025.03   0.509       0.541
2025.03   0.505       0.477
2025.03   0.488       0.48
2025.03   0.485       0.487
2025.03   0.467       0.483
2025.03   0.463       0.456
2025.03   0.429       0.414
2025.03   0.418       0.404
2025.03   0.4         0.423
2025.03   0.281       0.165
2025.03   0.229       0.311