
LLM-as-a-Judge Evaluation on FLASK

Top result: TRACT, 0.518 Pearson's r

[Chart: Pearson's r over time (Mar 6, 2025)]

Evaluation Results

Rank  Method  Date     Pearson's r  Second score
1     TRACT   2025.03  0.518        0.501
2             2025.03  0.512        0.493
3             2025.03  0.509        0.502
4             2025.03  0.506        0.493
5             2025.03  0.5          0.493
6             2025.03  0.475        0.484
7             2025.03  0.468        0.436
8             2025.03  0.448        0.437
9             2025.03  0.435        0.433
10            2025.03  0.418        0.419
11            2025.03  0.413        0.407
12            2025.03  0.412        0.445
13            2025.03  0.358        0.346
14            2025.03  0.355        0.361
15            2025.03  0.228        0.168
16            2025.03  0.2          0.149
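Methods on this leaderboard are ranked by Pearson's r. The page does not say what r is computed over. The sketch below assumes the usual LLM-as-a-judge setup, where r measures the linear agreement between the scores a judge model gives to FLASK responses and reference scores for the same responses. It implements the standard formula: covariance divided by the product of the standard deviations. The function name `pearson_r` and the sample scores are hypothetical and do not come from the benchmark.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations (covariance numerator; the 1/n factors cancel in r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list (variance numerators).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either list is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical 1-5 scores: judge model vs. reference annotations.
    judge_scores = [4, 3, 5, 2, 4]
    reference_scores = [5, 3, 4, 2, 4]
    print(round(pearson_r(judge_scores, reference_scores), 3))  # → 0.808
```

An r of 1 means the judge's scores are a perfect increasing linear function of the reference scores. A value near 0 means there is no linear relationship. On Python 3.10+, `statistics.correlation` returns the same value for these inputs.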