LLM Evaluation Agreement on MT-bench First Turn
[Chart: Agreement and Total Votes. Top result: G4-Pair, Agreement 97, Jun 9, 2023.]
Evaluation Results

| Method | Setup | Date | Agreement | Total Votes |
|---|---|---|---|---|
| G4-Pair | Setup=S2 (R = 50%), Re... | 2023.06 | 97 | 662 |
| G4-Pair | Setup=S2 (R = 50%), Re... | 2023.06 | 85 | 859 |
| G4-Single | Setup=S2 (R = 50%), Re... | 2023.06 | 85 | 739 |
| Human | Setup=S2 (R = 50%), Re... | 2023.06 | 81 | 479 |
| G4-Pair | Setup=S1 (R = 33%), Re... | 2023.06 | 70 | 1,138 |
| G4-Pair | Setup=S1 (R = 33%), Re... | 2023.06 | 66 | 1,343 |
| Human | Setup=S1 (R = 33%), Re... | 2023.06 | 63 | 521 |
| G4-Single | Setup=S1 (R = 33%), Re... | 2023.06 | 60 | 1,280 |