Judge Agreement on Chatbot Arena (Random = 50%, S2)
[Chart: Agreement and Total Votes over time; top entry GPT-3.5 (Pairwise), Agreement 96, Jun 9, 2023]
Evaluation Results
Method                 Target Judge   Date     Agreement (%)   Total Votes
GPT-3.5 (Pairwise)     Claude (P...   2023.06  96              1,497
GPT-4 (Pairwise)       GPT-4 (Si...   2023.06  95              1,967
GPT-4 (Pairwise)       Claude (P...   2023.06  95              1,712
GPT-4 (Pairwise)       GPT-3.5 (...   2023.06  94              1,788
GPT-4 (Single-answer)  Claude (P...   2023.06  91              1,538
GPT-4 (Single-answer)  GPT-3.5 (...   2023.06  89              1,593
GPT-4 (Pairwise)       Human          2023.06  87              1,944
GPT-4 (Single-answer)  Human          2023.06  85              1,761
Claude (Pairwise)      Human          2023.06  84              1,475
GPT-3.5 (Pairwise)     Human          2023.06  83              1,567
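The page does not define how Agreement is computed. The sketch below is a simplified reading under stated assumptions: as the S2 label and the 50% random baseline suggest, each vote is taken to be a non-tie verdict ("A" or "B"), and Agreement is taken to be the percentage of shared votes on which the method and the target judge pick the same winner. The function names `agreement_pct` and `random_baseline` and the toy verdicts are hypothetical, not part of the leaderboard.

```python
import random
from typing import Sequence


def agreement_pct(method: Sequence[str], target: Sequence[str]) -> float:
    """Return the percentage of votes on which two judges pick the same winner.

    Assumption (not stated on the page): every verdict is "A" or "B",
    i.e. ties are excluded, so two random judges agree about 50% of the time.
    """
    if len(method) != len(target):
        raise ValueError("both judges must rate the same votes")
    if not method:
        raise ValueError("no votes to compare")
    matches = sum(m == t for m, t in zip(method, target))
    return 100.0 * matches / len(method)


def random_baseline(n_votes: int, seed: int = 0) -> float:
    """Agreement between two hypothetical judges that pick "A" or "B" uniformly at random."""
    rng = random.Random(seed)
    a = [rng.choice("AB") for _ in range(n_votes)]
    b = [rng.choice("AB") for _ in range(n_votes)]
    return agreement_pct(a, b)


if __name__ == "__main__":
    # Hypothetical toy verdicts, not data from the leaderboard.
    llm_judge = ["A", "B", "A", "A"]
    human = ["A", "B", "B", "A"]
    print(agreement_pct(llm_judge, human))  # 75.0
    print(round(random_baseline(100_000), 1))  # close to 50
```

Under this reading, an entry such as GPT-4 (Pairwise) vs. Human at 87 would mean the two judges pick the same winner on roughly 87% of the non-tie votes compared, against the roughly 50% that two random judges reach.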