Judge Agreement on Chatbot Arena Random = 33% (S1)
[Chart: Agreement Rate by date. Highlighted point: GPT-4 (Pairwise), Agreement Rate 72, Jun 9, 2023.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Agreement Rate (%) | Vote Count |
| --- | --- | --- | --- | --- |
| GPT-4 (Pairwise) | Target Judge=GPT-4 (Si... | 2023.06 | 72 | 2,968 |
| GPT-3.5 (Pairwise) | Target Judge=Claude (P... | 2023.06 | 68 | 3,057 |
| GPT-4 (Pairwise) | Target Judge=GPT-3.5 (... | 2023.06 | 66 | 3,061 |
| GPT-4 (Pairwise) | Target Judge=Claude (P... | 2023.06 | 66 | 3,062 |
| GPT-4 (Pairwise) | Target Judge=Human | 2023.06 | 64 | 3,066 |
| GPT-4 (Single-answer) | Target Judge=Claude (P... | 2023.06 | 62 | 2,964 |
| GPT-4 (Single-answer) | Target Judge=GPT-3.5 (... | 2023.06 | 60 | 2,964 |
| GPT-4 (Single-answer) | Target Judge=Human | 2023.06 | 60 | 2,968 |
| GPT-3.5 (Pairwise) | Target Judge=Human | 2023.06 | 54 | 3,061 |
| Claude (Pairwise) | Target Judge=Human | 2023.06 | 53 | 3,062 |
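
The page does not say how the agreement rates listed above are computed. As a minimal sketch, assume each vote is one of three outcomes (first model wins, second model wins, or tie) and that agreement is the share of comparisons on which a judge and its target judge give the same outcome. Under that assumption a judge guessing uniformly at random would agree about 33% of the time, which matches the "Random = 33%" in the title. The labels and function names below are hypothetical and are not taken from the benchmark's own code.

```python
from typing import Sequence

# Hypothetical vote labels: first model wins, second model wins, or tie.
OUTCOMES = ("A", "B", "tie")


def agreement_rate(judge_votes: Sequence[str], target_votes: Sequence[str]) -> float:
    """Percentage of comparisons on which the two judges give the same vote."""
    if len(judge_votes) != len(target_votes):
        raise ValueError("both judges must vote on the same comparisons")
    if not judge_votes:
        raise ValueError("at least one vote is required")
    matches = sum(j == t for j, t in zip(judge_votes, target_votes))
    return 100.0 * matches / len(judge_votes)


def uniform_random_baseline(num_outcomes: int = len(OUTCOMES)) -> float:
    """Expected agreement (%) when one judge picks uniformly among the outcomes."""
    return 100.0 / num_outcomes


if __name__ == "__main__":
    # Toy votes for illustration only; these are not benchmark data.
    judge = ["A", "B", "tie", "A"]
    target = ["A", "tie", "tie", "B"]
    print(f"agreement: {agreement_rate(judge, target):.1f}%")
    print(f"random baseline: {uniform_random_baseline():.1f}%")
```

Under this reading, the Vote Count column would correspond to the number of comparisons in the denominator.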