Pairwise Comparison on LLMEval
[Chart: Agreement over time, with toggles for Agreement and Consistency; highlighted point: GPT-4, Agreement = 0.5098, Nov 30, 2023.]
Evaluation Results

| Method | Setting | Date | Agreement | Consistency |
|---|---|---|---|---|
| GPT-4 | Reference-Free | 2023.11 | 0.5098 | 0.8471 |
| CRITIQUELLM | Reference-Free | 2023.11 | 0.5072 | 0.8595 |
| Mixtral-8x7B | Reference-Free | 2023.11 | 0.4804 | 0.7902 |
| JudgeLM-13B | Reference-Free | 2023.11 | 0.4477 | 0.7582 |
| Qwen-14B-Chat | Reference-Free | 2023.11 | 0.4281 | 0.6961 |
| ChatGPT | Reference-Free | 2023.11 | 0.4007 | 0.6458 |
| Llama-2-70B-Chat | Reference-Free | 2023.11 | 0.4000 | 0.6850 |
| ChatGLM3-6B | Reference-Free | 2023.11 | 0.2856 | 0.5170 |
| AUTO-J-Bilingual-6B | Reference-Free | 2023.11 | 0.2758 | 0.5556 |
| Baichuan2-13B-Chat | Reference-Free | 2023.11 | 0.2353 | 0.4327 |
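The page does not define its two metrics, but in pairwise-comparison judge benchmarks, Agreement typically means the fraction of pairs where the judge's verdict matches the human label, and Consistency the fraction of pairs where the judge's verdict survives swapping the order of the two responses. The sketch below assumes those common definitions (the `swap` helper and three-way `A`/`B`/`tie` labels are illustrative assumptions, not taken from this leaderboard):

```python
# Sketch of how pairwise-judge Agreement and Consistency are commonly
# computed; the exact definitions used by this leaderboard may differ.

def swap(label: str) -> str:
    # Map a verdict given on the swapped response order back to the
    # original order: "A" on the swapped order means "B" originally.
    return {"A": "B", "B": "A", "tie": "tie"}[label]

def agreement(judge_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of pairs where the judge's verdict matches the human label."""
    assert len(judge_labels) == len(human_labels)
    hits = sum(j == h for j, h in zip(judge_labels, human_labels))
    return hits / len(human_labels)

def consistency(labels_ab: list[str], labels_ba: list[str]) -> float:
    """Fraction of pairs where the judge picks the same winner after the
    two responses are presented in the opposite order."""
    assert len(labels_ab) == len(labels_ba)
    same = sum(ab == swap(ba) for ab, ba in zip(labels_ab, labels_ba))
    return same / len(labels_ab)

# Toy example (hypothetical labels):
print(agreement(["A", "B", "tie", "A"], ["A", "A", "tie", "A"]))  # 0.75
print(consistency(["A", "B"], ["B", "B"]))  # 0.5
```

Under these definitions a position-biased judge can score high on Agreement while scoring low on Consistency, which is why the table reports both.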