Pairwise Comparison on LLMEval
[Chart: Agreement over time, with toggles for Agreement and Consistency; highlighted point: GPT-4, Agreement = 0.5098, Nov 30, 2023.]
Evaluation Results

| Method | Setting | Date | Agreement | Consistency |
|---|---|---|---|---|
| GPT-4 | Reference-Free | 2023.11 | 0.5098 | 0.8471 |
| CRITIQUELLM | Reference-Free | 2023.11 | 0.5072 | 0.8595 |
| Mixtral-8x7B | Reference-Free | 2023.11 | 0.4804 | 0.7902 |
| JudgeLM-13B | Reference-Free | 2023.11 | 0.4477 | 0.7582 |
| Qwen-14B-Chat | Reference-Free | 2023.11 | 0.4281 | 0.6961 |
| ChatGPT | Reference-Free | 2023.11 | 0.4007 | 0.6458 |
| Llama-2-70B-Chat | Reference-Free | 2023.11 | 0.4000 | 0.6850 |
| ChatGLM3-6B | Reference-Free | 2023.11 | 0.2856 | 0.5170 |
| AUTO-J-Bilingual-6B | Reference-Free | 2023.11 | 0.2758 | 0.5556 |
| Baichuan2-13B-Chat | Reference-Free | 2023.11 | 0.2353 | 0.4327 |
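The page does not define its two metrics, but in pairwise-comparison judge benchmarks, Agreement typically means the fraction of pairs where the judge's verdict matches the human label, and Consistency the fraction of pairs where the judge's verdict survives swapping the order of the two responses. The sketch below assumes those common definitions (the `swap` helper and three-way `A`/`B`/`tie` labels are illustrative assumptions, not taken from this leaderboard):

```python
# Sketch of how pairwise-judge Agreement and Consistency are commonly
# computed; the exact definitions used by this leaderboard may differ.

def swap(label: str) -> str:
    # Map a verdict given on the swapped response order back to the
    # original order: "A" on the swapped order means "B" originally.
    return {"A": "B", "B": "A", "tie": "tie"}[label]

def agreement(judge_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of pairs where the judge's verdict matches the human label."""
    assert len(judge_labels) == len(human_labels)
    hits = sum(j == h for j, h in zip(judge_labels, human_labels))
    return hits / len(human_labels)

def consistency(labels_ab: list[str], labels_ba: list[str]) -> float:
    """Fraction of pairs where the judge picks the same winner after the
    two responses are presented in the opposite order."""
    assert len(labels_ab) == len(labels_ba)
    same = sum(ab == swap(ba) for ab, ba in zip(labels_ab, labels_ba))
    return same / len(labels_ab)

# Toy example (hypothetical labels):
print(agreement(["A", "B", "tie", "A"], ["A", "A", "tie", "A"]))  # 0.75
print(consistency(["A", "B"], ["B", "B"]))  # 0.5
```

Under these definitions a position-biased judge can score high on Agreement while scoring low on Consistency, which is why the table reports both.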