Pairwise Comparison on AUTO-J Eval-P

Top result: GPT-4 (Agreement: 62.28)

Updated 5mo ago

Evaluation Results

Method	Date	Agreement	Consistency
GPT-4	2023.11	62.28	86.28
CRITIQUELLM	2023.11	50.93	82.76
AUTO-J-Bilingual-6B	2023.11	49.43	77.23
ChatGPT	2023.11	42.74	62.43
Mixtral-8x7B	2023.11	35.20	52.66
JudgeLM-13B	2023.11	35.13	58.19
Llama-2-70B-Chat	2023.11	33.62	56.90
Qwen-14B-Chat	2023.11	31.68	52.08
Baichuan2-13B-Chat	2023.11	19.40	32.33
ChatGLM3-6B	2023.11	14.15	26.22