Pairwise Comparison on AUTO-J Eval-P

62.28 Agreement

GPT-4

[Chart: Agreement over time; Nov 30, 2023]

Evaluation Results

Date       Agreement   (unlabeled)
2023.11    62.28       86.28
2023.11    50.93       82.76
2023.11    49.43       77.23
2023.11    42.74       62.43
2023.11    35.2        52.66
2023.11    35.13       58.19
2023.11    33.62       56.9
2023.11    31.68       52.08
2023.11    19.4        32.33
2023.11    14.15       26.22