
Human Preference Prediction on Chatbot Arena latest (test)

72.18 Accuracy

SynthesizeMe (+ FT RM + Personas)

[Chart: accuracy over time, Oct 14, 2024 to Jun 5, 2025]
Updated 4d ago

Evaluation Results

Date       Accuracy
2025.06    72.18
2025.06    72.18
2025.06    71.48
2024.10    70.72
2024.10    70.66
2024.10    70.65
2024.10    70.64
2024.10    70.62
2024.10    70.61
2024.10    70.53
2024.10    70.38
2025.06    69.72
2025.06    69.01
2025.06    69.01
2025.06    68.31
2025.06    67.25
2025.06    66.55
2024.10    64.71
2024.10    64.43
2024.10    64.22
2024.10    64.17
2024.10    64.06
2024.10    63.93
2024.10    63.77
2024.10    63.24
2025.06    61.97
2025.06    61.97
2025.06    61.62
2025.06    60.56
2025.06    59.15
2025.06    58.10
2025.06    58.10
2025.06    57.92
2025.06    57.57
2025.06    57.39
2025.06    56.69
2025.06    56.69
2025.06    56.69
2025.06    56.69
2025.06    55.11
2025.06    54.93
2025.06    54.75
2025.06    54.23
2025.06    53.87
2025.06    53.70
2025.06    53.70
2025.06    53.17
2025.06    52.46
2025.06    52.29
2025.06    50.88
2025.06    50.00