Human Preference Prediction on Chatbot Arena latest (test)

Accuracy: 72.18

SynthesizeMe (+ FT RM + Personas)

[Chart: accuracy over time, Oct 14, 2024 to Jun 5, 2025]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2025.06   72.18
2025.06   72.18
2025.06   71.48
2024.10   70.72
2024.10   70.66
2024.10   70.65
2024.10   70.64
2024.10   70.62
2024.10   70.61
2024.10   70.53
2024.10   70.38
2025.06   69.72
2025.06   69.01
2025.06   69.01
2025.06   68.31
2025.06   67.25
2025.06   66.55
2024.10   64.71
2024.10   64.43
2024.10   64.22
2024.10   64.17
2024.10   64.06
2024.10   63.93
2024.10   63.77
2024.10   63.24
2025.06   61.97
2025.06   61.97
2025.06   61.62
2025.06   60.56
2025.06   59.15
2025.06   58.10
2025.06   58.10
2025.06   57.92
2025.06   57.57
2025.06   57.39
2025.06   56.69
2025.06   56.69
2025.06   56.69
2025.06   56.69
2025.06   55.11
2025.06   54.93
2025.06   54.75
2025.06   54.23
2025.06   53.87
2025.06   53.70
2025.06   53.70
2025.06   53.17
2025.06   52.46
2025.06   52.29
2025.06   50.88
2025.06   50.00