
Assistant Response Alignment (Helpfulness and Harmlessness) on HH-RLHF (test)

89.42 Helpfulness Win Rate (MetaAligner-7B)

[Chart: Helpfulness Win Rate over time, Oct 25, 2023 to Dec 11, 2025]

Evaluation Results

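| Date | Helpfulness Win Rate | Metric 2 | Metric 3 | Metric 4 | Metric 5 | Metric 6 | Metric 7 | Metric 8 | Metric 9 | Metric 10 | Metric 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024.03 | 89.42 | - | 85.16 | 88.08 | 87.55 | - | - | - | - | - | - |
| 2024.03 | 85.5 | - | 76.5 | 86 | 82.67 | - | - | - | - | - | - |
| 2024.03 | 82.8 | - | 70.48 | 68.91 | 74.06 | - | - | - | - | - | - |
| 2024.03 | 82 | - | 77.5 | 83 | 80.83 | - | - | - | - | - | - |
| 2024.03 | 80.7 | - | 73.82 | 77.39 | 77.3 | - | - | - | - | - | - |
| 2024.03 | 80.32 | - | 77.1 | 82.63 | 80.02 | - | - | - | - | - | - |
| 2025.12 | 76 | - | - | - | - | - | - | - | - | -0.42 | -0.23 |
| 2024.03 | 75 | - | 72 | 76.5 | 74.5 | - | - | - | - | - | - |
| 2024.03 | 75 | - | 66.5 | 70.12 | 70.54 | - | - | - | - | - | - |
| 2024.03 | 75 | - | 62.5 | 77 | 71.5 | - | - | - | - | - | - |
| 2023.10 | 68.28 | - | 72.19 | - | 70.43 | 69.85 | 71.02 | - | - | - | - |
| 2023.10 | 67.89 | - | 71.32 | - | 68.97 | 66.53 | 68.92 | - | - | - | - |
| 2023.10 | 67.05 | - | 71.66 | - | 68.43 | 65.89 | 67.95 | - | - | - | - |
| 2023.10 | 66.49 | - | 70.62 | - | 68.41 | 67.67 | 68.5 | - | - | - | - |
| 2023.10 | 64.56 | - | 73.13 | - | 67.64 | 65.6 | 66.51 | - | - | - | - |
| 2023.10 | 64.05 | - | 72.86 | - | 67.4 | 65.56 | 66.44 | - | - | - | - |
| 2024.03 | 64 | - | 65 | 78 | 69 | - | - | - | - | - | - |
| 2023.10 | 62.14 | - | 67.26 | - | 63.85 | 60.44 | 63.86 | - | - | - | - |
| 2023.10 | 61.38 | - | 64.63 | - | 63.12 | 63.26 | 63.28 | - | - | - | - |
| 2023.10 | 55.29 | - | 61.97 | - | 58.65 | 59.78 | 58.26 | - | - | - | - |
| 2023.10 | 53.85 | - | 52.77 | - | 54.26 | 55.3 | 55.43 | - | - | - | - |
| 2024.03 | 51.2 | - | 62.83 | 77.5 | 63.84 | - | - | - | - | - | - |
| 2025.12 | 39 | - | - | - | - | - | - | - | - | 0.22 | 0.17 |
| 2023.10 | 33.25 | - | 53.59 | - | 40.67 | 40.48 | 36.23 | - | - | - | - |
| 2025.12 | 31 | - | - | - | - | - | - | - | - | 0.17 | 0.23 |
| 2025.12 | 30 | - | - | - | - | - | - | - | - | 0.19 | 0.18 |
| 2025.12 | 25 | - | - | - | - | - | - | - | - | 0.15 | 0.11 |
| 2025.12 | 18 | - | - | - | - | - | - | - | - | 0.09 | 0.11 |
| 2025.12 | 4 | - | - | - | - | - | - | - | - | -0.09 | 0.08 |
| 2025.12 | -79 | - | - | - | - | - | - | - | - | -0.92 | 0.42 |
| 2025.12 | -81 | - | - | - | - | - | - | - | - | 0.53 | -0.4 |
| 2025.02 | - | 67.07 | - | - | - | - | - | - | - | - | - |
| 2025.02 | - | 62.35 | - | - | - | - | - | - | - | - | - |
| 2025.02 | - | 59.12 | - | - | - | - | - | - | - | - | - |
| 2024.02 | - | - | - | - | 43 | - | - | 17 | 40 | - | - |
| 2024.02 | - | - | - | - | 44 | - | - | 15 | 41 | - | - |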