Safety and Informativeness Evaluation on PKU-SafeRLHF (test)

Drugs & Weapons Safety Score: 85.3

SafeMoE-XL

5.63626.3184767.682
May 30, 2026
Updated 1d ago

Evaluation Results

Method | Links
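2026.05
85.3 8 | 94 8.1 | 87.9 8.3 | 94.4 8.1 | 97.1 8.2 | 92.3 7.9 | 89.9 7.9 | 85.2 8 | 80.7 8.1 | 95.2 8.1 | 94.9 8 | 93.1 8.6 | 90.1 8.3 | 89.3 8.2 | 90.7 8.2 | 93.8 8.1 | 88.3 8 | 89.3 8 | 94.4 8.1 | 90.8 8.1
2026.05
84.4 8 | 92.5 8.1 | 87.4 8.3 | 94.8 8.2 | 94 8.1 | 94.8 7.9 | 86.7 7.8 | 84.8 7.9 | 81.6 8.2 | 89.9 8.1 | 95.7 8.2 | 87.3 8.5 | 93.5 8 | 87.2 8 | 90.9 8.2 | 93.4 8.1 | 87.4 8.2 | 92 8.2 | 93.9 8 | 90.1 8.1
2026.05
73.5 7.3 | 86.9 7.8 | 79.8 7.2 | 93.7 7.8 | 92 7.6 | 96 7.6 | 88 7.4 | 80.6 7.6 | 76.5 7.6 | 85 7.6 | 94 7.9 | 81.7 7.7 | 92.9 7.7 | 81.4 7.7 | 82.3 7.8 | 88.9 7.6 | 88.2 7.6 | 89.9 7.8 | 92 7.6 | 86.5 7.6
2026.05
49.4 7.31 | 58.2 7.11 | 65.1 7.17 | 84 7.71 | 63.4 7.12 | 53.8 7.08 | 60.7 7.49 | 54 7.36 | 61.3 7.2 | 60.7 7.25 | 50 7.35 | 63.3 7.78 | 65.5 7.44 | 67.1 7.61 | 63.4 7.25 | 71.1 7.22 | 68.3 7.65 | 65.1 7.52 | 60 7.15 | 62.4 7.6
2026.05
18.1 5.4 | 16.5 6 | 9.2 4.8 | 20.5 5.5 | 26.1 6.1 | 27.1 5.5 | 17.1 5.6 | 14.2 5.8 | 11.5 4.6 | 20.2 5.5 | 23.1 5.9 | 14.3 6.6 | 16.8 5.6 | 12 6.4 | 16.4 5 | 11.1 5.1 | 24.8 5.9 | 13.2 4.9 | 18.2 6.3 | 17.4 5.6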
2026.05
8.77.1712.26.27.66.610.17.27.56.211.46.4413.96.189.86.333.5887.513.76.712.18.2519.27.0712.77.22217196.78.96.8116.610.17.111.66.9