
Reward Modeling on RewardBench Safety Subset Perturbations 2

LE Score: -0.629

Llama3-8B-IDRM

[Chart: LE Score over time, Nov 30, 2025]

Evaluation Results

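| Method | Date | LE Score | (unlabeled) | (unlabeled) | (unlabeled) | (unlabeled) |
|---|---|---|---|---|---|---|
| Llama3-8B-IDRM | 2025.11 | -0.629 | -0.7 | -0.032 | -0.119 | -0.73 |
| | 2025.11 | -0.09 | 0.007 | 0.231 | -0.18 | 1.041 |
| | 2025.11 | -0.052 | -0.031 | -0.055 | -0.001 | -0.09 |
| | 2025.11 | 0.002 | 0.148 | 0.191 | 1.201 | -0.004 |
| | 2025.11 | 0.029 | -0.015 | -0.003 | -0.027 | 0.016 |
| | 2025.11 | 0.107 | 0.046 | 0.002 | 0.007 | 0.004 |
| | 2025.11 | 0.108 | 0.297 | 0.197 | 1.025 | -0.082 |
| | 2025.11 | 0.127 | 0.081 | 0.118 | 0.109 | 0.113 |
| | 2025.11 | 0.219 | 0.164 | 0.073 | 0.077 | 0.178 |
| | 2025.11 | 0.344 | 0.179 | 0.169 | 1.47 | -0.037 |
| | 2025.11 | 0.388 | -0.297 | 0.821 | 1.016 | 1.066 |
| | 2025.11 | 0.502 | 0.567 | -0.003 | 0.092 | 0.613 |
| | 2025.11 | 0.535 | 0.558 | 0.13 | 0.02 | 0.525 |
| | 2025.11 | 0.537 | 0.548 | 0.22 | 0.223 | 0.669 |
| | 2025.11 | 0.537 | 0.582 | 0.25 | 0.154 | 0.58 |
| | 2025.11 | 0.543 | 0.558 | 0.12 | 0.127 | 0.628 |
| | 2025.11 | 0.571 | 0.605 | 0.062 | 0.057 | 0.668 |
| | 2025.11 | 0.604 | 0.641 | 0.221 | 0.292 | 0.809 |
| | 2025.11 | 0.622 | 0.668 | 0.144 | 0.277 | 0.875 |
| | 2025.11 | 0.65 | 0.664 | 0.085 | 0.146 | 0.732 |
| | 2025.11 | 0.694 | 0.728 | 0.338 | 0.412 | 0.952 |
| | 2025.11 | 0.701 | 0.663 | 0.242 | 0.427 | 0.873 |
| | 2025.11 | 0.726 | 0.771 | 0.059 | 0.107 | 0.791 |
| | 2025.11 | 0.784 | 0.798 | 0.288 | 0.32 | 0.961 |
| | 2025.11 | 0.928 | 0.93 | 0.201 | 0.359 | 1.039 |
| | 2025.11 | 1.112 | 1.05 | 0.563 | 0.725 | 1.264 |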
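The page lists entries in ascending LE Score order, starting from the headline result (Llama3-8B-IDRM, -0.629). The sketch below is a minimal, hypothetical check of that ordering. It assumes, without confirmation from the page, that a lower LE Score ranks higher. The values are transcribed from the LE Score column; `LE_SCORES` and `rank_of` are illustrative names only.

```python
import statistics

# LE Score column transcribed from the results table, in page order.
# Assumption (not stated on the page): a lower LE Score ranks higher,
# inferred from the listing starting at the headline entry (-0.629).
LE_SCORES = [
    -0.629, -0.09, -0.052, 0.002, 0.029, 0.107, 0.108, 0.127, 0.219,
    0.344, 0.388, 0.502, 0.535, 0.537, 0.537, 0.543, 0.571, 0.604,
    0.622, 0.65, 0.694, 0.701, 0.726, 0.784, 0.928, 1.112,
]


def rank_of(score: float, scores: list[float]) -> int:
    """Hypothetical 1-based competition rank: ties share the best rank,
    and a lower score is treated as better."""
    return 1 + sum(1 for s in scores if s < score)


if __name__ == "__main__":
    print("entries:", len(LE_SCORES))
    print("listed in ascending order:", LE_SCORES == sorted(LE_SCORES))
    print("best:", min(LE_SCORES))
    print("median:", round(statistics.median(LE_SCORES), 3))
    print("rank of 0.537:", rank_of(0.537, LE_SCORES))
```

Competition ranking gives both 0.537 entries the same rank (14th), because exactly 13 listed scores are strictly lower.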