Agent Behavioral Safety on AgentHarm

Safety Rate: 90.6

Thought-Aligner-7B

[Chart: Safety Rate, date label May 16, 2025]
Updated 7d ago

Evaluation Results

Method              Date     Safety Rate  Second score (unlabeled)
Thought-Aligner-7B  2025.05  90.6         30
-                   2025.05  90.4         39.8
-                   2025.05  89.3         36.5
-                   2025.05  88.8         34
-                   2025.05  88.7         33.2
-                   2025.05  88           50.6
-                   2025.05  87           46
-                   2025.05  86.6         68.1
-                   2025.05  81.3         51.2
-                   2025.05  80.9         53.4
-                   2025.05  64.2         41.9
-                   2025.05  63.4         54.8
-                   2025.05  61.8         84
-                   2025.05  42.8         85