
Long-term state poisoning evaluation on OpenClaw

Average across conversation variants

Harm Score (HS): 4.35

Grok-1

Updated 2mo ago

Evaluation Results

Method	Links	Harm Score (HS)
Grok-1 2026.05		4.35
Kimi k2.5 2026.05		4.42
GPT-4o 2026.05		5.53
MiniMax-abab6.5 2026.05		6.42