Explainability classification on WildGuardMix human-annotated (test)

60.69 F1 Score

LEG base

Updated 4mo ago

Evaluation Results

Method	Links	F1 Score
LEG base 2026.01		60.69
LEG large 2026.01		58.39
GPT-4o-mini 2026.01		54.87