Safety Evaluation on Safety Tweet Eval, Hatecheck, Ethos (test)

83.3 Accuracy

Context Distill.

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
Context Distill. 2026.02		83.3
OPCD 2026.02		83.1
OPCD 2026.02		79.6
OPCD 2026.02		78.1
Context Distill. 2026.02		77.2
Context Distill. 2026.02		77.0
In-Context 2026.02		75.3
In-Context 2026.02		72.7
Base Model 2026.02		70.7
In-Context 2026.02		69.5
Base Model 2026.02		69.1
Base Model 2026.02		30.7