Context Adherence on Representative guardrail dataset
Top result: ChainPoll, F1-Score 96
[Chart: F1-Score, Feb 20, 2026]
Updated 4d ago
Evaluation Results
Method        Model         Date     F1-Score
ChainPoll     GPT 4.1       2026.02  96
Luna-2        Llama 3.2 3B  2026.02  95
Single token  Llama 3.2 3B  2026.02  43