Access Control on Synthetic D&D dataset (test)
Top result: Static Filter, Accuracy 92 (Dec 23, 2025)
Updated 4d ago
Evaluation Results
Method         Configuration             Date     Accuracy  F1 Score
Static Filter  Backbone=Llama3.1-8B,...  2025.12  92        94
Static Filter  Backbone=Qwen2.5-14B,...  2025.12  91        93
Static Filter  Backbone=Phi4-14B, App... 2025.12  90        93
Static Filter  Backbone=DeepSeek-r1,...  2025.12  90        93
ARBITER        Backbone=Qwen2.5-14B,...  2025.12  85        89
ARBITER        Backbone=Phi4-14B, App... 2025.12  82        87
ARBITER        Backbone=DeepSeek-r1,...  2025.12  78        84
ARBITER        Backbone=Llama3.1-8B,...  2025.12  75        83