Policy Enforcement on Dynamic Policy E-commerce Benchmark
[Chart: Recall by date. Plotted point: Qwen3-32B-Thinking, Recall 91, Jan 22, 2026. Selectable metrics: Recall, Precision, F1 Score.]
Updated 3mo ago
Evaluation Results
| Method | Details | Date | Recall | Precision | F1 Score |
| Qwen3-32B-Thinking | Model Parameters=32B,... | 2026.01 | 91 | 99 | 95 |
| Qwen3-8B-Thinking | Model Parameters=8B, R... | 2026.01 | 86 | 100 | 92 |
| YuFeng-XGuard | Explicit Chain-of-Thou... | 2026.01 | 84 | 99 | 91 |
| Qwen3-32B-NoThinking | Model Parameters=32B,... | 2026.01 | 72 | 92 | 81 |
| GPT-OSS-SafeGuard-20B | Model Parameters=20B | 2026.01 | 64 | 99 | 77 |
| Qwen3-8B-NoThinking | Model Parameters=8B, R... | 2026.01 | 32 | 99 | 49 |
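
The page does not say how the F1 Score column is derived. The sketch below assumes the standard definition, F1 as the harmonic mean of precision and recall, and recomputes it from the rounded Recall and Precision values listed in the results above. The function name `f1_score` and the `rows` dictionary are illustrative, not part of the benchmark. Because the inputs are already rounded, the recomputed value can differ from the reported one by up to about one point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) as listed in the evaluation results
rows = {
    "Qwen3-32B-Thinking": (91, 99, 95),
    "Qwen3-8B-Thinking": (86, 100, 92),
    "YuFeng-XGuard": (84, 99, 91),
    "Qwen3-32B-NoThinking": (72, 92, 81),
    "GPT-OSS-SafeGuard-20B": (64, 99, 77),
    "Qwen3-8B-NoThinking": (32, 99, 49),
}

for name, (recall, precision, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed {computed:.1f}, reported {reported}")
```

Under this assumption, the wide gap between the Thinking and NoThinking variants comes almost entirely from recall: precision stays at 92 or above for every entry, so the F1 ranking closely tracks the recall ranking.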