Refusal Detection on Do-Not-Answer Portuguese (test)
Figure: Accuracy by submission date (top entry: gov.pt baseline, 100, Mar 2, 2026)
Evaluation Results
| Method | Configuration | Date | Accuracy (%) |
|---|---|---|---|
| gov.pt baseline | - | 2026.03 | 100 |
| Gervásio 70B | RAG=true, Model Size=70B | 2026.03 | 98 |
| Gervásio 8B | RAG=true, Model Size=8B | 2026.03 | 97 |
| Llama 3.3 70B | Model Size=70B | 2026.03 | 92 |
| Qwen 32B | Model Size=32B | 2026.03 | 90 |
| Gervásio 70B | RAG=false, Model Size=70B | 2026.03 | 87 |
| Mistral 24B | Model Size=24B | 2026.03 | 86 |
| Gervásio 8B | RAG=false, Model Size=8B | 2026.03 | 84 |
| Llama 3.1 8B | Model Size=8B | 2026.03 | 84 |
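The Gervásio models appear both with and without retrieval-augmented generation (RAG=true / RAG=false). As a small illustration, not part of the benchmark itself, the sketch below computes the accuracy gain from enabling RAG at each model size, using the scores transcribed from the table. The helper name `rag_gain` is hypothetical, and the differences are read as percentage points on the assumption that accuracy is reported in percent.

```python
# Accuracy (%) for the Gervásio entries, transcribed from the leaderboard table.
# Keys are (model size, RAG enabled).
gervasio = {
    ("70B", True): 98,
    ("70B", False): 87,
    ("8B", True): 97,
    ("8B", False): 84,
}


def rag_gain(scores: dict[tuple[str, bool], int]) -> dict[str, int]:
    """Hypothetical helper: accuracy gain (percentage points) from RAG, per model size."""
    sizes = sorted({size for size, _ in scores})
    return {size: scores[(size, True)] - scores[(size, False)] for size in sizes}


gains = rag_gain(gervasio)
for size, gain in gains.items():
    print(f"Gervásio {size}: RAG adds {gain} points")
```

With these scores, RAG adds 11 points for the 70B model and 13 points for the 8B model; with RAG enabled, Gervásio 8B (97) also scores above Llama 3.3 70B (92).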