Risk Source Prediction on ATBench
[Chart: Accuracy by date. Top entry: AgentDoG, Accuracy 52 (May 26, 2026).]
Evaluation Results
| Method | Base Model | Date | Accuracy |
|---|---|---|---|
| AgentDoG | Qwen3-4B | 2026.05 | 52 |
| AgentDoG | Llama3.1-8B... | 2026.05 | 51.6 |
| TRACES | Llama3.1-8B... | 2026.05 | 50 |
| TRACES | Qwen3-4B | 2026.05 | 48.8 |
| GPT-5.2 | Original | 2026.05 | 41.6 |
| Gemini-3-Flash | Original | 2026.05 | 38 |
| Gemini-3-Pro | Original | 2026.05 | 36.8 |
| QWQ-32B | Original | 2026.05 | 23.2 |
| Llama3.1-8B | Original | 2026.05 | 6.2 |
| Qwen3-4B | Original | 2026.05 | 4.2 |