Failure Mode Prediction on ATBench
[Figure: Accuracy chart. Top result: TRACES, 41. Date shown: May 26, 2026.]
Updated 6d ago
Evaluation Results

Method          Base Model        Date      Accuracy
TRACES          Llama3.1-8B...    2026.05   41
TRACES          Qwen3-4B          2026.05   34.2
AgentDoG        Qwen3-4B          2026.05   28.8
AgentDoG        Llama3.1-8B...    2026.05   26
Gemini-3-Flash  Original          2026.05   22.4
GPT-5.2         Original          2026.05   20.4
Gemini-3-Pro    Original          2026.05   17.6
QWQ-32B         Original          2026.05   14.4
Qwen3-4B        Original          2026.05   8.6
Llama3.1-8B     Original          2026.05   6.8