Automated auditing on BIXBench (Verified-50)
[Chart: Recall (A) by method. Highlighted point: Ensemble (any), 83.3, Apr 27, 2026. Selectable metrics: Recall (A), Recall (A+P), Precision (A), Precision (A+P), Cost ($), Findings Count.]
Updated 1mo ago
Evaluation Results
| Method | Recall (A) | Recall (A+P) | Precision (A) | Precision (A+P) | Cost ($) | Findings Count |
|---|---|---|---|---|---|---|
| Ensemble (any) (2026.04; Description=Pooled fin...) | 83.3 | 95.8 | - | - | 14.38 | 383 |
| Opus 4.6 (2026.04) | 54.2 | 79.2 | 38.7 | 67.7 | 5.98 | 66 |
| GPT-5.4 (2026.04) | 50 | 87.5 | 23.3 | 55.8 | 1.92 | 102 |
| Gemini 3.0 Flash (2026.04) | 45.8 | 95.8 | 33.3 | 83.3 | 0.53 | 114 |
| Gemini 3.1 Pro (2026.04) | 37.5 | 58.3 | 47.1 | 76.5 | 2.31 | 43 |
| Sonnet 4.6 (2026.04) | 33.3 | 58.3 | 23.3 | 60 | 3.64 | 58 |