Reasoning & Coding on WaterBench (test)
[Chart: GM over time; top score 59.82 as of Dec 18, 2025. Selectable metrics: GM, AAR, TPR, TNR.]
Evaluation Results
Method      Setting          Date      GM      TPR     TNR     AAR
[missing]   Model=OPT-1.3B   2025.12   59.82   0.78    0.695
Original    Model=OPT-1.3B   2025.12   51.19   -       -
SynthID     Model=OPT-1.3B   2025.12   50.85   0.985   0.545
SWEET       Model=OPT-1.3B   2025.12   50.04   0.825   0.88
DualGuard   Model=OPT-1.3B   2025.12   49.88   0.94    0.655
Unbiased    Model=OPT-1.3B   2025.12   49.4    0.81    0.64
DIPmark     Model=OPT-1.3B   2025.12   49.27   0.935   0.5
EWD         Model=OPT-1.3B   2025.12   49.24   0.835   0.935
KGW         Model=OPT-1.3B   2025.12   48.84   0.885   0.8
XSIR        Model=OPT-1.3B   2025.12   48.7    0.81    0.61
SIR         Model=OPT-1.3B   2025.12   47.1    0.97    0.735