Long-form QA on WaterBench (test)
[Chart: GM Score, TPR, and TNR over time. Best GM Score: 24.06 (SWEET). Chart date: Dec 18, 2025.]
Evaluation Results
Method     Model     Date     GM Score  TPR    TNR
SWEET      OPT-1.3B  2025.12  24.06     1      1
Unbiased   OPT-1.3B  2025.12  23.99     0.96   0.985
SynthID    OPT-1.3B  2025.12  23.93     0.95   0.965
DIPmark    OPT-1.3B  2025.12  23.92     0.885  0.91
Original   OPT-1.3B  2025.12  23.87     -      -
KGW        OPT-1.3B  2025.12  23.86     0.99   0.98
EWD        OPT-1.3B  2025.12  23.69     1      1
DualGuard  OPT-1.3B  2025.12  23.35     0.985  0.985
AAR        OPT-1.3B  2025.12  23.19     0.985  0.98
XSIR       OPT-1.3B  2025.12  22.52     0.9    0.84
SIR        OPT-1.3B  2025.12  22.34     1      0.925