Discrimination between Good-Faith and Problematic Agents on WMT14 1.1:1
[Chart: Cohen's d by result date; top result TVD-MI, 3.32 (Aug 7, 2025)]
Evaluation Results
Method                             Date      Cohen's d
TVD-MI                             2025.08   3.32
Judge (Reference availability...)  2025.08   2.53
MI (DoE)                           2025.08   1.61
Baseline                           2025.08   0.93
GPPM                               2025.08   0.7
Judge (Reference availability...)  2025.08   0.24
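The page does not say how each method's Cohen's d is computed. As a minimal sketch, assume each method assigns a score to every agent and the metric compares the good-faith group against the problematic group using the standard pooled-standard-deviation form, d = (mean_good - mean_problematic) / s_pooled. Under that assumption a larger positive d means the method separates the two groups more cleanly. The function name `cohens_d` and the group inputs are hypothetical illustrations, not the benchmark's own code.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(good_faith: list[float], problematic: list[float]) -> float:
    """Hypothetical helper: Cohen's d between two score groups.

    Assumes higher scores are expected for good-faith agents, so a
    positive d means good-faith agents score higher on average.
    """
    n1, n2 = len(good_faith), len(problematic)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # statistics.variance is the sample variance (divides by n - 1)
    pooled_var = (
        (n1 - 1) * variance(good_faith) + (n2 - 1) * variance(problematic)
    ) / (n1 + n2 - 2)
    # Difference of group means, in units of the pooled standard deviation
    return (mean(good_faith) - mean(problematic)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Illustrative scores only, not data from this benchmark
    print(cohens_d([8, 9, 10], [4, 5, 6]))  # 4.0
```

The pooled form weights each group's variance by its degrees of freedom, which matters when the two groups differ in size.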