Discrimination between Good Faith and Problematic agents (Summarization) on XSum 18.5:1
[Chart: Cohen's d by date. Best result: TVD-MI, Cohen's d = 6.69, Aug 7, 2025.]
Updated 1mo ago
Evaluation Results
Method                              Date     Cohen's d
TVD-MI                              2025.08   6.69
Judge (Reference availability...)   2025.08   3.39
GPPM                                2025.08   2.85
MI (DoE)                            2025.08   1.89
Baseline                            2025.08   0.29
Judge (Reference availability...)   2025.08  -0.28