Discrimination between Good Faith and Problematic agents (Summarization) on CNN/Daily 13.8:1
[Chart: Cohen's d (y-axis) by date (x-axis). Top result: TVD-MI, Cohen's d 5.87, Aug 7, 2025.]
Updated 1mo ago
Evaluation Results
Method                             Date     Cohen's d
TVD-MI                             2025.08  5.87
Judge (Reference availability...)  2025.08  3.55
GPPM                               2025.08  3.42
MI (DoE)                           2025.08  2.06
Judge (Reference availability...)  2025.08  0.72
Baseline                           2025.08  0.61
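
For readers unfamiliar with the metric, here is a minimal sketch of how a Cohen's d value like those listed above could be computed from two groups of scores. The page does not say which variant of the statistic was used, so the pooled-standard-deviation form is an assumption, and the function name `cohens_d` and the sample scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation.

    Assumption: this is the textbook pooled-SD definition; the leaderboard
    does not state which variant it reports.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")

    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")

    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical scores for good-faith and problematic agents.
good_faith = [2, 4, 6]
problematic = [1, 3, 5]
print(cohens_d(good_faith, problematic))
```

A larger d means the method's scores separate the two agent groups by more pooled standard deviations, which is why higher values rank first in the results.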