Discrimination between Good Faith and Problematic agents on PubMed 6.7:1
Chart: Cohen's d by submission date (Aug 7, 2025); best result: Judge, Cohen's d = 8.14
Evaluation Results
Method | Date | Cohen's d
Judge (Reference availability...) | 2025.08 | 8.14
TVD-MI | 2025.08 | 6.53
Judge (Reference availability...) | 2025.08 | 3.25
GPPM | 2025.08 | 3.18
MI (DoE) | 2025.08 | 2.01
Baseline | 2025.08 | 0.86
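The page reports results as Cohen's d but does not say how the statistic is computed. A common form is the difference in mean scores between the two groups divided by the pooled sample standard deviation. The sketch below assumes that form, and it also assumes each method assigns one scalar score per agent. Neither assumption is confirmed by the page. The function name `cohens_d` and the argument names `good_faith` and `problematic` are hypothetical. Because pooling weights each group's variance by n - 1, the result stays well defined under the unequal 6.7:1 group sizes in the benchmark title.

```python
import math
from statistics import mean, variance


def cohens_d(good_faith, problematic):
    """Cohen's d with pooled sample standard deviation (assumed variant).

    Hypothetical helper: positive values mean the good-faith group has the
    higher mean score. The leaderboard's actual sign convention is not stated.
    """
    n1, n2 = len(good_faith), len(problematic)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # statistics.variance is the sample variance (divides by n - 1).
    v1, v2 = variance(good_faith), variance(problematic)
    # Pool the variances, weighting each group by its degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    return (mean(good_faith) - mean(problematic)) / pooled_sd


if __name__ == "__main__":
    # Toy scores, not benchmark data.
    print(cohens_d([2, 4, 6], [0, 2, 4]))  # → 1.0
```

In the toy call, the means differ by 2 and the pooled standard deviation is 2, so d = 1.0. Read this way, a value such as 8.14 would mean the two groups' mean scores sit about eight pooled standard deviations apart.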