Discrimination between Good Faith and Problematic agents (Translation) on Opus Books 1.3:1
[Figure: Cohen's d by submission date; top result Judge, 3.5 (Aug 7, 2025)]
Evaluation Results
Method                               Date      Cohen's d
Judge (Reference availability...)    2025.08    3.5
TVD-MI                               2025.08    3.08
MI (DoE)                             2025.08    2.66
Baseline                             2025.08    1.22
GPPM                                 2025.08    0.73
Judge (Reference availability...)    2025.08   -0.62
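Cohen's d is a standardized mean difference: the gap between two group means divided by a pooled standard deviation. This page does not state which variant is used or which scores are compared. The following is a minimal sketch that assumes the common pooled-sample-SD form, d = (mean_good - mean_problematic) / s_p with s_p = sqrt(((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2)), and that the good-faith group's scores come first. `cohens_d` is a hypothetical helper name, not part of the benchmark.

```python
import math
from statistics import mean, variance


def cohens_d(good_faith_scores, problematic_scores):
    """Hypothetical helper: Cohen's d with a pooled sample standard deviation.

    Assumes higher scores should go to good-faith agents, so a positive d
    means the method separates the groups in the intended direction.
    """
    n1, n2 = len(good_faith_scores), len(problematic_scores)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # statistics.variance uses the n - 1 (sample) denominator.
    v1 = variance(good_faith_scores)
    v2 = variance(problematic_scores)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (mean(good_faith_scores) - mean(problematic_scores)) / pooled_sd


if __name__ == "__main__":
    # Toy scores, not benchmark data.
    print(cohens_d([4, 5, 6], [1, 2, 3]))
```

With this sign convention, a negative d, such as the -0.62 row in the table, would mean the method on average scores problematic agents above good-faith ones.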