Reasoning on Blind-Judge Quality Benchmark clinical policy governance Claude Sonnet 4 (test)
Overall Score: 4.5
[Chart: Overall Score and Win Rate over time; series: CoT; date shown: Mar 25, 2026]
Updated 23d ago
Evaluation Results

Method  Run        Date     Overall Score  Win Rate
CoT     Run 2      2026.03  4.5            -
CoT     Mean (SD)  2026.03  4.33           -
CoT     Run 3      2026.03  4.3            -
EMoT    Run 1      2026.03  4.2            -
EMoT    Run 2      2026.03  4.2            -
EMoT    Run 3      2026.03  4.2            -
EMoT    Mean (SD)  2026.03  4.2            -
CoT     Run 1      2026.03  4.2            -
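
The Mean (SD) rows can be checked against the per-run Overall Scores listed for each method. The sketch below is only an illustration: the page reports the means but not the SD values, and it does not say whether SD is the sample or the population standard deviation, so the sample form is an assumption here. The function name `summarize` and the dictionary `runs` are hypothetical.

```python
from statistics import mean, stdev

# Per-run Overall Scores as listed in the results (Run 1, Run 2, Run 3).
runs = {
    "CoT": [4.2, 4.5, 4.3],
    "EMoT": [4.2, 4.2, 4.2],
}


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of run scores.

    Assumption: SD is the sample standard deviation (n - 1 denominator);
    the source does not state which form it uses.
    """
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    for method, scores in runs.items():
        avg, sd = summarize(scores)
        print(f"{method}: mean={avg:.2f}, sd={sd:.2f}")
```

Under this assumption, the CoT mean rounds to the 4.33 shown in the table, and the three identical EMoT runs give a mean of 4.2 with zero spread.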