Reasoning on Blind-Judge Quality Benchmark (Clinical Policy Governance): Claude Sonnet 4 (test)

Overall Score: 4.5

Method: CoT

Score history chart (Mar 25, 2026)
Updated 23d ago

Evaluation Results

Date     Score  Links
2026.03  4.5    -
2026.03  4.33   -
2026.03  4.3    -
2026.03  4.2    -
2026.03  4.2    -
2026.03  4.2    -
2026.03  4.2    -
2026.03  4.2    -