Constraint Code Generation on Tumor Diagnosis Human Evaluation Samples
[Figure: Correctness score chart for SciDC, data point dated Apr 8, 2026. Selectable metrics: Correctness, Integrity, Efficiency.]
Updated 9d ago
Evaluation Results
Method   Date      Correctness   Integrity   Efficiency
SciDC    2026.04   5             4.8         3.2
GPT-5    2026.04   4.6           4           3
Claude   2026.04   4.6           4.6         2.6