Tumor diagnosis
[Chart: Validity by date. Highlighted point: Claude-3.5, 100, Apr 8, 2026. Metrics shown: Validity, Exact Match.]
Updated 9d ago
Evaluation Results
Method | Backbone model | Date | Validity | Exact Match
Claude-3.5 | Claude-3.5 | 2026.04 | 100 | 88.5
Qwen3-4B w SciDC | Qwen3-4... | 2026.04 | 100 | 66.7
Qwen3-14B w SciDC | Qwen3-1... | 2026.04 | 100 | 79.5
GPT-5 | GPT-5 | 2026.04 | 98.5 | 94.5
Qwen3-4B | Qwen3-4B | 2026.04 | 97.3 | 59.2
Qwen3-4B w/o K | Qwen3-4... | 2026.04 | 94.3 | 4
Qwen3-14B | Qwen3-14B | 2026.04 | 79.7 | 72
Qwen3-14B w/o K | Qwen3-1... | 2026.04 | 74.3 | 36.5