Diagnostic Reasoning Generalization on CheXbench (test)
[Chart: Visual QA: Rad-Restruct (Top-1 Acc); MedRAX at 68.7, Apr 3, 2026. Available metrics: Visual QA: Rad-Restruct (Top-1 Acc), Visual QA: SLAKE (Top-1 Acc), Reasoning: OpenI (Top-1 Acc), Overall Accuracy]
Updated 13d ago
Evaluation Results
| Method | Model Category | Date | Visual QA: Rad-Restruct (Top-1 Acc) | Visual QA: SLAKE (Top-1 Acc) | Reasoning: OpenI (Top-1 Acc) | Overall Accuracy |
|---|---|---|---|---|---|---|
| MedRAX | Multi-A... | 2026.04 | 68.7 | 82.9 | 52.6 | 68.1 |
| XrayClaw | Multi-A... | 2026.04 | 66.3 | 85.6 | 62.1 | 70.7 |
| CheXagent | Multi-A... | 2026.04 | 57.1 | 78.1 | 59 | 64.7 |
| GPT-4o | MLLMs-B... | 2026.04 | 53.9 | 85.4 | 51.1 | 63.5 |
| LLaVA-Med | MLLMs-B... | 2026.04 | 34.9 | 55.5 | 45.8 | 45.4 |