Diagnostic Reasoning on DiReCT (test)
[Chart: Reasoning Recall on DiReCT (test). Highlighted point: MultiDx, 66.5, Apr 27, 2026. Selectable metrics: Reasoning Recall, H@1 Accuracy, H@5 Accuracy, H@10 Accuracy.]
Updated 1mo ago
Evaluation Results
Method           Model Category   Date      Reasoning Recall   H@1 Accuracy   H@5 Accuracy   H@10 Accuracy
---------------  ---------------  --------  -----------------  -------------  -------------  -------------
MultiDx          Agentic...       2026.04   66.5               33.3           50.3           58.7
Self-refinement  Agentic...       2026.04   66.2               30             46.6           58.6
OpenAI-DR        Agentic...       2026.04   58.6               29.7           45.2           47.9
DeepSeek-R1      Base Model       2026.04   47.3               29.3           41.3           47.3