Medical Decision Making on Medical Decision-Making Simulation Dataset
[Chart: Accuracy over time. Top entry: ARSM, 92.3 Accuracy (May 7, 2026). Available metrics: Accuracy, F1 Score, Attack Success Rate, Safe Rejection Rate, Hallucination Rate, Knowledge Consistency Score.]
Updated 22d ago
Evaluation Results
Method     Links                      Date     Accuracy  F1 Score  Attack Success Rate  Safe Rejection Rate  Hallucination Rate  Knowledge Consistency Score
ARSM       Avg Decision Latency (...  2026.05  92.3      91.8      8.7                  12.3                 2.1                 0.91
Adv-Train  Avg Decision Latency (...  2026.05  86        86        18.9                 9.4                  3.9                 0.85
Retrieval  Avg Decision Latency (...  2026.05  85.1      85        27.5                 8.1                  5.3                 0.8
Filter     Avg Decision Latency (...  2026.05  83.8      83.9      25.3                 8.7                  4.8                 0.81
LLM        Avg Decision Latency (...  2026.05  76.6      76.9      42.1                 6.2                  8.5                 0.72