Acute Problems on RealICU GOLD
Top result: ICU-Evo, Hit@5 = 86.7 (May 13, 2026)
[Chart: Hit@5 and R@5 over time]
Evaluation Results
| Method       | Backbone       | Date    | Hit@5 | R@5  |
|--------------|----------------|---------|-------|------|
| ICU-Evo      | GPT-5.4        | 2026.05 | 86.7  | 57   |
| ICU-Evo      | Gemini-3.1-pro | 2026.05 | 82.3  | 52.6 |
| ICU-Evo      | Qwen3-235B     | 2026.05 | 60    | 36.2 |
| RAG          | GPT-5.4        | 2026.05 | 59.9  | 34.9 |
| RAG          | Gemini-3.1-pro | 2026.05 | 59.6  | 34.2 |
| Full-context | GPT-5.4        | 2026.05 | 51    | 34.8 |
| Local-window | GPT-5.4        | 2026.05 | 50    | 29.3 |
| Full-context | Gemini-3.1-pro | 2026.05 | 48.6  | 30.8 |
| Local-window | Gemini-3.1-pro | 2026.05 | 45.9  | 25.8 |
| Full-context | Qwen3-235B     | 2026.05 | 38.4  | 22.6 |
| RAG          | Qwen3-235B     | 2026.05 | 37.9  | 21.1 |
| Local-window | Qwen3-235B     | 2026.05 | 21.3  | 12.6 |
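
The page does not define Hit@5 or R@5. As a rough guide to reading the columns, here is a minimal sketch assuming the common ranking-metric definitions: Hit@k is 1 when at least one of the top-k predictions is correct, and R@k (recall at k) is the fraction of ground-truth items found in the top k. The function names and the example labels are hypothetical and are not taken from the benchmark. The reported scores are presumably these per-case values averaged over all cases and expressed as percentages.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Return 1.0 if any of the top-k predictions is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Return the fraction of relevant items that appear in the top-k predictions."""
    if not relevant:
        return 0.0  # assumption: cases with no ground truth score zero
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical case: labels are illustrative only, not benchmark data.
predictions = ["sepsis", "hypotension", "aki", "ards", "hypoxemia", "delirium"]
ground_truth = {"aki", "delirium", "hyperkalemia"}

hit = hit_at_k(predictions, ground_truth)     # "aki" is in the top 5
rec = recall_at_k(predictions, ground_truth)  # 1 of 3 ground-truth items in the top 5
```

Under these assumed definitions Hit@5 can never be lower than R@5 for the same case, which matches the pattern in every row of the table.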