Action Recommendation on RealICU GOLD
Top result: ICU-Evo, 67.6 Hit Rate@5
[Chart: leaderboard plot; metrics: Hit Rate@5, Recall@5; data point dated May 13, 2026]
Evaluation Results
| Method | Backbone | Date | Hit Rate@5 | Recall@5 |
|---|---|---|---|---|
| ICU-Evo | Gemini-3.1-pro | 2026.05 | 67.6 | 53.4 |
| ICU-Evo | GPT-5.4 | 2026.05 | 67.6 | 53.4 |
| ICU-Evo | Qwen3-235B | 2026.05 | 52.6 | 35.7 |
| RAG | Gemini-3.1-pro | 2026.05 | 49.6 | 31.3 |
| RAG | GPT-5.4 | 2026.05 | 48 | 39.8 |
| RAG | Qwen3-235B | 2026.05 | 45.3 | 32.4 |
| Full-context | GPT-5.4 | 2026.05 | 40.4 | 30 |
| Local-window | Gemini-3.1-pro | 2026.05 | 39.5 | 26 |
| Local-window | GPT-5.4 | 2026.05 | 38 | 28.1 |
| Local-window | Qwen3-235B | 2026.05 | 35.2 | 24.2 |
| Full-context | Qwen3-235B | 2026.05 | 32.9 | 22.2 |
| Full-context | Gemini-3.1-pro | 2026.05 | 25.9 | 15.2 |