Action Recommendation on RealICU-Scale (test)
Best result: ICU-Evo, Hit@5 = 57.5
[Chart: Hit@5 progress over time, dated May 13, 2026; metrics: Hit@5, R@5]
Updated 20d ago
Evaluation Results
Sorted by Hit@5, descending.

Method       | Details                   | Date    | Hit@5 | R@5
-------------|---------------------------|---------|-------|-----
ICU-Evo      | Backbone=GPT-5.4 [22],... | 2026.05 | 57.5  | 36.8
ICU-Evo      | Backbone=Qwen3-235B [3... | 2026.05 | 51.5  | 32.7
ICU-Evo      | Backbone=Gemini-3.1-pr... | 2026.05 | 51.4  | 33.0
RAG          | Backbone=GPT-5.4 [22],... | 2026.05 | 50.9  | 43.5
RAG          | Backbone=Gemini-3.1-pr... | 2026.05 | 46.6  | 33.1
Full-context | Backbone=Qwen3-235B [3... | 2026.05 | 45.5  | 29.9
Local-window | Backbone=GPT-5.4 [22],... | 2026.05 | 45.1  | 30.8
Local-window | Backbone=Gemini-3.1-pr... | 2026.05 | 44.7  | 30.7
RAG          | Backbone=Qwen3-235B [3... | 2026.05 | 44.6  | 34.2
Local-window | Backbone=Qwen3-235B [3... | 2026.05 | 44.0  | 29.5
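The page reports Hit@5 and R@5 without defining them. The sketch below assumes the common definitions used for top-k recommendation: Hit@k is 1 if at least one ground-truth action appears among the top-k predicted actions, and R@k is the fraction of ground-truth actions recovered in the top k; both are averaged over test instances and shown as percentages. The exact matching rules and averaging used by RealICU-Scale may differ, and the function names and action labels here are hypothetical.

```python
from typing import Callable, List, Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Assumed Hit@k: 1.0 if any ground-truth action is in the top-k predictions."""
    return 1.0 if any(a in relevant for a in ranked[:k]) else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Assumed R@k: fraction of ground-truth actions found in the top-k predictions."""
    if not relevant:
        return 0.0  # assumption: instances without ground truth score 0
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


def mean_metric(
    fn: Callable[[Sequence[str], Set[str], int], float],
    predictions: List[Sequence[str]],
    ground_truth: List[Set[str]],
    k: int = 5,
) -> float:
    """Average a per-instance metric over the test set, as a percentage."""
    scores = [fn(p, g, k) for p, g in zip(predictions, ground_truth)]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical example: two ICU time steps with ranked action predictions.
predictions = [
    ["vasopressor", "fluids", "intubation", "antibiotics", "sedation"],
    ["diuretic", "fluids", "sedation", "antibiotics", "insulin"],
]
ground_truth = [{"fluids", "blood_culture"}, {"transfusion"}]

print(mean_metric(hit_at_k, predictions, ground_truth))     # 50.0
print(mean_metric(recall_at_k, predictions, ground_truth))  # 25.0
```

Under these definitions per-instance recall never exceeds the hit indicator, so R@5 is at most Hit@5 for any method, which is consistent with every row in the table above.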