Red Flags on RealICU-GOLD
[Chart: HRR@5 by date. Highlighted point: ICU-Evo, 47.3 HRR@5, May 13, 2026.]
Updated 20d ago
Evaluation Results
| Method       | Backbone       | Date    | HRR@5 |
|--------------|----------------|---------|-------|
| ICU-Evo      | GPT-5.4        | 2026.05 | 47.3  |
| ICU-Evo      | Gemini-3.1-pro | 2026.05 | 30    |
| Full-context | GPT-5.4        | 2026.05 | 29.8  |
| RAG          | GPT-5.4        | 2026.05 | 23.4  |
| RAG          | Gemini-3.1-pro | 2026.05 | 21.6  |
| Local-window | GPT-5.4        | 2026.05 | 16.5  |
| Local-window | Gemini-3.1-pro | 2026.05 | 15.1  |
| Full-context | Gemini-3.1-pro | 2026.05 | 13.7  |
| Full-context | Qwen3-235B     | 2026.05 | 11.7  |
| ICU-Evo      | Qwen3-235B     | 2026.05 | 11.7  |
| RAG          | Qwen3-235B     | 2026.05 | 9.5   |
| Local-window | Qwen3-235B     | 2026.05 | 8     |