Patient Status on RealICU GOLD
[Figure: Accuracy and F1 Score over time (May 13, 2026); best accuracy 45.9, ICU-Evo]
Evaluation Results
Method        Backbone        Date     Accuracy  F1 Score
ICU-Evo       Gemini-3.1-pro  2026.05  45.9      36.5
RAG           Gemini-3.1-pro  2026.05  40.2      34.8
Local-window  Gemini-3.1-pro  2026.05  31.5      23.9
RAG           Qwen3-235B      2026.05  31.5      27.1
ICU-Evo       GPT-5.4         2026.05  31.2      26.4
Full-context  Gemini-3.1-pro  2026.05  29.8      25.8
Full-context  GPT-5.4         2026.05  29.4      23.3
RAG           GPT-5.4         2026.05  28.8      25.6
ICU-Evo       Qwen3-235B      2026.05  25.3      19.7
Local-window  GPT-5.4         2026.05  23.3      18.4
Full-context  Qwen3-235B      2026.05  22.5      18.8
Local-window  Qwen3-235B      2026.05  15.2      15.4