Red Flags Detection on RealICU-Scale (test)
[Chart: HRR by date. Highlighted point: ICU-Evo, HRR 29.2, May 13, 2026.]
Updated 20d ago
Evaluation Results
| Method | Setting | Date | HRR |
|---|---|---|---|
| ICU-Evo | Backbone=Qwen3-235B [3... | 2026.05 | 29.2 |
| RAG | Backbone=Qwen3-235B [3... | 2026.05 | 22.5 |
| Full-context | Backbone=Qwen3-235B [3... | 2026.05 | 21.5 |
| Local-window | Backbone=Qwen3-235B [3... | 2026.05 | 20.7 |
| RAG | Backbone=GPT-5.4 [22],... | 2026.05 | 9.6 |
| ICU-Evo | Backbone=GPT-5.4 [22],... | 2026.05 | 9 |
| ICU-Evo | Backbone=Gemini-3.1-pr... | 2026.05 | 8.7 |
| RAG | Backbone=Gemini-3.1-pr... | 2026.05 | 7.3 |
| Local-window | Backbone=GPT-5.4 [22],... | 2026.05 | 7.3 |
| Local-window | Backbone=Gemini-3.1-pr... | 2026.05 | 6.6 |