LLM Filtering on Manually adjudicated gold-standard CUIs Chronic Heart Failure v1 (test)
[Chart: CUIs Count by method. Data point shown: GPT-5-mini, 137 CUIs, Feb 20, 2026. Available metrics: CUIs Count, Recall, Precision, F1-score.]
Evaluation Results
Method      Details                    Date     CUIs Count  Recall  Precision  F1-score
GPT-5-mini  Framework=CUICurate, B...  2026.02  137         92      87         89
GPT-5       Framework=CUICurate, B...  2026.02  119         86      93         90
manual      Mode=manual curation       2026.02  98          74      97         84
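
As a rough sanity check on the results above, the F1-score column can be recomputed from the Recall and Precision columns. The sketch below assumes the benchmark uses the standard definition of F1 (the harmonic mean of precision and recall), which the page does not state. It also assumes the listed values are rounded, so a recomputed score may differ from the reported one by up to a point. The helper name `f1_score` and the `rows` dictionary are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) copied from the results listed above.
# Assumption: these are rounded values on a 0-100 scale.
rows = {
    "GPT-5-mini": (92, 87, 89),
    "GPT-5": (86, 93, 90),
    "manual": (74, 97, 84),
}

for method, (recall, precision, reported) in rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{method}: recomputed F1 = {recomputed:.2f}, reported = {reported}")
```

Small gaps between recomputed and reported values are expected if the leaderboard rounds Recall and Precision before display.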