LLM Filtering on Manually adjudicated gold-standard CUIs Fluid Overload v1 (test)
[Chart: leaderboard metric by date. Metrics: CUIs Count, Recall, Precision, F1-score. Point shown: GPT-5-mini, CUIs Count 77, Feb 20, 2026.]
Evaluation Results
| Method | Links | CUIs Count | Recall | Precision | F1-score |
| --- | --- | --- | --- | --- | --- |
| GPT-5-mini (Framework=CUICurate, B...) | 2026.02 | 77 | 50 | 92 | 65 |
| GPT-5 (Framework=CUICurate, B...) | 2026.02 | 50 | 34 | 96 | 50 |
| manual (Mode=manual curation) | 2026.02 | 30 | 18 | 87 | 30 |
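The reported F1-score values can be sanity-checked against the Recall and Precision values. The sketch below assumes that all three metrics are percentages and that F1 is the standard harmonic mean of precision and recall; the page does not state either point. The function name `f1_score` and the `rows` dictionary are illustrative and not part of the benchmark.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision, on the same scale as the inputs."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F1) copied from the results above.
# Assumption: all values are percentages.
rows = {
    "GPT-5-mini": (50, 92, 65),
    "GPT-5": (34, 96, 50),
    "manual": (18, 87, 30),
}

for method, (recall, precision, reported) in rows.items():
    computed = f1_score(recall, precision)
    print(f"{method}: computed F1 = {computed:.1f}, reported = {reported}")
```

Under these assumptions the computed values are 64.8, 50.2, and 29.8, which match the reported 65, 50, and 30 after rounding to whole numbers.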