LLM Filtering on Manually adjudicated gold-standard CUIs Poor Mobility v1 (test)
[Chart: CUIs Count, Recall, Precision, F1-score. Highlighted point: GPT-5-mini, CUIs Count 205, Feb 20, 2026]
Updated 4d ago
Evaluation Results
| Method | Method details | Date | CUIs Count | Recall | Precision | F1-score |
|---|---|---|---|---|---|---|
| GPT-5-mini | Framework=CUICurate, B... | 2026.02 | 205 | 86 | 87 | 86 |
| GPT-5 | Framework=CUICurate, B... | 2026.02 | 171 | 76 | 93 | 83 |
| manual | Mode=manual curation | 2026.02 | 92 | 23 | 52 | 32 |
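
The F1-score column in the results above can be sanity-checked against the Recall and Precision columns. The sketch below assumes the standard definition of F1 as the harmonic mean of recall and precision; the page does not say how the scores were computed or averaged, and the listed values are rounded, so a recomputed score may differ from the reported one by about a point. The function name `f1_score` and the `rows` dictionary are illustrative, not part of the benchmark.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (same scale as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F1) copied from the results table, in percent.
# These are rounded values, so the recomputed F1 is only approximate.
rows = {
    "GPT-5-mini": (86, 87, 86),
    "GPT-5": (76, 93, 83),
    "manual": (23, 52, 32),
}

for method, (recall, precision, reported) in rows.items():
    recomputed = f1_score(recall, precision)
    print(f"{method}: recomputed F1 = {recomputed:.1f}, reported = {reported}")
```

Working from the rounded table values keeps the check self-contained. An exact comparison would need the unrounded recall and precision, which the page does not show.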