Complex reasoning on VitaminC (val)
[Figure: Macro F1 Score over time. Top entry: EvoPool, 69.99 (Jun 1, 2026).]
Evaluation Results
Method          Configuration         Date     Macro F1 Score
EvoPool         Backbone=gpt-4o-mini  2026.06  69.99
LLM annotation  Backbone=gpt-4o-mini  2026.06  59.93
Alchemist       Backbone=gpt-4o-mini  2026.06  16.92
DataSculpt      Backbone=gpt-4o-mini  2026.06  6.03