Complex Reasoning on SciFact (val)
[Chart: Macro-F1 over time. Top entry: EvoPool, 71.15 Macro-F1, Jun 1, 2026.]
Updated 1d ago
Evaluation Results

Method          Details               Date     Macro-F1
EvoPool         Backbone=gpt-4o-mini  2026.06  71.15
LLM annotation  Backbone=gpt-4o-mini  2026.06  70.4
Alchemist       Backbone=gpt-4o-mini  2026.06  34.38
DataSculpt      Backbone=gpt-4o-mini  2026.06  26.09