Fact Verification on SciTab
[Chart: Macro F1 over time. Leading entry: MACE, 71 Macro F1, Apr 19, 2026.]
Evaluation Results
| Method | Details | Date | Macro F1 |
|---|---|---|---|
| MACE | Backbone=Qw-235B, Comp... | 2026.04 | 71 |
| MACE | Backbone=Qw-72B, Compo... | 2026.04 | 67 |
| GPT-4 | | 2026.04 | 65 |
| MACE | Backbone=Ll-8B, Compon... | 2026.04 | 65 |
| TART | Backbone=GPT-4 | 2026.04 | 64 |
| GPT-4 + COT | prompting=Chain-of-Tho... | 2026.04 | 63 |
| MACE | Backbone=Mt-7B, Compon... | 2026.04 | 58 |
| ProTrix | | 2026.04 | 43 |
| InstructGPT + COT | prompting=Chain-of-Tho... | 2026.04 | 43 |
| InstructGPT | | 2026.04 | 42 |
| Vicuna (13B) | | 2026.04 | 35 |
| PASTA | | 2026.04 | 33 |
| Ll-13B | | 2026.04 | 33 |
| Alpaca-7B | | 2026.04 | 29 |