Fact-checking on XSum
[Chart: Balanced Accuracy by date. Leading result: ANCHOR, 56.4. Date shown: May 11, 2026.]
Updated 22d ago
Evaluation Results
Method    Configuration               Date      Balanced Accuracy
ANCHOR    Backbone=Qwen2.5-72B,...    2026.05   56.4
Vanilla   Backbone=Qwen2.5-72B,...    2026.05   55.7
Vanilla   Backbone=DeepSeek-V3-6...   2026.05   53.7
CoT       Backbone=Qwen2.5-72B,...    2026.05   53.1
CoT       Backbone=DeepSeek-V3-6...   2026.05   52.7
CoT       Backbone=DeepSeek-V3-6...   2026.05   51.7
BIRD      Backbone=Qwen2.5-72B,...    2026.05   51.1
CoT       Backbone=Qwen2.5-72B,...    2026.05   48.4
Vanilla   Backbone=DeepSeek-V3-6...   2026.05   48.3
Vanilla   Backbone=Qwen2.5-72B,...    2026.05   47.3
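
For readers unfamiliar with the metric: balanced accuracy is conventionally the mean of per-class recall, so a model that always predicts the majority class scores 50 on a two-class task. The sketch below illustrates that standard definition only. It is not the leaderboard's evaluation code, and the binary label scheme (1 = factually consistent summary, 0 = inconsistent) and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    total = defaultdict(int)    # number of examples per true class
    correct = defaultdict(int)  # correctly predicted examples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Average the recall of each class, giving every class equal weight.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: 1 = consistent summary, 0 = inconsistent summary.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

score = balanced_accuracy(y_true, y_pred)
print(f"{100 * score:.1f}")  # recall 0.75 and 0.50 average to 62.5
```

Multiplying by 100 matches the percentage-style scores in the table above; whether the leaderboard averages over two classes or more is not stated on the page.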