Document-level Information Extraction on MultiMUC (averaged across languages)
Top result: THINKTWICE Qwen 3 (oracle), F1 Score 33.08 (Jan 26, 2026)
Evaluation Results
| Method | Configuration | Date | F1 Score |
| --- | --- | --- | --- |
| THINKTWICE Qwen 3 (oracle) | Selector=oracle, Backb... | 2026.01 | 33.08 |
| THINKTWICE Llama R1 (oracle) | Selector=oracle, Backb... | 2026.01 | 29.66 |
| THINKTWICE Qwen 3 | Selector=F1 Voting, Ba... | 2026.01 | 15.04 |
| THINKTWICE Qwen 3 | Selector=Majority, Bac... | 2026.01 | 14.83 |
| THINKTWICE Llama R1 | Selector=F1 Voting, Ba... | 2026.01 | 13.22 |
| Greedy Qwen 3 | Selector=X, Backbone=Q... | 2026.01 | 12.98 |
| ChatGPT 3.5 | Selector=X, Backbone=G... | 2026.01 | 12.93 |
| THINKTWICE Llama R1 | Selector=Majority, Bac... | 2026.01 | 12.78 |
| Greedy Llama R1 | Selector=X, Backbone=L... | 2026.01 | 11.46 |