Document-level Information Extraction on MultiMUC (averaged across languages)

33.08 F1 Score

THINKTWICE Qwen 3 (oracle)

Updated 4mo ago

Evaluation Results

Method	Date	F1 Score
THINKTWICE Qwen 3 (oracle)	2026.01	33.08
THINKTWICE Llama R1 (oracle)	2026.01	29.66
THINKTWICE Qwen 3	2026.01	15.04
THINKTWICE Qwen 3	2026.01	14.83
THINKTWICE Llama R1	2026.01	13.22
Greedy Qwen 3	2026.01	12.98
ChatGPT 3.5	2026.01	12.93
THINKTWICE Llama R1	2026.01	12.78
Greedy Llama R1	2026.01	11.46