Translator-call mode selection on PolyMath High

73.74 Macro F1

LUAR

Updated 1mo ago

Evaluation Results

Method	Links	Macro F1
LUAR 2026.06		73.74
LUAR 2026.06		69.74
BOUNDARY-SFT 2026.06		66.85
ST(qr) 2026.06		60.73
BOUNDARY-SFT 2026.06		60.54
ST(q) 2026.06		60.41
ST(qr) 2026.06		58.24
ST(q) 2026.06		53.58
SELF-ASSESSMENT 2026.06		52.93
SELF-ASSESSMENT 2026.06		48.6
NATIVE-TOOL-USE 2026.06		48.54
NATIVE-TOOL-USE 2026.06		45.6