Translator-call mode selection on PolyMath Medium

75.22 Macro F1

LUAR

Updated 1mo ago

Evaluation Results

Method	Links	Macro F1
LUAR 2026.06		75.22
ST(qr) 2026.06		69.19
BOUNDARY-SFT 2026.06		65.90
LUAR 2026.06		65.61
ST(qr) 2026.06		64.59
BOUNDARY-SFT 2026.06		62.22
SELF-ASSESSMENT 2026.06		56.44
ST(q) 2026.06		56.18
SELF-ASSESSMENT 2026.06		54.28
ST(q) 2026.06		52.59
NATIVE-TOOL-USE 2026.06		49.61
NATIVE-TOOL-USE 2026.06		39.23