Translator-call Mode Selection on PolyMath Low

Macro F1 Score: 87.64

LUAR

Updated 1mo ago

Evaluation Results

Method	Date	Macro F1
LUAR	2026.06	87.64
ST(qr)	2026.06	82.92
ST(qr)	2026.06	79.26
LUAR	2026.06	78.32
BOUNDARY-SFT	2026.06	73.14
BOUNDARY-SFT	2026.06	69.2
ST(q)	2026.06	64.01
ST(q)	2026.06	63.25
NATIVE-TOOL-USE	2026.06	58.41
NATIVE-TOOL-USE	2026.06	57.6
SELF-ASSESSMENT	2026.06	43.48
SELF-ASSESSMENT	2026.06	41.1