Translator-call mode selection on MMLU ProX Lite
[Chart: Macro F1 over time. Top result: LUAR, 76.45 Macro F1 (Jun 1, 2026)]
Evaluation Results
Method            Backbone   Date      Macro F1
LUAR              Qwen3-4B   2026.06   76.45
LUAR              Qwen3-8B   2026.06   68.18
ST(qr)            Qwen3-4B   2026.06   65.44
BOUNDARY-SFT      Qwen3-4B   2026.06   65.21
ST(qr)            Qwen3-8B   2026.06   61.13
NATIVE-TOOL-USE   Qwen3-4B   2026.06   60.03
BOUNDARY-SFT      Qwen3-8B   2026.06   59.47
ST(q)             Qwen3-4B   2026.06   48.38
SELF-ASSESSMENT   Qwen3-8B   2026.06   48.04
SELF-ASSESSMENT   Qwen3-4B   2026.06   47.83
ST(q)             Qwen3-8B   2026.06   46.8
NATIVE-TOOL-USE   Qwen3-8B   2026.06   41.36