
SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation

About

We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision-making problem and incorporates a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods such as PPO and DPO, which are typically applied to single-step tasks, SeqPO-SiMT effectively tackles the multi-step SiMT task. This intuitive framework allows SiMT LLMs to simulate and refine the SiMT process using the tailored reward. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks, demonstrating that SeqPO-SiMT consistently achieves significantly higher translation quality with lower latency. In particular, SeqPO-SiMT outperforms the supervised fine-tuning (SFT) model by 1.13 points in COMET while reducing Average Lagging by 6.17 on the NEWSTEST2021 En to Zh dataset. Although SiMT operates with far less context than offline translation, the SiMT results of SeqPO-SiMT on a 7B LLM surprisingly rival the offline translation of high-performing LLMs, including Qwen-2.5-7B-Instruct and LLaMA-3-8B-Instruct.

Ting Xu, Zhichao Huang, Jiankai Sun, Shanbo Cheng, Wai Lam • 2025
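
The abstract reports latency as Average Lagging (AL) and describes a reward that trades translation quality against latency. As a rough illustration only, the sketch below computes AL using the standard definition of the metric for a single sentence and combines it with a quality score. The helper names `simulate_wait_k` and `toy_reward`, the wait-k policy, and the linear `latency_weight` trade-off are assumptions made for this example; the abstract does not specify the actual SeqPO-SiMT reward or policy.

```python
from typing import List


def average_lagging(delays: List[int], src_len: int) -> float:
    """Average Lagging (AL) for one sentence, in source tokens.

    delays[t] is the number of source tokens that had been read
    when target token t (0-based) was written.
    """
    if not delays or src_len <= 0:
        raise ValueError("need at least one target token and a non-empty source")
    tgt_len = len(delays)
    gamma = tgt_len / src_len  # target-to-source length ratio
    total = 0.0
    for t, g in enumerate(delays):
        # Lag of this target token relative to an ideal, perfectly paced translator.
        total += g - t / gamma
        if g >= src_len:
            # AL averages only up to the first token written after the
            # whole source has been read.
            return total / (t + 1)
    # Assumption for this sketch: if the source was never fully read,
    # average over all target tokens.
    return total / tgt_len


def simulate_wait_k(src_len: int, tgt_len: int, k: int) -> List[int]:
    """Hypothetical read/write schedule: read k tokens, then alternate write/read."""
    return [min(k + t, src_len) for t in range(tgt_len)]


def toy_reward(quality: float, al: float, latency_weight: float = 0.1) -> float:
    """Hypothetical reward: quality minus a linear latency penalty (not the paper's)."""
    return quality - latency_weight * al


if __name__ == "__main__":
    delays = simulate_wait_k(src_len=6, tgt_len=6, k=3)
    al = average_lagging(delays, src_len=6)
    print("delays:", delays)
    print("AL:", al)
    print("reward:", toy_reward(quality=0.8, al=al))
```

In this toy setting, reading the whole source before writing anything gives the largest lag, while a smaller `k` lowers AL at the cost of less context per written token. A sequence-level reward of this general shape is one way to express that trade-off.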

Related benchmarks

Task                              Dataset                      Result              Rank
Machine Translation               MuST-C En-Zh (test)          BLEURT 67.59        9
Machine Translation               REALSI Zh-En (test)          BLEURT Score 66.82  9
Simultaneous Machine Translation  NIST 2003-2006 (test)        COMET Score 83.37   8
Simultaneous Machine Translation  REALSI En -> Zh              BLEURT Score 66.41  6
Simultaneous Machine Translation  MUSTC En -> Zh               BLEURT Score 0.6746 6
Simultaneous Machine Translation  NEWSTEST En -> Zh 2021       BLEURT 64.62        6
Simultaneous Machine Translation  REALSI Zh → En Low Latency   BLEURT 65.93        3
Simultaneous Machine Translation  REALSI Zh → En High Latency  BLEURT 66.24        3
Simultaneous Machine Translation  COVOST Zh → En Low Latency   BLEURT 63.01        3
Simultaneous Machine Translation  COVOST Zh → En High Latency  BLEURT 63.28        3
Showing 10 of 12 rows
