PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models
About
Recent advances in duplex speech models have enabled natural, low-latency speech-to-speech interactions. However, existing models are restricted to a fixed role and voice, limiting their ability to support structured, role-driven real-world applications and personalized interactions. In this work, we introduce PersonaPlex, a duplex conversational speech model that incorporates hybrid system prompts, combining role conditioning with text prompts and voice cloning with speech samples. PersonaPlex is trained on a large-scale synthetic dataset of paired prompts and user-agent conversations, generated with open-source large language models (LLMs) and text-to-speech (TTS) models. To evaluate role conditioning in real-world settings, we extend the Full-Duplex-Bench benchmark beyond a single assistant role to multi-role customer service scenarios. Experiments show that PersonaPlex achieves strong role-conditioned behavior, voice-conditioned speech, and natural conversational responsiveness, surpassing state-of-the-art duplex speech models and hybrid LLM-based speech systems in role adherence, speaker similarity, latency, and naturalness.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Interruption Handling | Full-Duplex-Bench | GPT-4o Score | 4.21 | 18 |
| Turn Taking | Full-Duplex-Bench | TOR | 99.2 | 17 |
| Pause Handling | Full-Duplex-Bench Candor | TOR | 0.662 | 13 |
| User Interruption | Bilingual Full-Duplex-Bench English | RL | 0.4 | 12 |
| Pause Handling | Full-Duplex-Bench Synthetic | TOR | 58.4 | 11 |
| Backchanneling | Full-Duplex-Bench | TOR | 32.7 | 11 |
| Overall Evaluation | Bilingual Full-Duplex-Bench English | Accuracy | 79 | 8 |
| Duplex Dialogue Turn-Taking | Full-Duplex-Bench | Synthetic TOR for Pause Handling | 0.358 | 8 |
| Turn Taking | Bilingual Full-Duplex-Bench English | TOR | 99.2 | 6 |
| Pause Handling | Bilingual Full-Duplex-Bench English | TOR | 62.3 | 6 |