E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving
About
End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger's emotional state, which is central to comfort and AD acceptance. We introduce Open-Domain End-to-End (OD-E2E) autonomous driving, where an autonomous vehicle (AV) must interpret free-form natural-language commands, infer the passenger's emotion, and plan a physically feasible trajectory. We propose E3AD, an emotion-aware VLA framework that augments semantic understanding with two cognitively inspired components: a continuous Valence-Arousal-Dominance (VAD) emotion model that captures tone and urgency from language, and a dual-pathway spatial reasoning module that fuses egocentric and allocentric views for human-like spatial cognition. A consistency-oriented training scheme, combining modality pretraining with preference-based alignment, further enforces coherence between emotional intent and driving actions. Across real-world datasets, E3AD improves visual grounding and waypoint planning and achieves state-of-the-art (SOTA) VAD correlation for emotion estimation. These results show that injecting emotion into VLA-style driving yields more human-aligned grounding, planning, and human-centric feedback.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Grounding | Talk2Car | IoU | 80.12 | 15 |
| Visual Grounding | MoCAD (test) | IoU | 0.8094 | 15 |
| Visual Grounding | MoCAD (val) | IoU | 79.64 | 15 |
| Visual Grounding | DrivePilot (test) | IoU | 81.02 | 15 |
| Visual Grounding | DrivePilot (val) | IoU | 82.56 | 15 |
| Visual Grounding | Corner-case Visual Constr. | IoU | 76.62 | 15 |
| Visual Grounding | Corner-case Multi-agent | IoU | 77.24 | 15 |
| Visual Grounding | Corner-case Ambiguous | IoU | 77.05 | 15 |
| Visual Grounding | Long-text (val) | IoU | 77.86 | 15 |
| Trajectory Planning | Unified Evaluation Settings Autonomous Driving (test) | ADE | 3.88 | 14 |
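
The table reports IoU for visual grounding and ADE for trajectory planning but does not say how the scores are computed or aggregated. As a rough guide to what the two metrics measure, the sketch below uses their textbook definitions: intersection-over-union of two axis-aligned boxes, and average displacement error as the mean Euclidean distance between predicted and ground-truth waypoints. The function names, the `(x1, y1, x2, y2)` box layout, and the 2D waypoint format are assumptions made for illustration and are not taken from E3AD or the benchmarks.

```python
import math


def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are assumed (for this sketch) to be (x1, y1, x2, y2) tuples
    with x1 <= x2 and y1 <= y2.
    """
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_displacement_error(pred, gt):
    """Mean Euclidean distance between matched waypoints.

    `pred` and `gt` are assumed to be equal-length sequences of (x, y) points.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(average_displacement_error([(0, 0), (1, 0)], [(0, 0), (1, 2)]))
```

Benchmarks often report grounding as the percentage of predictions whose IoU exceeds a threshold rather than as a mean IoU, so a per-box score like this one may still need a dataset-specific aggregation step before it is comparable to the numbers above.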