ROMAN: Reward-Orchestrated Multi-Head Attention Network for Autonomous Driving System Testing

About

An Automated Driving System (ADS) acts as the brain of an autonomous vehicle and is responsible for its safety and efficiency. Safe deployment requires thorough testing in diverse real-world scenarios and compliance with traffic laws such as speed limits, signal obedience, and right-of-way rules. Violations such as running red lights or speeding pose severe safety risks. However, current testing approaches face two significant challenges: they have limited ability to generate complex, high-risk law-breaking scenarios, and they fail to account for complex interactions involving multiple vehicles and critical situations. To address these challenges, we propose ROMAN, a novel scenario generation approach for ADS testing that combines a multi-head attention network with a traffic law weighting mechanism. ROMAN is designed to generate high-risk violation scenarios that enable more thorough and targeted ADS evaluation. The multi-head attention mechanism models interactions among vehicles, traffic signals, and other factors. The traffic law weighting mechanism uses an LLM-based risk weighting module to evaluate violations along two dimensions: severity and occurrence. We evaluated ROMAN by testing the Baidu Apollo ADS within the CARLA simulation platform and conducting extensive experiments to measure its performance. Experimental results show that ROMAN surpasses the state-of-the-art tools ABLE and LawBreaker, achieving a 7.91% higher average violation count than ABLE and a 55.96% higher count than LawBreaker while also maintaining greater scenario diversity. In addition, only ROMAN generated violation scenarios for every clause of the input traffic laws, enabling it to identify more high-risk violations than existing approaches.
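The abstract says violations are weighted along two dimensions, severity and occurrence, but does not give the formula. The sketch below is only an illustration of how such a weighting could feed a reward signal. It assumes a risk-matrix style product of the two scores. The 1 to 5 scales, the names LawClause, risk_weight and weighted_reward, and the example scores are all hypothetical; in ROMAN the scores come from an LLM-based module, which is replaced here by hard-coded values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LawClause:
    """A traffic-law clause with two hypothetical 1..5 risk scores."""
    name: str
    severity: int    # 1 = minor consequence, 5 = catastrophic (assumed scale)
    occurrence: int  # 1 = rarely violated, 5 = frequently violated (assumed scale)


def risk_weight(clause: LawClause, max_level: int = 5) -> float:
    """Assumed weighting: product of the two scores, normalised to (0, 1]."""
    return (clause.severity * clause.occurrence) / (max_level * max_level)


def weighted_reward(violations: dict[str, int], clauses: dict[str, LawClause]) -> float:
    """Sum the violation counts of a scenario, each scaled by its clause weight."""
    return sum(risk_weight(clauses[name]) * count for name, count in violations.items())


# Hard-coded example scores standing in for the LLM-based module's output.
CLAUSES = {
    "red_light": LawClause("red_light", severity=5, occurrence=3),
    "speeding": LawClause("speeding", severity=3, occurrence=4),
}

if __name__ == "__main__":
    scenario = {"red_light": 1, "speeding": 2}
    print(weighted_reward(scenario, CLAUSES))
```

Under this assumed product rule, a scenario that triggers a severe but less frequent violation can outrank one that triggers several minor ones, which is the kind of prioritisation a risk-weighted reward is meant to encourage.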

Jianlei Chi, Yuzhen Wu, Jiaxuan Hou, Xiaodong Zhang, Ming Fan, Suhui Sun, Weijun Dai, Bo Li, Jianguo Sun, Jun Sun • 2026

Related benchmarks

Task	Dataset	Metric	Value
Scenario Generation	Scenario S2	Inference Time (s)	47	3
Scenario Generation	Scenario S3	Inference Time (s)	52	3
Scenario Generation	Scenario S1	Inference Time (s)	44	3
Scenario Generation	Scenario S4	Inference Time (s)	53	3
Violation Scenario Generation	Scenario S1	Mean Score	9.86	3
Violation Scenario Generation	Scenario S2	Mean Violation	8.07	3
Violation Scenario Generation	Scenario S3	Mean Score	6.41	3
Violation Scenario Generation	Scenario S4	Mean Violation	7.15	3
