Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1
About
Autonomous systems increasingly rely on human feedback, expressed as pairwise comparisons, rankings, or demonstrations, to align their behavior. While existing methods can adapt behaviors, they often fail to guarantee safety in safety-critical domains. We propose a safety-guaranteed, optimal, and efficient approach to learning from preferences, rankings, or demonstrations using Weighted Signal Temporal Logic (WSTL). Naive formulations of the WSTL learning problem lead to multi-linear constraints in the weights to be learned. By introducing structural pruning and a log-transform procedure, we reduce the problem size and recast it as a Mixed-Integer Linear Program (MILP) while preserving safety guarantees. Experiments on robotic navigation and real-world Formula 1 data demonstrate that the method captures nuanced preferences and models complex task objectives.
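A minimal sketch of why the log transform helps. This is not the paper's implementation: the function names and the toy two-branch formula are illustrative. In WSTL, weights multiply along branches of the formula's parse tree, so a preference constraint is multi-linear in the unknown weights; taking logarithms (valid when weights and robustness values are positive) turns those products into sums, which an MILP solver can handle directly.

```python
import math

# Toy preference constraint between two trajectories A and B:
#     w1 * w2 * rho_A > w1 * w3 * rho_B     (multi-linear in w)
# After the log transform it becomes
#     log w1 + log w2 + log rho_A > log w1 + log w3 + log rho_B,
# which is linear in the transformed variables log w_i.

def multilinear_satisfied(w, rho_a, rho_b, branch_a, branch_b):
    """Check the original multi-linear preference constraint."""
    lhs = math.prod(w[i] for i in branch_a) * rho_a
    rhs = math.prod(w[i] for i in branch_b) * rho_b
    return lhs > rhs

def log_linear_satisfied(log_w, rho_a, rho_b, branch_a, branch_b):
    """Same constraint after the log transform: linear in log_w."""
    lhs = sum(log_w[i] for i in branch_a) + math.log(rho_a)
    rhs = sum(log_w[i] for i in branch_b) + math.log(rho_b)
    return lhs > rhs

w = {1: 2.0, 2: 0.5, 3: 1.5}                 # candidate weights (all > 0)
log_w = {k: math.log(v) for k, v in w.items()}

# Trajectory A's branch uses weights w1*w2; B's uses w1*w3.
agree = multilinear_satisfied(w, 4.0, 1.0, [1, 2], [1, 3]) == \
        log_linear_satisfied(log_w, 4.0, 1.0, [1, 2], [1, 3])
print(agree)  # the two formulations agree on feasibility
```

Because the transform is a strictly monotone bijection on positive reals, the feasible sets of the two formulations coincide, so safety constraints expressed on the original weights are preserved.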
Related benchmarks
| Task | Dataset | Accuracy (%) | Rank |
|---|---|---|---|
| Learning to Rank | Monza Grand Prix excluding DNF/DNS 2021-2024 (train) | 93.9 | 4 |
| Learning to Rank | Monza Grand Prix including DNF/DNS 2021-2024 (train) | 90.2 | 4 |
| Learning to Rank | Monza Grand Prix including DNF/DNS 2025 (test) | 85.3 | 4 |
| Learning to Rank | Monza Grand Prix excluding DNF/DNS 2025 (test) | 81.7 | 4 |