Gaze Prediction as Time-Series Forecasting for Virtual Reality Applications: Quantifying Performance Variability and Extreme-Case Errors
About
Gaze prediction is essential for compensating for motion-to-photon latency and enabling seamless foveated rendering in Virtual Reality. The reliability of gaze forecasting is highly sensitive to individual differences and to the type of eye movement being predicted. We evaluate recurrent, transformer-based, and classification-guided architectures to assess how well they generalize across oculomotor events. Using the GazeBase VR and Meta Quest Pro datasets, we analyze the relationship between median (P50) and high-percentile (P95) error profiles across subjects. The analysis reveals significant performance variability: subjects with low P50 errors do not always exhibit the lowest extreme-case errors. Consequently, a low median error does not guarantee that a solution is robust. We also discuss inference performance and address the class imbalance problem in short-term gaze prediction. These results identify a gap in standardized evaluation methods and motivate a shift toward P95-focused, subject-specific metrics for developing reliable and perceptually stable gaze-contingent systems.
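The P50 versus P95 comparison described above can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names (`percentile`, `per_subject_profile`, `rank_by`), the linear-interpolation percentile definition, and the two subjects' error values are all assumptions chosen for illustration, and the errors carry no particular unit.

```python
import math
from typing import Dict, List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def per_subject_profile(
    errors_by_subject: Dict[str, List[float]]
) -> Dict[str, Tuple[float, float]]:
    """Map each subject to its (P50, P95) prediction error."""
    return {
        subject: (percentile(errs, 50), percentile(errs, 95))
        for subject, errs in errors_by_subject.items()
    }


def rank_by(profile: Dict[str, Tuple[float, float]], index: int) -> List[str]:
    """Subjects ordered from lowest to highest error; index 0 = P50, 1 = P95."""
    return sorted(profile, key=lambda s: profile[s][index])


# Hypothetical per-sample errors, invented for illustration only.
errors = {
    "subject_a": [0.1] * 18 + [5.0, 5.0],  # mostly accurate, with a heavy tail
    "subject_b": [0.3] * 20,               # uniformly moderate error
}

profile = per_subject_profile(errors)
print(rank_by(profile, 0))  # ordering by median error
print(rank_by(profile, 1))  # ordering by extreme-case error
```

In this toy data, `subject_a` has the lower median error but the higher P95 error, so the two rankings disagree. That disagreement is the effect the abstract reports at the subject level.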
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Gaze Prediction | Quest Pro | Fixation P50: 0.37 | 6 |
| Gaze Prediction | GazeBase VR | Fixation Error P50: 0.13 | 6 |