TALON: Token-Aligned Lightweight Adapters for 6-DoF Spacecraft Pose Estimation

About

Monocular 6-DoF spacecraft pose estimation methods predominantly process individual frames, discarding the temporal information present in image sequences acquired during spacecraft manoeuvres. The few temporal approaches that exist require either full backbone fine-tuning or auxiliary optical flow networks, which risk catastrophic forgetting or increase computational cost, respectively. We propose TALON (Token-Aligned Lightweight adapters for Orbital Navigation): spatiotemporal 3D adapters injected before the self-attention layers of a frozen vision transformer (ViT), combined with a patch-token alignment loss that geometrically grounds the adapted features in keypoint structure through a prototype-conditioned KL-divergence objective. Pre-attention placement allows the frozen attention to reason over temporally enriched tokens, achieving stronger performance with a single adapter per block than post-attention alternatives. The alignment loss shapes the intermediate representations so that each keypoint induces a spatially precise activation in the token field, while the framework adds parameters amounting to less than 5% of the frozen backbone. On the SPADES dataset, TALON reduces the pose error by 50% relative to the prior state of the art, and on the SwissCube dataset it surpasses the prior best by 21.8% in ADD-0.1d accuracy. Zero-shot cross-domain (sim-to-real) evaluation on SPARK real data reduces pose error by a factor of 4.7, and ablations characterise the role of adapter depth across in-domain and cross-domain settings.
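To make the idea of a patch-token alignment loss concrete, the following is a minimal sketch of one way a KL-divergence objective could tie patch tokens to a keypoint location. It is not the authors' implementation. The abstract does not specify the details, so the Gaussian target over the patch grid, the dot-product similarity to a keypoint prototype, the temperature, and all function names (`gaussian_target`, `alignment_loss`, and so on) are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_target(keypoint_xy, grid_h, grid_w, sigma=1.0):
    """Assumed target: a normalised Gaussian over the patch grid (row-major),
    peaked at the patch coordinates (col, row) of the keypoint."""
    kx, ky = keypoint_xy
    weights = [
        math.exp(-((col - kx) ** 2 + (row - ky) ** 2) / (2.0 * sigma ** 2))
        for row in range(grid_h)
        for col in range(grid_w)
    ]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

def alignment_loss(tokens, prototype, keypoint_xy, grid_h, grid_w,
                   sigma=1.0, temperature=1.0):
    """Hypothetical alignment loss for one keypoint.

    tokens:    list of grid_h * grid_w patch-token vectors (row-major).
    prototype: one vector standing in for the keypoint's prototype.
    The token-to-prototype similarities are turned into a distribution over
    patches and compared with the Gaussian target via KL divergence.
    """
    scores = [
        sum(t * c for t, c in zip(token, prototype)) / temperature
        for token in tokens
    ]
    predicted = softmax(scores)
    target = gaussian_target(keypoint_xy, grid_h, grid_w, sigma)
    return kl_divergence(target, predicted)

# Toy 2x2 patch grid with 2-dimensional tokens; the keypoint sits in patch (0, 0).
prototype = [1.0, 0.0]
tokens_aligned = [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
tokens_misaligned = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [3.0, 0.0]]
loss_aligned = alignment_loss(tokens_aligned, prototype, (0, 0), 2, 2, sigma=0.5)
loss_misaligned = alignment_loss(tokens_misaligned, prototype, (0, 0), 2, 2, sigma=0.5)
```

In this toy setting the loss is lower when the token that responds most strongly to the prototype lies at the keypoint's patch, which is the behaviour an alignment objective of this kind is meant to encourage.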

Abid Ali, Arunkumar Rathinam, Djamila Aouada • 2026

Related benchmarks

Task	Dataset	Metric	Value	
6D Pose Estimation	SwissCube (test)	Acc (ADI-0.1d) Near	83.1	10
6D Pose Estimation	SPARK real sequences	Translation Error (E_T^#)	0.0223	7
6D Pose Estimation	SPARK synthetic cross-domain (test)	Translational Error (E_T^#)	0.0062	5
6-DoF Pose Estimation	SPADES	Error Threshold (ET)	0.0123	4
