Evaluating Factor-Wise Auxiliary Dynamics Supervision for Latent Structure and Robustness in Simulated Humanoid Locomotion
About
We evaluate whether factor-wise auxiliary dynamics supervision produces useful latent structure or improved robustness in simulated humanoid locomotion. DynaMITE -- a transformer encoder with a factored 24-d latent trained with per-factor auxiliary losses during proximal policy optimization (PPO) -- is compared against Long Short-Term Memory (LSTM), plain Transformer, and Multilayer Perceptron (MLP) baselines on a Unitree G1 humanoid across four Isaac Lab tasks. The supervised latent shows no evidence of decodable or functionally separable factor structure: probe R^2 is near zero for all five dynamics factors, clamping any factor's latent subspace changes reward by < 0.05, and standard disentanglement metrics (MIG, DCI, SAP) are near zero. An unsupervised LSTM hidden state achieves higher probe R^2 (up to 0.10). A 2x2 factorial ablation (n = 10 seeds) isolates the contributions of the tanh bottleneck and the auxiliary losses: the auxiliary losses show no measurable effect on either in-distribution (ID) reward (+0.03, p = 0.732) or severe out-of-distribution (OOD) reward (+0.03, p = 0.669), while the bottleneck shows a small, consistent advantage in both regimes (ID: +0.16, p = 0.207; OOD: +0.10, p = 0.208). The bottleneck advantage persists under severe combined perturbation but does not amplify, indicating a training-time representation benefit rather than a robustness mechanism. LSTM achieves the best nominal reward on all four tasks (p < 0.03); DynaMITE degrades less under combined-shift stress (2.3% vs. 16.7%), but this difference is attributable to the bottleneck compression, not the auxiliary supervision. For locomotion practitioners: auxiliary dynamics supervision does not produce an interpretable estimator and does not measurably improve reward or robustness beyond what the bottleneck alone provides; recurrent baselines remain the stronger choice for nominal performance.
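The "probe R^2" protocol referenced above amounts to fitting a held-out linear regression from latent states to each dynamics factor and scoring it with R^2. A minimal NumPy sketch on synthetic data (the 24-d latent, sample count, and factor construction here are illustrative assumptions, not the paper's rollout data):

```python
import numpy as np

def probe_r2(latents, factor, train_frac=0.75, seed=0):
    """Least-squares linear probe from latents to a scalar factor.

    Returns held-out R^2; values near zero (or negative) indicate the
    factor is not linearly decodable from the latent.
    """
    rng = np.random.default_rng(seed)
    n = len(latents)
    idx = rng.permutation(n)
    n_tr = int(train_frac * n)
    tr, te = idx[:n_tr], idx[n_tr:]
    Z = np.column_stack([latents, np.ones(n)])  # append bias column
    w, *_ = np.linalg.lstsq(Z[tr], factor[tr], rcond=None)
    pred = Z[te] @ w
    ss_res = np.sum((factor[te] - pred) ** 2)
    ss_tot = np.sum((factor[te] - factor[te].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins for rollout latents and dynamics factors.
rng = np.random.default_rng(1)
z = rng.normal(size=(2000, 24))        # 24-d latent states
decodable = z @ rng.normal(size=24)    # factor linearly present in z
opaque = rng.normal(size=2000)         # factor absent from z

print(probe_r2(z, decodable))  # close to 1.0
print(probe_r2(z, opaque))     # near 0 (may be slightly negative)
```

In this framing, "probe R^2 ~ 0" for all five factors means every supervised factor behaves like `opaque` here, while the LSTM hidden state's 0.10 sits only marginally above chance.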
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Command tracking recovery after push | Flat terrain task | Recovery Steps | 5.6 | 28 |
| Humanoid Locomotion | Humanoid Randomized Task (OOD Sweep) | Reward | -4.37 | 24 |
| Push recovery | Flat task push-recovery protocol | Peak Tracking Error | 3.96 | 14 |
| Velocity tracking | Combined-shift Level 3 | Tracking Error | 4.23 | 4 |
| Velocity tracking | Combined-shift Level 4 | Velocity Tracking Error | 6.13 | 4 |
| Humanoid Locomotion | Flat In-distribution (deterministic evaluation) | Cumulative Reward | 4.88 | 4 |
| Humanoid Locomotion | Terrain In-distribution (deterministic evaluation) | Cumulative Reward | 4.49 | 4 |
| Velocity tracking | Combined-shift Level 1 | Tracking Error | 2.56 | 4 |
| Humanoid Locomotion | Push In-distribution (deterministic evaluation) | Cumulative Reward | 4.60 | 4 |
| Humanoid Locomotion | Randomized In-distribution (deterministic evaluation) | Cumulative Reward | 4.48 | 4 |