
Evaluating Factor-Wise Auxiliary Dynamics Supervision for Latent Structure and Robustness in Simulated Humanoid Locomotion

About

We evaluate whether factor-wise auxiliary dynamics supervision produces useful latent structure or improved robustness in simulated humanoid locomotion. DynaMITE -- a transformer encoder with a factored 24-d latent trained by per-factor auxiliary losses during proximal policy optimization (PPO) -- is compared against Long Short-Term Memory (LSTM), plain Transformer, and Multilayer Perceptron (MLP) baselines on a Unitree G1 humanoid across four Isaac Lab tasks. The supervised latent shows no evidence of decodable or functionally separable factor structure: probe R^2 ~ 0 for all five dynamics factors, clamping any subspace changes reward by < 0.05, and standard disentanglement metrics (MIG, DCI, SAP) are near zero. An unsupervised LSTM hidden state achieves higher probe R^2 (up to 0.10). A 2x2 factorial ablation (n = 10 seeds) isolates the contributions of the tanh bottleneck and auxiliary losses: the auxiliary losses show no measurable effect on either in-distribution (ID) reward (+0.03, p = 0.732) or severe out-of-distribution (OOD) reward (+0.03, p = 0.669), while the bottleneck shows a small, consistent advantage in both regimes (ID: +0.16, p = 0.207; OOD: +0.10, p = 0.208). The bottleneck advantage persists under severe combined perturbation but does not amplify, indicating a training-time representation benefit rather than a robustness mechanism. LSTM achieves the best nominal reward on all four tasks (p < 0.03); DynaMITE degrades less under combined-shift stress (2.3% vs. 16.7%), but this difference is attributable to the bottleneck compression, not the auxiliary supervision. For locomotion practitioners: auxiliary dynamics supervision does not produce an interpretable estimator and does not measurably improve reward or robustness beyond what the bottleneck alone provides; recurrent baselines remain the stronger choice for nominal performance.
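The central negative result is that linear probes trained to read dynamics factors out of the supervised latent reach R^2 ~ 0. The probing procedure can be sketched as follows, with numpy only; the shapes (24-d latent, five factors), variable names, and synthetic data are assumptions for illustration, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 24-d latents z and 5 ground-truth dynamics factors y.
# Here y is drawn independently of z, so the probe should find nothing,
# mirroring the paper's "probe R^2 ~ 0" outcome.
n, d_z, d_y = 2000, 24, 5
z = rng.normal(size=(n, d_z))
y = rng.normal(size=(n, d_y))

def probe_r2(z, y):
    """Fit a linear probe z -> y by least squares; return per-factor R^2."""
    Z = np.hstack([z, np.ones((z.shape[0], 1))])   # add bias column
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)      # one probe per factor column
    resid = y - Z @ w
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

r2 = probe_r2(z, y)
print(np.round(r2, 3))  # all five values near zero
```

An R^2 near zero for every factor, as here, means the latent carries no linearly decodable information about that factor; held-out evaluation (rather than the in-sample fit shown) would be the stricter test.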

Chayanin Chamachot • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Command tracking recovery after push | Flat terrain task | Recovery Steps | 5.6 | 28 |
| Humanoid Locomotion | Humanoid Randomized Task (OOD Sweep) | Reward | -4.37 | 24 |
| Push recovery | Flat task Push-recovery protocol | Peak Tracking Error | 3.96 | 14 |
| Velocity tracking | Combined-shift Level 3 | Tracking Error | 4.23 | 4 |
| Velocity tracking | Combined-shift Level 4 | Velocity Tracking Error | 6.13 | 4 |
| Humanoid Locomotion | Flat In-distribution (deterministic evaluation) | Cumulative Reward | 4.88 | 4 |
| Humanoid Locomotion | Terrain In-distribution (deterministic evaluation) | Cumulative Reward | 4.49 | 4 |
| Velocity tracking | Combined-shift Level 1 | Tracking Error | 2.56 | 4 |
| Humanoid Locomotion | Push In-distribution (deterministic evaluation) | Cumulative Reward | 4.6 | 4 |
| Humanoid Locomotion | Randomized In-distribution (deterministic evaluation) | Cumulative Reward | 4.48 | 4 |

Showing 10 of 19 rows.
