
Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance

About

We demonstrate the capabilities of an attention-based end-to-end approach for high-speed vision-based quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art learning architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional model-based approaches to navigation via independent perception, mapping, planning, and control modules break down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end vision-to-control networks have shown great potential for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer (ViT) models for depth image-to-control in high-fidelity simulation, observing that ViT models are more effective than the others as quadrotor speeds increase and in generalization to unseen environments, and that the addition of recurrence further improves performance while reducing quadrotor energy cost across all tested flight speeds. We assess performance at speeds of up to 7 m/s in simulation and hardware. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
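
For readers unfamiliar with vision transformers, the sketch below illustrates only the generic first step of a ViT applied to a depth image: cutting the image into non-overlapping patches and flattening each one into a token. The function name `patchify`, the 4x4 image, and the 2x2 patch size are illustrative assumptions, not details taken from the paper. A full model would follow this step with a learned projection, attention layers, and a head that outputs control commands.

```python
def patchify(depth, patch):
    """Split a 2D depth image (list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    height, width = len(depth), len(depth[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    # Walk the image in row-major order, one patch-sized window at a time.
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [depth[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            tokens.append(token)
    return tokens


# Toy 4x4 depth image (values are made up), split into 2x2 patches.
depth = [[0.1, 0.2, 0.3, 0.4],
         [0.5, 0.6, 0.7, 0.8],
         [0.9, 1.0, 1.1, 1.2],
         [1.3, 1.4, 1.5, 1.6]]
tokens = patchify(depth, 2)
```

Each of the four resulting tokens holds the four depth values of one patch. In a transformer these tokens would be the sequence over which attention is computed.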

Anish Bhattacharya, Nishanth Rao, Dhruv Parikh, Pratik Kunapuli, Yuwei Wu, Yuezhan Tao, Nikolai Matni, Vijay Kumar • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Hovering Maintenance | Urban Street | Success Rate (SR) | 0.00e+0 | 24 |
| Hovering Maintenance | Park | Success Rate (SR) | 0.00e+0 | 24 |
| Dynamic Target Following | Forest 1.5 m/s target speed | Success Rate (SR) | 0.00e+0 | 6 |
| Dynamic Target Following | Factory 3.0 m/s target speed | Success Rate (SR) | 0.00e+0 | 6 |
| Fixed-trajectory filming | Park scene 3.0 m/s obstacle speed | Success Rate (SR) | 0.00e+0 | 6 |
| Fixed-trajectory filming | Park scene 6.0 m/s obstacle speed | Success Rate (SR) | 0.00e+0 | 6 |
| Fixed-trajectory filming | Forest scene 3.0 m/s obstacle speed | Success Rate (SR) | 0.00e+0 | 6 |
| Fixed-trajectory filming | Forest scene 6.0 m/s obstacle speed | Success Rate (SR) | 0.00e+0 | 6 |
| Dynamic Target Following | Factory 1.5 m/s target speed | Success Rate (SR) | 0.00e+0 | 6 |
| Dynamic Target Following | Forest 3.0 m/s target speed | Success Rate (SR) | 0.00e+0 | 6 |