GeoFormer: A Swin Transformer-Based Framework for Scene-Level Building Height and Footprint Estimation from Sentinel Imagery
About
Accurate three-dimensional urban data are critical for climate modelling, disaster risk assessment, and urban planning, yet remain scarce because existing approaches rely on proprietary sensors or generalise poorly across cities. We propose GeoFormer, an open-source Swin Transformer framework that jointly estimates building height (BH) and building footprint (BF) on a 100 m grid using only Sentinel-1/2 imagery and open DEM data. A geo-blocked splitting strategy ensures strict spatial independence between training and test sets. Evaluated over 54 diverse cities, GeoFormer achieves a BH RMSE of 3.19 m and a BF RMSE of 0.05, improvements of 7.5% and 15.3%, respectively, over the strongest CNN baseline, while keeping BH RMSE under 3.5 m in cross-continent transfer. Ablation studies confirm that DEM is indispensable for height estimation and that optical reflectance contributes more than SAR, though multi-source fusion yields the best overall accuracy. All code, weights, and global products are publicly released.
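The abstract does not specify how the geo-blocked split is constructed. As a minimal sketch of the general idea, and not the authors' implementation, the snippet below assigns samples to square spatial blocks and holds out whole blocks for testing, so that no block contributes to both sets. The function names `geo_block_split` and `rmse`, the 10 km block size, and the 20% test fraction are all illustrative assumptions.

```python
import numpy as np


def geo_block_split(xs, ys, block_size_m=10_000.0, test_fraction=0.2, seed=0):
    """Split samples into train/test by spatial block (hypothetical helper).

    xs, ys are projected coordinates in metres. block_size_m and
    test_fraction are assumed values, not taken from the paper.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    # Integer block index of each sample along each axis.
    bx = np.floor(xs / block_size_m).astype(np.int64)
    by = np.floor(ys / block_size_m).astype(np.int64)
    blocks = np.stack([bx, by], axis=1)

    # Map every sample to the id of the unique block it falls in.
    unique_blocks, block_id = np.unique(blocks, axis=0, return_inverse=True)
    block_id = block_id.ravel()

    # Hold out whole blocks, never individual samples.
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * len(unique_blocks))))
    test_blocks = rng.choice(len(unique_blocks), size=n_test, replace=False)

    is_test = np.isin(block_id, test_blocks)
    return ~is_test, is_test, block_id


def rmse(pred, target):
    """Root-mean-square error, the metric reported for BH and BF."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

Holding out entire blocks rather than random grid cells reduces the chance that spatially adjacent, highly correlated cells end up on both sides of the split, which would otherwise make test error look better than it is.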
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Building Height Estimation | Sentinel | -- | 6 |
| Building Height Estimation | Sentinel + LiDAR | -- | 2 |
| Building Height Estimation | Sentinel + DEM | RMSE (m): 3.19 | 1 |
| Building Footprint Estimation | Sentinel | -- | 1 |
| Building Footprint Estimation | VHR satellite image | -- | 1 |
| Building Height Estimation | LiDAR | -- | 1 |
| Building Height Estimation | VHR SAR | -- | 1 |
| Building Height Estimation | GIS only | -- | 1 |
| Building Height Estimation | VHR satellite image | -- | 1 |