Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention

About

Crowdsourced social media imagery provides real-time visual evidence of urban flooding but often lacks reliable geographic metadata for emergency response. Existing Visual Place Recognition (VPR) models struggle to geo-localize these images due to cross-source domain shifts and visual distortions. We present VPR-AttLLM, a model-agnostic framework integrating the semantic reasoning and geospatial knowledge of Large Language Models (LLMs) into VPR pipelines via attention-guided descriptor enhancement. VPR-AttLLM uses LLMs to isolate location-informative regions and suppress transient noise, improving retrieval without model retraining or new data. We evaluate this framework across San Francisco and Hong Kong using established queries, synthetic flooding scenarios, and real social media flood images. Integrating VPR-AttLLM with state-of-the-art models (CosPlace, EigenPlaces, SALAD) consistently improves recall, yielding 1-3% relative gains and up to 8% on challenging real flood imagery. By embedding urban perception principles into attention mechanisms, VPR-AttLLM bridges human-like spatial reasoning with modern VPR architectures. Its plug-and-play design and cross-source robustness offer a scalable solution for rapid geo-localization of crowdsourced crisis imagery, advancing cognitive urban resilience.
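The abstract does not spell out how attention-guided descriptor enhancement is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an image is split into regions with one feature vector each, and that some upstream step (here a hypothetical LLM-derived score) assigns each region a non-negative weight for how location-informative it is. The function name `attention_weighted_descriptor` and both inputs are illustrative assumptions.

```python
import math


def attention_weighted_descriptor(features, attention):
    """Pool per-region feature vectors into one L2-normalized descriptor.

    features:  list of N region feature vectors (each a list of D floats)
    attention: list of N non-negative weights; assumed to come from an
               upstream scoring step (hypothetical, e.g. LLM-guided)
    """
    if len(features) != len(attention):
        raise ValueError("need one attention weight per region")
    total = sum(attention)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")

    dim = len(features[0])
    pooled = [0.0] * dim
    # Weighted average: informative regions dominate, transient ones fade.
    for vec, weight in zip(features, attention):
        for i in range(dim):
            pooled[i] += (weight / total) * vec[i]

    # L2-normalize so descriptors can be compared by cosine similarity.
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled


# Toy input: a building facade (kept) and a floodwater region (suppressed).
regions = [[3.0, 4.0], [10.0, -2.0]]
weights = [1.0, 0.0]
descriptor = attention_weighted_descriptor(regions, weights)
```

With the floodwater region weighted to zero, the pooled descriptor depends only on the facade features, which is the intuition behind suppressing transient noise before retrieval.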

Fengyi Xu, Jun Ma, Waishan Qiu, Cui Guo, Jack C.P. Cheng • 2025

Related benchmarks

Task	Dataset	Result
Visual Place Recognition	sf v1	Recall@1: 88.8	39
Visual Place Recognition	sf_mapi	Recall@1: 85.6	39
Visual Place Recognition	hk_mapi	Recall@1: 76.2	39
Visual Place Recognition	sf_flood	Recall@1: 64	24
Visual Place Recognition	hk_flood	Recall@1: 60	24
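For readers unfamiliar with the metric, Recall@K and a relative gain can be computed as in the sketch below. The data is a toy example, not the paper's results, and the helper names `recall_at_k` and `relative_gain` are illustrative. It assumes each query comes with a set of database entries counted as correct matches; in VPR that set is typically defined by a distance threshold around the query's true location.

```python
def recall_at_k(ranked_candidates, ground_truth, k=1):
    """Percentage of queries with at least one correct match in the top k.

    ranked_candidates: per query, database IDs sorted by descending similarity
    ground_truth:      per query, the set of database IDs counted as correct
    """
    hits = 0
    for ranked, correct in zip(ranked_candidates, ground_truth):
        if any(candidate in correct for candidate in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(ranked_candidates)


def relative_gain(baseline, enhanced):
    """Relative improvement of `enhanced` over `baseline`, in percent."""
    return 100.0 * (enhanced - baseline) / baseline


# Toy retrieval results for four queries (made-up IDs).
ranked = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
truth = [{"a"}, {"d"}, {"e"}, {"x"}]

r1 = recall_at_k(ranked, truth, k=1)
r2 = recall_at_k(ranked, truth, k=2)
gain = relative_gain(r1, r2)
```

Note that a relative gain is measured against the baseline score, so it differs from the absolute difference in percentage points.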

