
Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention

About

Crowdsourced social media imagery provides real-time visual evidence of urban flooding but often lacks reliable geographic metadata for emergency response. Existing Visual Place Recognition (VPR) models struggle to geo-localize these images due to cross-source domain shifts and visual distortions. We present VPR-AttLLM, a model-agnostic framework integrating the semantic reasoning and geospatial knowledge of Large Language Models (LLMs) into VPR pipelines via attention-guided descriptor enhancement. VPR-AttLLM uses LLMs to isolate location-informative regions and suppress transient noise, improving retrieval without model retraining or new data. We evaluate this framework across San Francisco and Hong Kong using established queries, synthetic flooding scenarios, and real social media flood images. Integrating VPR-AttLLM with state-of-the-art models (CosPlace, EigenPlaces, SALAD) consistently improves recall, yielding 1-3% relative gains and up to 8% on challenging real flood imagery. By embedding urban perception principles into attention mechanisms, VPR-AttLLM bridges human-like spatial reasoning with modern VPR architectures. Its plug-and-play design and cross-source robustness offer a scalable solution for rapid geo-localization of crowdsourced crisis imagery, advancing cognitive urban resilience.
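The abstract describes VPR-AttLLM only at a high level. What follows is a minimal sketch of the general idea of attention-guided descriptor enhancement, not the authors' implementation. It assumes each image has already been split into regions with local feature vectors. It also assumes that an external source, standing in for the LLM, has given each region a weight: high for stable, location-informative content such as building facades, and low for transient content such as floodwater. The names `attention_pool` and `rank_database`, the toy 2-D descriptors, and the weights are all hypothetical.

```python
import math
from typing import Sequence

Vector = list[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (a zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def attention_pool(local_feats: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> Vector:
    """Attention-weighted sum of per-region features, then L2 normalization.

    `weights` is a hypothetical per-region attention map: high for stable,
    location-informative regions, low for transient ones (water, vehicles).
    """
    if not local_feats or len(local_feats) != len(weights):
        raise ValueError("need at least one region and one weight per region")
    dim = len(local_feats[0])
    pooled = [0.0] * dim
    for feat, w in zip(local_feats, weights):
        for i in range(dim):
            pooled[i] += w * feat[i]
    return l2_normalize(pooled)


def rank_database(query: Vector, database: dict[str, Vector]) -> list[str]:
    """Database keys sorted by cosine similarity to the query, best first.

    Query and database descriptors are unit-length, so the dot product
    equals the cosine similarity.
    """
    def score(key: str) -> float:
        return sum(q * d for q, d in zip(query, database[key]))
    return sorted(database, key=score, reverse=True)


# Toy 2-D descriptors: dimension 0 ~ "facade", dimension 1 ~ "water surface".
database = {"place_A": [1.0, 0.0], "place_B": [0.0, 1.0]}
# The query was taken at place_A: one facade region, two flooded regions.
regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

uniform = attention_pool(regions, [1.0, 1.0, 1.0])  # plain sum pooling
guided = attention_pool(regions, [1.0, 0.1, 0.1])   # transient regions suppressed

print(rank_database(uniform, database)[0])  # place_B
print(rank_database(guided, database)[0])   # place_A
```

With uniform weights, the two water regions dominate the pooled descriptor, and the query is matched to the wrong place. Down-weighting them flips the top-1 result. Only the pooling weights change and the feature extractor is left untouched, which mirrors the plug-and-play, no-retraining property described in the abstract.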

Fengyi Xu, Jun Ma, Waishan Qiu, Cui Guo, Jack C.P. Cheng · 2025

Related benchmarks

Task                      Dataset    Metric        Result  Rank
Visual Place Recognition  sf v1      Recall@1 (%)  88.8    39
Visual Place Recognition  sf_mapi    Recall@1 (%)  85.6    39
Visual Place Recognition  hk_mapi    Recall@1 (%)  76.2    39
Visual Place Recognition  sf_flood   Recall@1 (%)  64      24
Visual Place Recognition  hk_flood   Recall@1 (%)  60      24
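Recall@1 in the benchmark table is the share of queries whose top-ranked database image is a correct match. Recall@N extends this to the top N retrievals. The sketch below computes the metric from ranked retrieval lists. It also shows how the relative gains quoted in the abstract differ from absolute percentage-point differences. Ground truth is given directly as sets of matching database IDs. VPR benchmarks usually define a match by a GPS distance threshold, often 25 m, but that is an assumption here and is not stated on this page. The helper names, IDs, and numbers are hypothetical toy values, not results from the paper.

```python
from typing import AbstractSet, Sequence


def recall_at_n(ranked: Sequence[Sequence[str]],
                positives: Sequence[AbstractSet[str]],
                n: int) -> float:
    """Percentage of queries with at least one true match in the top-n retrievals."""
    if len(ranked) != len(positives):
        raise ValueError("need one ground-truth set per query")
    if not ranked:
        raise ValueError("no queries to evaluate")
    hits = sum(
        1
        for preds, pos in zip(ranked, positives)
        if any(db_id in pos for db_id in preds[:n])
    )
    return 100.0 * hits / len(ranked)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement in percent, e.g. 50.0 -> 54.0 gives 8.0."""
    return (improved - baseline) * 100.0 / baseline


# Toy example: 4 queries, each with a ranked list of database IDs (best first).
ranked = [["a", "b", "c"], ["d", "a", "e"], ["f", "g", "h"], ["b", "c", "a"]]
positives = [{"a"}, {"a"}, {"x"}, {"c"}]

for n in (1, 2, 3):
    print(f"Recall@{n}: {recall_at_n(ranked, positives, n):.1f}%")
# Recall@1: 25.0%
# Recall@2: 75.0%
# Recall@3: 75.0%

print(relative_gain(50.0, 54.0))  # 8.0
```

Raising Recall@1 from 50% to 54% is a 4-point absolute improvement but an 8% relative gain. Relative figures therefore look larger when the baseline recall is low, as on the flood query sets.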
