Knowledge-Enhanced Visual Grounding for remote sensing object location

WHULuoJiaTeam/KEVG

From Object to Context: Scene Knowledge Enhanced Visual Grounding for Geospatial Understanding

This is the official repository for the paper "From Object to Context: Scene Knowledge Enhanced Visual Grounding for Geospatial Understanding". The dataset and code are coming soon.

Overview

Remote Sensing Visual Grounding (RSVG) is the task of precisely localizing objects in remote sensing images from language expressions. Existing methods align visual and textual features through cross-modal fusion but often fail to capture dependencies between objects, which hinders complex visual reasoning about relationships and contexts. To address this, we introduce the Luojia-VG dataset, a benchmark that enhances visual reasoning with scene knowledge, including object-level annotations and contextual descriptions of relationships, functions, and activities. Unlike previous datasets, which focus on basic object descriptions, Luojia-VG bridges the semantic gap between referring expressions and detailed visual content. Furthermore, we propose Knowledge-Enhanced Visual Grounding (KEVG), a novel model that combines scene knowledge with visual features and textual queries. KEVG contains two key components: the Deep Knowledge Fusion (DKF) module and the Query-Region Alignment (QRA) module. The DKF module progressively embeds scene knowledge into multi-scale visual features via cross-attention, enhancing the model's fine-grained understanding of scene contexts. The QRA module aligns image regions with the query by concentrating on the most contextually relevant areas for precise localization. Experiments show that KEVG achieves state-of-the-art performance, with Pr@0.5 scores of 82.31% on DIOR-RSVG and 83.29% on Luojia-VG.
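For readers unfamiliar with the Pr@0.5 metric reported above, the following is a minimal sketch of how such a score is commonly computed in visual grounding: the fraction of queries whose predicted box overlaps the ground-truth box with an intersection-over-union (IoU) of at least 0.5. This is not the repository's evaluation code, which has not been released. The function names `iou` and `precision_at`, the `(x_min, y_min, x_max, y_max)` box layout, and the use of `>=` rather than `>` at the threshold are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at(preds: Sequence[Box], gts: Sequence[Box],
                 threshold: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the paired ground truth
    reaches the threshold (one predicted box per query is assumed)."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    # Hypothetical boxes: the first prediction matches exactly,
    # the second does not overlap its ground truth at all.
    predictions = [(0, 0, 10, 10), (0, 0, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    print(f"Pr@0.5 = {precision_at(predictions, ground_truth):.2%}")
```

Under this definition, a reported Pr@0.5 of 82.31% would mean that roughly 82 of every 100 referring expressions are grounded with an IoU of at least 0.5.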

Luojia-VG Dataset

[Two images illustrating the Luojia-VG dataset]
