Bridging Fixed and Open-Vocabulary Segmentation in Remote Sensing via AI Orchestration
Abstract
Remote sensing semantic segmentation models are typically designed for fixed label spaces, which limits their applicability in real-world scenarios where users may issue diverse, open-ended queries. Recent open-vocabulary vision-language models enable flexible querying, but they often underperform task-specific models in high-resolution remote sensing settings. In this work, we propose a modular Artificial Intelligence (AI) system that bridges this gap by integrating natural language understanding, dynamic task orchestration, and hybrid segmentation agents. The system employs a language-model-based input parser to interpret user prompts, an orchestrator to route the resulting tasks, and a combination of specialized segmentation models and an open-vocabulary agent to generate segmentation outputs. This design provides both high accuracy on known tasks and flexibility for unseen queries within a unified framework. Our experiments demonstrate the effectiveness of the proposed system.
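The following minimal Python sketch illustrates the routing idea. It assumes a hypothetical fixed label space (`KNOWN_CLASSES`) and replaces the language-model parser with a simple keyword split (`parse_prompt`). It also routes each requested class independently (`route`). These names and rules are illustrative assumptions and are not the interfaces of the system described here.

```python
from dataclasses import dataclass

# Hypothetical fixed label space covered by the specialized segmentation models.
KNOWN_CLASSES = {"building", "road", "water", "vegetation"}


@dataclass
class ParsedQuery:
    targets: list[str]  # class names requested by the user


def parse_prompt(prompt: str) -> ParsedQuery:
    """Hypothetical stand-in for the language-model-based parser:
    strips a leading 'segment' and treats comma-separated phrases as targets."""
    text = prompt.lower().strip()
    if text.startswith("segment"):
        text = text[len("segment"):]
    targets = [t.strip() for t in text.split(",") if t.strip()]
    return ParsedQuery(targets=targets)


def route(query: ParsedQuery) -> dict[str, str]:
    """Hypothetical orchestrator rule: send a target to a specialized model
    if it is in the fixed label space, otherwise to the open-vocabulary agent."""
    return {
        t: "specialized" if t in KNOWN_CLASSES else "open_vocabulary"
        for t in query.targets
    }


if __name__ == "__main__":
    print(route(parse_prompt("Segment building, solar panels")))
```

For example, the prompt "Segment building, solar panels" sends `building` to a specialized model and `solar panels` to the open-vocabulary agent. A per-class split of this kind keeps the accuracy of the task-specific models wherever the label space covers the request and falls back to the open-vocabulary agent only for the remaining classes.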