Amodal Instance Segmentation with IRAIS Dataset for Sim-to-Real Transfer
Abstract
Amodal instance segmentation is hindered by the scarcity of scalable, transferable annotations. We introduce MaviGen, an automated framework for 3D retail scene modeling and rendering that generates photorealistic multi-view images with complete amodal masks. Building on MaviGen, we present the IRAIS dataset, a sim-to-real benchmark comprising a large-scale synthetic multi-view set (3D-IRAIS) and a human-annotated set of real images (Real-IRAIS); the two share unified label definitions and evaluation protocols to support rigorous transfer studies. We also propose EUREKA, an encoder-only, query-efficient network for amodal instance segmentation that performs full-image multi-task inference through unified amodal/visible queries and dual mask heads. The dual heads provide mutual supervision between complete (amodal) and visible masks, and a conditional masked self-attention mechanism further strengthens occlusion reasoning. Experiments establish strong baselines on IRAIS, show that EUREKA achieves state-of-the-art performance on D2SA and COCOA-cls, and demonstrate substantial improvements in sim-to-real transfer.
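The abstract says the dual mask heads supervise each other but does not state the loss. One common way to couple amodal and visible predictions, assumed here purely for illustration and not necessarily EUREKA's formulation, relies on the fact that an instance's visible region is a subset of its amodal region. Per-pixel visible probability should therefore never exceed amodal probability, and the occluded region is the part of the amodal mask outside the visible mask. The function names `containment_loss` and `occluded_region` and the threshold `thr` are hypothetical.

```python
from typing import List

# Hypothetical representation: an H x W grid of per-pixel probabilities in [0, 1].
Mask = List[List[float]]


def containment_loss(visible: Mask, amodal: Mask) -> float:
    """Illustrative consistency term (an assumption, not the paper's loss).

    Because visible region is a subset of amodal region, p_visible <= p_amodal
    should hold at every pixel. The hinge max(0, p_v - p_a) is zero wherever
    this holds, and the result is its mean over all pixels.
    """
    total, count = 0.0, 0
    for row_v, row_a in zip(visible, amodal):
        for p_v, p_a in zip(row_v, row_a):
            total += max(0.0, p_v - p_a)
            count += 1
    return total / count if count else 0.0


def occluded_region(visible: Mask, amodal: Mask, thr: float = 0.5) -> List[List[bool]]:
    """Binary occluded mask: pixels inside the amodal mask but outside the visible mask."""
    return [
        [p_a >= thr and p_v < thr for p_v, p_a in zip(row_v, row_a)]
        for row_v, row_a in zip(visible, amodal)
    ]


if __name__ == "__main__":
    amodal = [[1.0, 1.0], [1.0, 0.0]]
    visible = [[1.0, 0.0], [0.0, 0.0]]
    print(containment_loss(visible, amodal))  # 0.0: visible lies inside amodal
    print(occluded_region(visible, amodal))   # [[False, True], [True, False]]
```

In this sketch the hinge only penalizes visible predictions that "leak" outside the amodal mask, so it constrains the two heads against each other without forcing them to be equal; the occluded region then falls out as a by-product of the two masks.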