Interactive Segmentation with Elaborate Focus Prior
Abstract
Regional refinement is essential in interactive segmentation to ensure the fidelity of segmented pixels near user-prompted locations. After a global prediction, it specifies a local window (\ie, a focus view) around the latest click, within which local pixels are revisited and optimized using various refinement structures. Previous methods either use a two-stage pipeline to estimate the focus view or manually preset a fixed scope for all clicks. The former is time-consuming, while the latter fails to capture the correlation among click position, object geometry, and focus intensity. In this paper, we inherit the core idea of FCFI \cite{wei2023focused} and propose a one-stage framework characterized by an \textbf{E}laborate \textbf{F}ocus \textbf{P}rior (EFPNet). Concretely, EFPNet predicts an error mask \wrt the historical feedback and the newly placed click in an end-to-end manner, and deduces a precise focus region from the largest connected component of that mask. Feedback correction then follows, considering image, feature, and mask affinity. We further design a clicked-with-focus mechanism for efficient feedback integration. Extensive experiments on four benchmarks show that EFPNet performs strongly in terms of both efficacy and computational overhead.
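
As a rough illustration of how a focus region might be read off the largest connected component of a predicted error mask, the following minimal sketch takes a binary mask and returns a padded bounding box. The function name `largest_component_box`, the use of 4-connectivity, the axis-aligned box, and the `margin` parameter are all assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
from collections import deque


def largest_component_box(mask, margin=0):
    """Return the inclusive box (top, left, bottom, right) around the largest
    4-connected foreground component of a binary mask, or None if it is empty.

    `mask` is a list of equal-length rows of 0/1 values (hypothetical input
    format); `margin` pads the box and is clipped to the mask borders.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best_size, best_box = 0, None

    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected component,
            # tracking its pixel count and extent as we go.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the first component found with the largest pixel count.
            if size > best_size:
                best_size, best_box = size, (top, left, bottom, right)

    if best_box is None:
        return None
    top, left, bottom, right = best_box
    return (
        max(top - margin, 0),
        max(left - margin, 0),
        min(bottom + margin, height - 1),
        min(right + margin, width - 1),
    )


# Toy error mask: one isolated pixel and one 2x2 blob.
toy_mask = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
focus_box = largest_component_box(toy_mask, margin=1)
```

In this toy case the isolated pixel is ignored and the box is driven by the 2x2 blob, which mirrors the idea that the focus view adapts to the predicted error region rather than to a fixed preset scope.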