E²I-VRWKV: Explicit EPI-Representation and Interaction-Aware Vision-RWKV for Light Field Semantic Segmentation
Abstract
Pixel-level semantic segmentation of 4D light field (LF) data remains a considerable challenge, primarily due to the conflict between modeling complex spatial-angular dependencies and maintaining linear computational efficiency. Current linear-complexity models such as Vision-RWKV (VRWKV) offer scalability but often fail to capture intrinsic geometric structures, leading to the structural collapse of Epipolar Plane Image (EPI) cues. To overcome these limitations, we propose E²I-VRWKV, an EPI-Enhanced and Interaction-aware network that generates high-quality segmentation maps by embedding explicit geometric priors into a linear-complexity backbone. Specifically, we introduce the Light Field Epipolar-Aware Cross-Modal Attention (LF-ECMA) block. The key innovation lies in the integration of an EPI Geometric Prior Generator, which explicitly extracts disparity-sensitive biases to enforce geometric consistency, with a Geometric-Context Gating (GC-Gate) mechanism that functions as a geometrically modulated aperture, dynamically calibrating the fusion of spatial and angular manifolds. Experiments on the UrbanLF benchmark demonstrate that our method outperforms existing state-of-the-art (SOTA) methods, achieving 86.55% mIoU on UrbanLF-Real while maintaining a favorable balance between segmentation accuracy and linear computational efficiency.
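The geometrically modulated aperture idea behind GC-Gate can be illustrated as a sigmoid gate, derived from an EPI prior, that blends the spatial and angular feature streams. This is a minimal sketch under our own assumptions: the function name, tensor shapes, and the per-element convex combination are illustrative, not the paper's actual implementation.

```python
import numpy as np

def gc_gate(spatial, angular, epi_prior):
    """Illustrative Geometric-Context Gating (hypothetical sketch).

    A sigmoid of the disparity-sensitive EPI prior acts as an
    aperture in (0, 1): values near 1 favor the spatial stream,
    values near 0 favor the angular stream. All names and shapes
    are assumptions for illustration only.
    """
    gate = 1.0 / (1.0 + np.exp(-epi_prior))  # geometric aperture in (0, 1)
    # Convex combination of the two feature manifolds, element-wise.
    return gate * spatial + (1.0 - gate) * angular
```

With a zero prior the gate is 0.5 and the two streams are averaged; a strongly positive prior passes the spatial features nearly unchanged, so the EPI evidence decides which manifold dominates at each position.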