Poster

Receptive Fields As Experts in Vision Architectures

Dongze Lian · Weihao Yu · Xinchao Wang


Abstract: The size of spatial receptive fields, from the early 3$\times$3 convolutions in VGGNet to the recent 7$\times$7 convolutions in ConvNeXt, has always played a critical role in architecture design. In this paper, we propose a Mixture of Receptive Fields (MoRF) in place of a single receptive field. MoRF combines multiple receptive fields of different sizes, e.g., convolutions with different kernel sizes, each of which can be regarded as an expert. This approach serves two functions: it selects the appropriate receptive field according to the input, and it expands the network capacity. We also introduce two routing mechanisms, hard routing and soft routing, to automatically select the appropriate receptive field experts. At inference, the selected receptive field experts can be merged via re-parameterization, keeping inference speed close to that of a single receptive field. To demonstrate the effectiveness of MoRF, we integrate it into multiple architectures, e.g., ResNet and ConvNeXt. Extensive experiments show that our approach outperforms the baselines in image classification, object detection, and segmentation tasks without significantly increasing the inference time.
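The merging step mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes a single channel, stride 1, zero "same" padding, odd kernel sizes, and gate weights that are already fixed when merging happens. All function names (`conv2d_same`, `pad_kernel`, `merge_experts`, `mixture_forward`) are hypothetical. Because convolution is linear, a gate-weighted sum of expert outputs equals one convolution with the gate-weighted sum of the kernels, once each smaller kernel is zero-padded to the largest size.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with an odd square kernel, zero 'same' padding."""
    k = len(kernel)
    r = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pad_kernel(kernel, size):
    """Center an odd square kernel inside a size x size kernel of zeros."""
    k = len(kernel)
    off = (size - k) // 2
    padded = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            padded[i + off][j + off] = kernel[i][j]
    return padded


def merge_experts(kernels, gates):
    """Fold gate-weighted experts into one kernel of the largest size."""
    size = max(len(k) for k in kernels)
    merged = [[0.0] * size for _ in range(size)]
    for kernel, g in zip(kernels, gates):
        padded = pad_kernel(kernel, size)
        for i in range(size):
            for j in range(size):
                merged[i][j] += g * padded[i][j]
    return merged


def mixture_forward(image, kernels, gates):
    """Multi-branch form: run every expert, then sum outputs weighted by gates."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for kernel, g in zip(kernels, gates):
        y = conv2d_same(image, kernel)
        for i in range(h):
            for j in range(w):
                out[i][j] += g * y[i][j]
    return out


# Illustrative data only: a 6x6 input and two experts (3x3 and 5x5).
image = [[float((i * 6 + j) % 7) for j in range(6)] for i in range(6)]
k3 = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
k5 = [[0.1 * (i - j) for j in range(5)] for i in range(5)]

soft_gates = [0.25, 0.75]  # soft routing: weighted combination (assumed values)
hard_gates = [0.0, 1.0]    # hard routing: a one-hot choice of expert

for gates in (soft_gates, hard_gates):
    branched = mixture_forward(image, [k3, k5], gates)
    merged = conv2d_same(image, merge_experts([k3, k5], gates))
    max_diff = max(abs(a - b) for ra, rb in zip(branched, merged)
                   for a, b in zip(ra, rb))
    assert max_diff < 1e-9  # the two forms agree up to float rounding
```

The merged form runs a single convolution at the largest kernel size, which is consistent with the abstract's claim that inference speed stays close to that of a single receptive field. If the gates depend on the input, the merged kernel would need to be rebuilt per input; how the paper handles this is not stated in the abstract.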
