Virtual Keynote in Workshop: Dynamic Neural Networks
Spatially and Temporally Adaptive Neural Networks
Gao Huang
Discriminative features in an image or video usually correspond to only a subset of pixels or frames, while the remaining regions/intervals are less relevant to the task at hand. The prevalent deep learning approaches in computer vision, e.g., CNNs and Vision Transformers, are static models that allocate an equal amount of computation to all pixels/frames, leading to considerable redundancy. This talk will introduce dynamic neural networks that can allocate computation unevenly both spatially and temporally. Two notable challenges in developing such models are that 1) the optimization becomes non-differentiable; and 2) the inference stage may involve sparse computation, which is hard to execute efficiently in practice. The talk will present effective and efficient approaches for developing spatially and temporally adaptive networks, and demonstrate their strong performance on image and video recognition benchmarks.
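The sketch below is a minimal PyTorch illustration of the idea described above, not the speaker's actual method: a lightweight gating head predicts, per spatial location, whether to run an expensive branch, and the non-differentiability of the hard keep/skip decision is handled with a straight-through Gumbel-softmax estimator. The module name and layer choices are illustrative assumptions. For clarity, the expensive branch is computed densely and then masked; turning the mask into real savings would require sparse or gather-based kernels, which is exactly the second challenge the abstract mentions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatiallyAdaptiveBlock(nn.Module):
    """Hypothetical spatially adaptive block: run the costly branch only where needed."""

    def __init__(self, channels: int, tau: float = 1.0):
        super().__init__()
        # Costly branch (e.g., a 3x3 conv) that we want to skip on "easy" locations.
        self.expensive = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Cheap gating head producing per-pixel keep/skip logits.
        self.gate = nn.Conv2d(channels, 2, 1)
        self.tau = tau

    def forward(self, x):
        logits = self.gate(x)  # (B, 2, H, W)
        if self.training:
            # Straight-through Gumbel-softmax: hard 0/1 mask in the forward pass,
            # soft gradients in the backward pass, keeping training end-to-end.
            mask = F.gumbel_softmax(logits, tau=self.tau, hard=True, dim=1)[:, 1:2]
        else:
            # Deterministic hard decision at inference time.
            mask = (logits[:, 1:2] > logits[:, 0:1]).float()
        # Dense compute masked afterwards; a real implementation would gather only
        # the selected locations to convert sparsity into an actual speedup.
        return mask * self.expensive(x) + (1.0 - mask) * x


if __name__ == "__main__":
    block = SpatiallyAdaptiveBlock(channels=16)
    x = torch.randn(2, 16, 32, 32)
    print(block(x).shape)  # torch.Size([2, 16, 32, 32])
```

A temporally adaptive variant would follow the same pattern, with the gate deciding per frame rather than per spatial location which inputs receive the full computation.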