Poster in Workshop: ES-FoMo: Efficient Systems for Foundation Models
Dissecting Efficient Architectures for Wake-Word Detection
Cody Berger · Juncheng Li · Yiyuan Li · Aaron Berger · Dmitri Berger · Karthik Ganesan · Emma Strubell · Florian Metze
Wake-word detection models running on edge devices face stringent efficiency requirements. We observe that the over-the-air test accuracy of models trained on parallel hardware (GPU/TPU) usually degrades when they are deployed on edge devices that use a CPU for over-the-air, real-time evaluation. Moreover, the change in inference time when migrating from GPU to CPU varies across models. The accuracy drop stems from hardware latency and the acoustic impulse response, while the non-uniform growth in inference time results from how differently each architecture exploits hardware acceleration. Although many neural architectures have been applied to wake-word detection, these latency and accuracy drops have not been studied at the granular level of individual layers and matrix multiplications. In this paper, we compare five Convolutional Neural Network (CNN) architectures and one pure Transformer architecture optimized for edge deployment, train them for wake-word detection on the Speech Commands dataset, and quantize two representative models. We quantify their accuracy-efficiency tradeoffs to inform researchers and practitioners about the key model components that shape this tradeoff.
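The quantization step mentioned above can be illustrated with a minimal sketch. This is not the paper's exact pipeline (which quantizes full trained models); it is a hypothetical, self-contained example of per-tensor affine int8 quantization, the basic transformation applied when shrinking a trained wake-word model for edge CPUs, showing how the scale parameter bounds the reconstruction error.

```python
# Illustrative sketch (assumption: per-tensor affine int8 scheme, not the
# paper's specific quantization method): map float weights to int8 values
# with one scale and zero-point, then recover approximate floats.

def quantize_int8(weights):
    """Quantize a list of float weights to int8 with scale and zero-point."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against a constant tensor
    zero_point = round(-w_min / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from int8 values."""
    return [(v - zero_point) * scale for v in q]

# Toy weight tensor standing in for one layer's parameters.
weights = [0.31, -0.27, 0.05, 0.88, -0.64]
q, scale, zero_point = quantize_int8(weights)
recovered = dequantize(q, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

Each weight is stored in one byte instead of four, and the worst-case reconstruction error stays below the scale (here roughly the weight range divided by 255), which is the accuracy cost traded for the smaller, faster model.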