In the age of the Internet of Things (IoT), embedded devices ranging from ARM Cortex-M0s with hundreds of KB of RAM to Arduinos with 2 KB of RAM are expected to perform increasingly intelligent classification tasks, such as voice and gesture recognition, activity tracking, and biometric security. While convolutional neural networks (CNNs), together with spectrogram preprocessing, are a natural solution to many of these classification tasks, storing the network's activations often exceeds the hard memory constraints of embedded platforms. This paper presents memory-optimal direct convolutions as a way to push classification accuracy as high as possible under strict hardware memory constraints, at the expense of extra compute. This explores the opposite end of the compute-memory trade-off curve from standard approaches, which minimize latency at the expense of extra memory. We evaluate classification accuracy across a variety of small image and time series datasets using memory-optimal CNNs and memory-efficient spectrogram preprocessing. We also validate the memory-optimal CNN technique with an Arduino implementation of the 10-class MNIST classification task, fitting the network specification, weights, and activations entirely within 2 KB of SRAM and achieving 99.15% classification accuracy, state of the art for small-scale embedded systems.
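To make the memory constraint concrete, the following minimal sketch estimates the activation storage of a single convolutional layer on a 28x28 MNIST image when input and output are held in separate buffers. The layer (3x3 kernel, 8 output channels, no padding) and the 1-byte-per-activation storage are illustrative assumptions, not the network or quantization used in the paper, and the helper names are hypothetical.

```python
def conv_output_shape(h, w, k, c_out, stride=1):
    """Output shape (height, width, channels) of an unpadded k x k convolution."""
    return ((h - k) // stride + 1, (w - k) // stride + 1, c_out)


def num_bytes(shape, bytes_per_value=1):
    """Storage for an activation tensor, assuming 1 byte per value by default."""
    h, w, c = shape
    return h * w * c * bytes_per_value


SRAM_BUDGET = 2 * 1024  # 2 KB of SRAM, as on the Arduino mentioned in the abstract

input_shape = (28, 28, 1)  # one grayscale MNIST image
# Hypothetical first layer: 3x3 kernel, 8 output channels (assumption, not from the paper).
output_shape = conv_output_shape(28, 28, k=3, c_out=8)

input_bytes = num_bytes(input_shape)
output_bytes = num_bytes(output_shape)
# A straightforward implementation keeps input and output alive at the same time.
two_buffer_bytes = input_bytes + output_bytes

print(f"input:  {input_bytes} bytes")
print(f"output: {output_bytes} bytes")
print(f"total:  {two_buffer_bytes} bytes vs budget {SRAM_BUDGET} bytes")
```

Under these assumptions, the activations of this one layer alone need roughly three times the 2 KB budget, before counting weights or the network specification, which is why an approach that trades extra compute for lower activation memory becomes attractive.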
Albert Gural (Stanford University)
Boris Murmann (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
2019 Poster: Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications
Wed Jun 12th 01:30 -- 04:00 AM Room Pacific Ballroom