Faster Activation Functions at the Edge for Post-Training Speedups
Abstract
On-device AI has gained significant attention for enabling efficient, low-latency inference on edge devices. However, tight resource constraints on these platforms make deploying accurate yet lightweight deep learning models challenging. In particular, advanced activation functions (AFs) such as Swish and GELU often incur high inference overhead because edge hardware lacks fast-paths for exponentiation and division, restricting edge-ML applications to simple AFs like ReLU and thereby limiting model accuracy. To address this, we propose FFCC, a compiler that automatically generates efficient approximations of AFs through floating-point reinterpretation. These approximations require no hardware fast-paths, so they remain fast on edge devices, and they incur only minor accuracy losses, allowing them to serve as post-training replacements without degrading final model accuracy. FFCC takes a specification of an AF in terms of basic floating-point operators and applies derivation rules to lower the expression into an efficient instruction sequence. Our experiments show that FFCC produces fast approximations of AFs, achieving order-of-magnitude speedups over accurate baselines on Arm M7 and delivering performance on par with Hardswish while beating it on accuracy. Additionally, we show that our approximations, unlike Hardswish, can be used as drop-in replacements for their exact counterparts post-training without loss of model accuracy.
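To illustrate the general idea of approximating an AF through floating-point reinterpretation (not FFCC's actual derivation rules, which the abstract does not detail), the sketch below uses the well-known Schraudolph-style trick: the IEEE-754 bit pattern of an approximate exponential is constructed directly with one multiply and one add, so no libm `exp` call is needed. The constants `12102203` (2^23/ln 2) and `1064866805` ((127 << 23) minus an error-reducing shift) come from that published trick; the `fast_swish` wrapper and the clamping bounds are our own illustrative choices, and this simplified version still uses one division for the sigmoid, unlike a fully lowered FFCC kernel.

```python
import struct

def fast_exp(x: float) -> float:
    """Approximate e^x by building its float32 bit pattern directly
    (Schraudolph's method): scale x into the exponent field, add the
    shifted bias, and reinterpret the integer as a float."""
    i = int(12102203.0 * x) + 1064866805
    # Clamp to a valid finite positive float32 bit pattern (illustrative bounds).
    i = max(0, min(i, 0x7F7FFFFF))
    return struct.unpack("<f", struct.pack("<i", i))[0]

def fast_swish(x: float) -> float:
    """Swish(x) = x * sigmoid(x), with the exponential replaced by the
    bit-level approximation above."""
    return x / (1.0 + fast_exp(-x))
```

The approximation error of this particular trick is a few percent at most, which conveys why such replacements can be tolerable post-training even though they are far from exact.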