Oral
in
Affinity Workshop: LatinX in AI (LXAI) Research Workshop
SwiftFaceFormer: An Efficient and Lightweight Hybrid Architecture for Accurate Face Recognition Applications
Luis Santiago Luévano García · Yoanna Martínez-Díaz · Heydi Mendez-Vazquez · Miguel Gonzalez-Mendoza · Davide Frey
Keywords: [ Efficient Face Transformer ] [ Efficient Vision Transformer ] [ Lightweight Face Recognition ] [ Knowledge Distillation ]
With the rapid progress of deep learning-based face recognition, developing lightweight models that achieve high accuracy while remaining computationally and memory efficient has become paramount, especially for deployment on embedded devices. While Vision Transformers have shown promising results in various computer vision tasks, adapting them to resource-constrained devices remains a significant challenge. This paper introduces SwiftFaceFormer, a new efficient and lightweight family of face recognition models inspired by the hybrid SwiftFormer architecture. Our proposal not only retains the representational capacity of its predecessor but also introduces efficiency improvements, enabling enhanced face recognition performance at a fraction of the computational cost. We also propose to improve the verification performance of our most lightweight variant through a training paradigm based on Knowledge Distillation. Through extensive experiments on several face benchmarks, SwiftFaceFormer demonstrates high accuracy compared to the original SwiftFormer model and very competitive results against state-of-the-art deep face recognition models, providing a suitable solution for real-time, on-device face recognition applications.
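As a rough illustration of the Knowledge Distillation idea mentioned above, one common formulation for face recognition distills at the feature level: the lightweight student is trained so that its face embeddings align (in cosine/angular terms) with those of a frozen, larger teacher. The sketch below is illustrative only; the function name, embedding size, and exact loss are assumptions, not the paper's actual formulation.

```python
import numpy as np

def cosine_distill_loss(student_emb, teacher_emb):
    """Feature-level distillation loss (illustrative, not the paper's exact loss).

    Penalizes angular mismatch between the student's embeddings and
    those of a frozen teacher: 1 - cosine similarity, batch-averaged.
    Inputs are (batch, dim) arrays of face embeddings.
    """
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

# Example with an assumed 512-d embedding space:
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 512))
print(cosine_distill_loss(emb, emb))  # identical embeddings -> 0.0
```

In practice this distillation term would be added, with a weighting factor, to the student's usual margin-based face recognition loss during training.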