Shadow-Robust Interactive Whiteboard via YUV Preprocessing and MediaPipe Hand Tracking
Abstract
We present an implemented, end-to-end interactive whiteboard system that runs on any laptop webcam without auxiliary hardware. The system targets a key barrier to webcam-based hand tracking, visual noise from shadows and reflections, through a three-layer architecture: (1) a Shadow Layer that converts RGB frames to YUV and applies adaptive V-channel filtering, (2) a Landmark Layer that passes the denoised image to MediaPipe Hands to extract 21 skeletal keypoints, and (3) an Interactive Layer that maps fingertip coordinates and pinch-distance gestures to draw, click, and drag actions. Experiments with 20 participants across three lighting conditions show a 68.6% reduction in landmark jitter (6.82 → 2.14 px), 96.6% gesture accuracy, and a sustained 24.5 FPS on a commodity CPU. An ablation study confirms that removing the Shadow Layer increases jitter by a factor of 3.18 and lowers gesture accuracy to 81%. The open-source implementation is intended to help narrow the digital divide in resource-constrained schools across South and Southeast Asia.
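To make the Interactive Layer concrete, the following is a minimal sketch of how a pinch gesture might be derived from the 21 landmarks. It is an illustration under stated assumptions, not the system's actual implementation: the landmark indices follow the MediaPipe Hands model (0 = wrist, 4 = thumb tip, 8 = index fingertip, 9 = middle-finger MCP), while the normalization by palm length, the hysteresis thresholds `PINCH_ON` and `PINCH_OFF`, the `PinchDetector` class, and the event names are all hypothetical choices made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Landmark indices in the MediaPipe Hands 21-point model.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9

# Hypothetical thresholds (not taken from the paper), in palm-length units.
PINCH_ON = 0.25   # a pinch starts when the ratio falls below this value
PINCH_OFF = 0.40  # a pinch ends when the ratio rises above this value


def pinch_ratio(landmarks: Sequence[Point]) -> float:
    """Thumb-to-index distance divided by wrist-to-middle-MCP length.

    Dividing by palm length keeps the measure roughly independent of
    how far the hand is from the camera.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    palm = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if palm == 0:
        raise ValueError("degenerate hand: zero palm length")
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) / palm


class PinchDetector:
    """Two-threshold (hysteresis) pinch detector.

    Separate on/off thresholds stop residual landmark jitter from
    toggling the state when the ratio hovers near a single cutoff.
    """

    def __init__(self, on: float = PINCH_ON, off: float = PINCH_OFF) -> None:
        self.on = on
        self.off = off
        self.pinching = False

    def update(self, landmarks: Sequence[Point]) -> str:
        """Consume one frame of landmarks and return an event name."""
        ratio = pinch_ratio(landmarks)
        was_pinching = self.pinching
        if not was_pinching and ratio < self.on:
            self.pinching = True
        elif was_pinching and ratio > self.off:
            self.pinching = False

        if self.pinching and not was_pinching:
            return "pinch_start"   # e.g. click or pen down
        if self.pinching:
            return "drag"          # e.g. draw or drag
        if was_pinching:
            return "pinch_end"     # e.g. release or pen up
        return "hover"             # e.g. move the cursor only
```

In a live loop, each frame's landmarks would be converted to `(x, y)` pairs and passed to `update`, with the index fingertip position supplying the cursor location. The mapping from event names to draw, click, and drag actions shown in the comments is one plausible arrangement rather than the mapping used in the experiments.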