INFER: Learning Implicit Neural Frequency Response Fields for Confined Acoustic Environments
Abstract
Neural acoustic fields typically model time-domain impulse responses, a representation that struggles to capture the frequency-selective wave behaviors dominating confined, resonant environments. To address this, we propose INFER (Implicit Neural Frequency Response fields), a framework that directly learns continuous, complex-valued frequency response fields. Unlike prior time-domain methods, our frequency-first approach enables three key innovations: (1) end-to-end learning of frequency-specific attenuation and phase delay in 3D space; (2) a physics-based Kramers–Kronig constraint that enforces causal consistency between attenuation and phase delay; and (3) perceptual and hardware-aware spectral supervision that prioritizes critical auditory bands. We evaluate INFER across diverse settings, ranging from standard room-scale benchmarks (MeshRIR, RAF) to challenging, highly reverberant environments such as real car cabins. Our approach significantly outperforms time- and hybrid-domain baselines, reducing average magnitude and phase reconstruction errors by over 39\% and 51\%, respectively, and achieves state-of-the-art accuracy in modeling complex acoustic spaces.
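The Kramers–Kronig consistency mentioned in point (2) ties a causal system's attenuation (log-magnitude) to its phase. For a minimum-phase system, the phase is exactly the (negative) discrete Hilbert transform of the log-magnitude, computable via a cepstral folding trick. The sketch below is purely illustrative and is not the paper's implementation: it checks this relation numerically for a simple minimum-phase FIR filter (all names and the test filter are our own assumptions).

```python
import numpy as np

# Illustrative sketch (not the paper's method): verify the Kramers-Kronig /
# Hilbert-transform link between log-magnitude and phase for a minimum-phase
# system, using the standard real-cepstrum folding construction.
N = 512
h = np.array([1.0, -0.5])        # zero at z = 0.5, inside unit circle -> minimum phase
H = np.fft.fft(h, N)             # frequency response on an N-point grid

log_mag = np.log(np.abs(H))

# Real cepstrum of the log-magnitude spectrum.
cep = np.fft.ifft(log_mag).real

# Fold the even cepstrum into a causal one (discrete Hilbert-transform trick):
# the imaginary part of its spectrum is the minimum-phase phase response.
w = np.zeros(N)
w[0] = 1.0
w[1:N // 2] = 2.0
w[N // 2] = 1.0
phase_kk = np.fft.fft(cep * w).imag

# Compare against the directly computed phase; for this filter the phase
# never wraps, so np.angle is already the continuous phase.
err = np.max(np.abs(phase_kk - np.angle(H)))
print(f"max phase reconstruction error: {err:.2e}")
```

For a minimum-phase filter the reconstructed phase matches the true phase to near machine precision; a learned field violating this relation would be physically non-causal, which is what such a constraint penalizes during training.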