SURF: Separation via Unsupervised Remixing Flow
Henry Li ⋅ Robin Scheibler ⋅ Efthymios Tzinis ⋅ Matt Shannon ⋅ Arnaud Doucet ⋅ John Hershey
Abstract
The goal of single-channel source separation is to reconstruct $K$ sources given their mixture. In supervised settings where vast amounts of clean source data are available, this challenging, ill-posed problem has been addressed successfully by generative diffusion and flow-based prior models. However, access to such clean source samples is often limited. To bridge this gap, we present Separation via Unsupervised Remixing Flow (\textbf{SURF}), an unsupervised flow matching approach for source separation that learns directly from observed mixtures. This method relies on a novel combination of state-of-the-art supervised flow matching and regression-based self-supervised techniques. At a high level, starting from a teacher model, we use a ``remixing'' step to bootstrap the learning of a student flow model from the teacher's estimates. We provide insights into the objectives optimized by this approach and draw a novel connection to the Wake-Sleep algorithm. Empirical evaluations on image and audio benchmarks demonstrate that \textbf{SURF} establishes a new state of the art, significantly outperforming existing unsupervised methods.
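The ``remixing'' idea can be illustrated with a minimal sketch. The abstract does not specify how SURF remixes, so everything below is an assumption: the function name `remix`, the array layout `(batch, K, samples)`, and the choice to shuffle each source slot independently across the batch are hypothetical, following the generic pattern of remixing-based self-training rather than the authors' actual procedure. The sketch only shows how teacher estimates could be recombined into new mixtures whose constituent sources are known by construction.

```python
import numpy as np


def remix(teacher_estimates, rng):
    """Build new mixtures from teacher source estimates (illustrative only).

    teacher_estimates: array of shape (batch, K, ...), one estimate per source.
    Returns (remixed, pseudo_sources), where remixed has shape (batch, ...)
    and pseudo_sources has the same shape as teacher_estimates.
    """
    batch, num_sources = teacher_estimates.shape[:2]
    pseudo_sources = np.empty_like(teacher_estimates)
    for k in range(num_sources):
        # Assumption: shuffle each source slot independently across the batch,
        # so each new mixture combines estimates from different examples.
        perm = rng.permutation(batch)
        pseudo_sources[:, k] = teacher_estimates[perm, k]
    # Single-channel mixture model: the mixture is the sum of its sources.
    remixed = pseudo_sources.sum(axis=1)
    return remixed, pseudo_sources


rng = np.random.default_rng(0)
# Stand-in for teacher outputs: 4 examples, K = 2 sources, 8 samples each.
estimates = rng.standard_normal((4, 2, 8))
remixed, pseudo_sources = remix(estimates, rng)
```

Under these assumptions, a student model could be trained on pairs of `remixed` inputs and `pseudo_sources` targets; how SURF couples this step with flow matching is described in the paper itself, not here.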