Poster Mon, Jul 6, 2026 • 10:00 PM – 11:45 PM PDT HALL A #601

Unlocking Cross-Modal Biosignal Synthesis: A Temporally-Aware VAE-Diffusion Model

Chenyang Xu ⋅ Dezhen Wang ⋅ Hao Wang

Abstract

Synthesizing authentic phonocardiograms (PCG) from ubiquitous electrocardiograms (ECG) is a critical task for accessible cardiac monitoring. Existing generative models, however, struggle to capture the heart's complex electromechanical coupling and fail to meet the dual requirements of temporal precision and physiological fidelity that are essential for clinical diagnosis. We introduce the Temporally-Aware VAE-Diffusion model, a hybrid architecture designed to meet both requirements. Our architecture enforces tight physiological coupling through an Enhanced Condition Fusion mechanism and explicitly models long-range cardiac dynamics via Temporal Attention Blocks. On the EPHNOGRAM benchmark, our model sets a new state of the art, achieving a Pearson correlation of 0.910$\pm$0.008, 95.95\% S1 detection accuracy, and a timing error of 12.0 ms, significantly outperforming leading diffusion and Transformer baselines. Crucially, our work presents a rigorous demonstration of successful zero-shot generalization for this task. Evaluated on the unseen PhysioNet/CinC 2016 dataset, our model maintains high fidelity even on challenging pathological recordings, establishing a validated foundation for robust, accessible cardiovascular diagnostics.
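
The abstract does not say how the reported metrics are computed. As a rough illustration only, the sketch below assumes that the Pearson correlation is taken sample-wise between an aligned reference PCG and a synthesized PCG, and that the timing error is the mean absolute difference between paired S1 onset times. The function names `pearson_correlation` and `mean_timing_error_ms` and all signal values are hypothetical toy choices, not the authors' evaluation code or data.

```python
import math


def pearson_correlation(reference, synthesized):
    """Sample-wise Pearson correlation between two equal-length signals."""
    if len(reference) != len(synthesized) or len(reference) < 2:
        raise ValueError("signals must have equal length of at least 2")
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_syn = sum(synthesized) / n
    cov = sum((r - mean_ref) * (s - mean_syn)
              for r, s in zip(reference, synthesized))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_syn = sum((s - mean_syn) ** 2 for s in synthesized)
    if var_ref == 0 or var_syn == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_ref * var_syn)


def mean_timing_error_ms(reference_onsets_s, predicted_onsets_s):
    """Mean absolute difference between paired onset times, in milliseconds.

    Assumes onsets are already matched one-to-one (an assumption of this
    sketch; a real pipeline would need a matching step).
    """
    if len(reference_onsets_s) != len(predicted_onsets_s) or not reference_onsets_s:
        raise ValueError("onset lists must be non-empty and of equal length")
    total = sum(abs(r - p)
                for r, p in zip(reference_onsets_s, predicted_onsets_s))
    return 1000.0 * total / len(reference_onsets_s)


# Toy signals: the "synthesized" trace is a scaled, offset copy of the reference,
# so the two are perfectly linearly related.
toy_reference = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
toy_synthesized = [0.8 * v + 0.1 for v in toy_reference]
toy_r = pearson_correlation(toy_reference, toy_synthesized)

# Toy S1 onsets in seconds (made-up values, offsets of 10, 15, and 20 ms).
toy_error_ms = mean_timing_error_ms([0.10, 0.90, 1.70], [0.11, 0.885, 1.72])
```

Because Pearson correlation is invariant to scaling and offset, it measures waveform shape agreement rather than amplitude agreement, which is why a separate timing metric is useful alongside it.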