MolAlign3D: Enhancing Fixed-Dimensional E(3)-Equivariant Latent Space for High-Fidelity 3D Molecular Reconstruction and Editing
Abstract
Recent advances in 3D molecular modeling have achieved high-fidelity structural synthesis, yet these models often lack an explicit and manipulable representation space. To address this, MolFLAE introduced a fixed-dimensional, E(3)-equivariant latent space, providing a novel framework for molecular editing that is independent of atom count. However, because its latent space was optimized primarily for geometric reconstruction, it remains semantically shallow and inadequate for comprehensive representation learning. In this work, we propose MolAlign3D, which extends this architecture into a unified semantic-generative engine. By anchoring MolFLAE’s manipulable latents to embeddings from a pre-trained molecular encoder, we obtain a latent manifold that is both semantically dense and geometrically precise. Experiments show that MolAlign3D achieves high-fidelity molecular reconstruction and competitive performance on molecular property prediction benchmarks. Notably, integrating rich semantic priors significantly improves zero-shot molecular manipulation, including atom-number editing and latent-space interpolation, outperforming the prior fixed-dimensional equivariant latent baseline.