Interpretability is important in text generation for guiding the generation with interpretable attributes. Variational auto-encoder (VAE) with Gaussian distribution as prior has been successfully applied in text generation, but it is hard to interpret the meaning of the latent variable. To enhance the controllability and interpretability, one can replace the Gaussian prior with a mixture of Gaussian distributions (GM-VAE), whose mixture components could be related to some latent attributes of data. Unfortunately, straightforward variational training of GM-VAE leads the mode-collapse problem. In this paper, we find that mode-collapse is a general problem for VAEs with exponential family mixture priors. We propose DEM-VAE, which introduces an extra dispersion term to induce a well-structured latent space. Experimental results show that our approach does obtain a well structured latent space, with which our method outperforms strong baselines in interpretable text generation benchmarks.