Poster in Workshop: Next Generation of Sequence Modeling Architectures

Multi-Task Instruction Training of Text Diffusion Models

Changyou Chen · Gargi Balasubramaniam · Rui Meng · Han Zhao · Bunyamin Sisman · Qingjun Cui


Abstract:

Recent advancements in autoregressive language models (LMs) have demonstrated remarkable adaptability across diverse tasks, excelling in both discriminative and generative domains with impressive multitasking capabilities. This work focuses on their non-autoregressive counterparts, leveraging diffusion-based denoising generation for sequence-to-sequence modeling. The extent to which current text diffusion-based LMs can handle multitasking remains unclear. In this study, we introduce a novel framework for designing a diffusion model for multi-task language modeling. Inspired by latent image diffusion models, our approach uses a general transformer-based diffusion model that leverages pretrained encoders, facilitating multi-task learning with adaptable input embedding encoders. We define a diffusion loss within the trainable decoder's latent space, which interacts with any encoder via a cross-attention mechanism. This framework establishes a flexible non-autoregressive LM capable of handling potentially noisy data by leveraging robust instruction embeddings from the encoders, enabling instruction tuning. We demonstrate the efficacy of our model across various setups, including single-task and multi-task scenarios, showing its ability to produce high-quality outputs by effectively utilizing and merging training-task information in the continuous latent space.
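
The architecture described in the abstract can be sketched roughly as follows. This is a minimal illustrative example, not the authors' implementation: it assumes a frozen BERT-style pretrained encoder supplying instruction/input embeddings, a trainable transformer decoder that denoises continuous target latents through cross-attention to the encoder output, and a standard DDPM-style noise-prediction objective as the diffusion loss. All module names, dimensions, and the noise schedule are assumptions.

```python
# Minimal sketch (not the authors' code) of a latent text diffusion model:
# a frozen pretrained encoder conditions a trainable denoising decoder via
# cross-attention, and the diffusion loss is defined in the decoder's
# continuous latent space. Encoder choice, sizes, and the DDPM-style
# epsilon-prediction loss are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class LatentTextDiffusion(nn.Module):
    def __init__(self, enc_name="bert-base-uncased", d_model=768,
                 vocab_size=30522, num_layers=6, num_steps=1000):
        super().__init__()
        # Frozen pretrained encoder provides task/instruction embeddings.
        self.encoder = AutoModel.from_pretrained(enc_name)
        for p in self.encoder.parameters():
            p.requires_grad_(False)

        # Trainable decoder: embeds target tokens into a continuous latent
        # space and denoises noised latents, attending to the encoder output.
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.time_embed = nn.Embedding(num_steps, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.denoiser = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.num_steps = num_steps

        # Linear DDPM-style noise schedule kept as buffers.
        betas = torch.linspace(1e-4, 0.02, num_steps)
        alphas_bar = torch.cumprod(1.0 - betas, dim=0)
        self.register_buffer("sqrt_ab", alphas_bar.sqrt())
        self.register_buffer("sqrt_one_minus_ab", (1.0 - alphas_bar).sqrt())

    def diffusion_loss(self, src_ids, src_mask, tgt_ids):
        # Conditioning: instruction embeddings from the frozen encoder.
        with torch.no_grad():
            cond = self.encoder(input_ids=src_ids,
                                attention_mask=src_mask).last_hidden_state

        # Clean target latents and a random diffusion timestep per example.
        x0 = self.tgt_embed(tgt_ids)                       # (B, T, d_model)
        t = torch.randint(0, self.num_steps, (x0.size(0),), device=x0.device)
        noise = torch.randn_like(x0)
        xt = (self.sqrt_ab[t, None, None] * x0
              + self.sqrt_one_minus_ab[t, None, None] * noise)

        # The denoiser predicts the added noise; cross-attention to the
        # encoder output is the `memory` path of nn.TransformerDecoder.
        h = xt + self.time_embed(t)[:, None, :]
        pred = self.denoiser(tgt=h, memory=cond,
                             memory_key_padding_mask=~src_mask.bool())
        return nn.functional.mse_loss(pred, noise)
```

At inference time, generation would start from Gaussian noise in the latent space, iterate the reverse diffusion process conditioned on the encoder embeddings, and round the final latents back to tokens; that sampling loop is omitted from the sketch.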
