Mitigating Bias in Locally Constrained Decoding via Tractable Proposals
Meihua Dang ⋅ Linxin Song ⋅ Honghua Zhang ⋅ Jieyu Zhao ⋅ Guy Van den Broeck ⋅ Stefano Ermon
Abstract
Generations from large language models often fail to reliably conform to logical constraints such as JSON schemas. Existing locally-constrained decoding (LCD) approaches enforce constraints by myopically masking out invalid next tokens, resulting in biased sampling and degraded downstream performance. Recent work introduces sequential Monte Carlo (SMC) methods to mitigate such sampling bias, but designing effective proposal distributions and potential functions remains a key challenge. In this work, we propose a generic approach to constructing proposals and potentials for SMC sampling from $p_{\texttt{lm}}(\cdot \mid \texttt{constraint})$. First, we show that constraints specified as finite automata (FA) can be tensorized for efficient execution on GPUs, which we use to construct *globally-constrained decoding* (GCD) proposals. In addition, leveraging the fact that a tensorized FA shares the same *circuit structure* as a hidden Markov model (HMM), we circuit-multiply it with an HMM to obtain the *probabilistic GCD* (P-GCD) proposal, which encodes both logical and probabilistic information about the target distribution $p_{\texttt{lm}}(\cdot \mid \texttt{constraint})$. We evaluate (P-)GCD on xLAM, a widely adopted function-calling dataset, and on CommonGen, a keyword-based constrained generation benchmark. Experiments show that, under the same SMC sampling setup, the (P-)GCD proposals converge to the target distribution faster than LCD proposals and with significantly fewer particles.
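To make the "tensorized FA" idea concrete, below is a minimal PyTorch sketch, not the paper's implementation: all names (`TensorizedFA`, `trans`, `sink`, etc.) are illustrative assumptions. A deterministic FA's transition function is stored as a dense integer tensor, so a batch of particle states can be advanced, and per-state next-token masks computed, with pure tensor indexing on a GPU.

```python
import torch

class TensorizedFA:
    """Hypothetical sketch of a DFA stored as dense tensors for GPU execution."""

    def __init__(self, trans, accept, sink):
        # trans: (num_states, vocab_size) int tensor; trans[s, t] = next state
        # accept: (num_states,) bool tensor marking accepting states
        # sink: index of the dead state reached on any constraint-violating token
        self.trans, self.accept, self.sink = trans, accept, sink

    def step(self, states, tokens):
        # Advance a batch of automaton states by one token each (advanced indexing).
        return self.trans[states, tokens]

    def next_token_mask(self, states):
        # A token is allowed iff it does not lead to the dead (sink) state.
        return self.trans[states] != self.sink  # (batch, vocab_size) bool

# Toy example: vocab {0, 1, 2}; token 2 is never allowed, state 2 is the sink.
trans = torch.tensor([[1, 0, 2],   # state 0
                      [1, 0, 2],   # state 1
                      [2, 2, 2]])  # state 2 = dead state
fa = TensorizedFA(trans, accept=torch.tensor([False, True, False]), sink=2)

states = torch.zeros(4, dtype=torch.long)  # 4 particles, all in start state 0
mask = fa.next_token_mask(states)          # (4, 3): token 2 masked out everywhere
states = fa.step(states, torch.tensor([0, 0, 1, 1]))  # advance each particle
```

In this picture, LCD corresponds to applying `next_token_mask` directly to the language model's logits at each step (e.g., `logits.masked_fill(~mask, float('-inf'))`), which is exactly the myopic masking the abstract describes; the GCD and P-GCD proposals additionally use the automaton's (circuit) structure to account for which continuations remain satisfiable, which this sketch does not cover.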