Motion-Residual Conflict-Aware Time Reversal for Generative Inbetweening
Abstract
Image-to-video (I2V) diffusion models have recently made generative inbetweening a practical reality by synthesizing semantically plausible intermediate frames between two keyframes. Among them, inference-time sampling schemes that reuse large pre-trained I2V backbones without any additional training are especially attractive. Yet current methods frequently exhibit temporal inconsistency and artifacts such as ghosting or reverse motion. A key reason is that the forward and backward sampling trajectories are driven by distinct motion priors, each inherited from its own conditioning frame, and are simply stitched together without explicitly reconciling these priors. We introduce Motion-Residual Conflict-Aware Time Reversal (MR‑CATR), an inference-time sampling framework that aligns conflicting motion priors instead of discarding one of them or collapsing to a single start-conditioned prior. MR‑CATR first derives a motion-residual direction from the forward path and combines it with an end-conditioned residual to form a consensus motion axis. This design suppresses bidirectional motion conflicts while still allowing end-frame information to refine the trajectory and enforce endpoint consistency. MR‑CATR can be seamlessly integrated into existing time-reversal samplers without changing model parameters. Experiments on generative inbetweening benchmarks show that our method produces videos with smoother motion, fewer artifacts, and consistently better quantitative scores and user preferences than prior strategies.