Implicit Action Chunking for Smooth Continuous Control
Abstract
Reinforcement learning often produces high-frequency, oscillatory control signals that undermine the safety and stability required for physical deployment. Explicit action chunking addresses this by predicting fixed-horizon action trajectories, but it expands the policy's output space from R^d to R^{h×d} (for a chunk of h actions of dimension d), leading to optimization difficulties and incompatibility with standard step-wise interaction. To overcome these challenges, this paper proposes Dual-Window Smoothing (DWS), an implicit action chunking framework for smooth continuous control. Unlike explicit methods, DWS enforces temporal coherence without expanding the action space. It uses a dual-window design: an execution window that ensures physical smoothness through deterministic modulation, and a value window that aligns temporal-difference targets over the horizon to correct the critic bias caused by open-loop execution. DWS also includes a lightweight actor-side temporal regularizer, based on first-order action differences, that promotes global continuity. This design bridges the gap between temporal abstraction and reactive step-wise control. Experiments on vector-state benchmarks, including the DeepMind Control Suite and industrial energy-management tasks, show that DWS outperforms state-of-the-art (SOTA) baselines such as LipsNet++ and SmODE. On complex vision-based autonomous driving tasks, DWS yields smoother control and safer, lower-jitter behavior, attaining a 100% success rate.
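As a rough illustration of the actor-side temporal regularizer mentioned above, the minimal sketch below penalizes first-order differences between consecutive actions. The function name `temporal_smoothness_penalty`, the squared-difference form, and the `weight` coefficient are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def temporal_smoothness_penalty(actions: torch.Tensor, weight: float = 1e-2) -> torch.Tensor:
    """Penalize first-order action differences over a short window.

    actions: (T, action_dim) tensor of consecutive actor outputs.
    Returns weight * mean ||a_{t+1} - a_t||^2, a scalar that can be added
    to the actor loss to encourage temporally coherent control signals.
    """
    diffs = actions[1:] - actions[:-1]   # first-order differences a_{t+1} - a_t
    return weight * diffs.pow(2).mean()  # squared (L2) form and weight are assumptions


# Usage: regularize a hypothetical window of 5 consecutive 3-dim actions.
actions = torch.randn(5, 3, requires_grad=True)
penalty = temporal_smoothness_penalty(actions)
penalty.backward()  # gradients flow back to the actor outputs
```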