Sparse Relaxed-Lasso Steering: Automatic Sparse-Autoencoder Feature Selection for Precise Image Editing
Abstract
Precise, training-free editing of text-to-image diffusion models requires balancing alignment (faithful attribute manifestation), consistency (preserving non-target content), and quality (artifact-free textures). Sparse autoencoder (SAE) steering offers interpretable, smooth "slider-like" control by manipulating SAE feature activations derived from the text encoder; however, existing approaches rely on heuristic feature selection and manual tuning of the steering strength, leading to suboptimal trade-offs among the three objectives. We propose Sparse Relaxed-Lasso Steering (SRLS), which casts steering-vector discovery as a convex sparse recovery problem. Exploiting the affine structure of the SAE decoder, SRLS automatically identifies sparse, generalizable support sets via a Lasso objective and then debiases the coefficients using support-restricted ridge regression. We further select the steering strength using Bayesian optimization. Experiments across diverse attributes and subjects show that SRLS generally improves over competing methods, yielding a better balance among alignment, consistency, and quality.
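The two-stage recovery described above (Lasso support selection followed by a ridge refit restricted to that support) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy dictionary `D` (columns standing in for SAE decoder directions), the target direction `y`, the penalties `l1` and `l2`, and the coordinate-descent solver are all assumptions made for illustration. The sketch also assumes that, because the decoder is affine, its bias cancels when two decoded embeddings are subtracted, so the target can be modeled as a linear combination of decoder columns.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x| (the Lasso shrinkage step)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def coordinate_descent(D, y, l1=0.0, l2=0.0, support=None, n_iter=200):
    """Minimize 0.5*||y - D c||^2 + l1*||c||_1 + 0.5*l2*||c||^2 over c.

    D is a list of rows (d x m) with non-zero columns. Only coordinates
    listed in `support` are updated; all others stay at zero.
    """
    d, m = len(D), len(D[0])
    support = list(range(m)) if support is None else support
    c = [0.0] * m
    r = list(y)  # residual y - D c (c starts at zero)
    col_sq = [sum(D[i][j] ** 2 for i in range(d)) for j in range(m)]
    for _ in range(n_iter):
        for j in support:
            # Correlation of column j with the residual that excludes atom j.
            rho = sum(D[i][j] * r[i] for i in range(d)) + col_sq[j] * c[j]
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new - c[j]
            if delta != 0.0:
                for i in range(d):
                    r[i] -= D[i][j] * delta
                c[j] = new
    return c


def relaxed_lasso(D, y, l1, l2):
    """Stage 1: Lasso picks a sparse support. Stage 2: ridge refit on it."""
    c_lasso = coordinate_descent(D, y, l1=l1)
    support = [j for j, v in enumerate(c_lasso) if v != 0.0]
    c_refit = coordinate_descent(D, y, l2=l2, support=support)
    return support, c_lasso, c_refit


# Hypothetical toy dictionary: four orthonormal atoms plus one correlated atom.
D = [
    [1.0, 0.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.5],
]
# Hypothetical target direction: 2 * atom 0 - 1 * atom 3.
y = [2.0, 0.0, 0.0, -1.0]

support, c_lasso, c_refit = relaxed_lasso(D, y, l1=0.3, l2=0.01)
print("support:", support)
print("lasso coefficients:", c_lasso)
print("refit coefficients:", c_refit)
```

On this toy problem the Lasso stage keeps atoms 0 and 3 but shrinks their coefficients to about 1.7 and -0.7; the support-restricted ridge refit moves them back toward the generating values 2 and -1, which is the debiasing effect the relaxed formulation is meant to provide. Choosing the steering strength (by Bayesian optimization in the text) is a separate step not shown here.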