Identifiable Nonlinear Differentiable Causal Discovery via Independence and Adaptive Group Sparsity
Abstract
Differentiable approaches to causal discovery have shown promise in learning DAG structures via continuous optimization, but their theoretical guarantees are largely restricted to models with homoscedastic noise or a known noise distribution. In particular, existing methods based on mean squared error fail to identify the true DAG when noise distributions are non-Gaussian and vary in scale. In this paper, we address this gap for nonlinear additive noise models (ANMs) with arbitrary noise. Our approach extends NOTIME (Berrevoets et al., 2025), which minimizes an independence criterion among the residuals. We show that in general ANMs the global minimizer of the independence criterion corresponds to the true underlying DAG up to additional constant edges. To recover the exact structure, we introduce an adaptive group lasso penalty that regularizes entire columns of the first-layer weight matrix of an MLP, enabling the selective pruning of constant edges in a functionally meaningful way. Empirically, our method exhibits effective and stable performance across diverse noise types and variances, outperforming prior methods that lack identifiability guarantees in this setting.
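To make the group-sparsity mechanism concrete, the following is a minimal NumPy sketch of an adaptive group lasso penalty over the columns of an MLP's first-layer weight matrix, together with its proximal (block soft-thresholding) operator. This is illustrative only and not the paper's implementation: the function names, the choice of adaptive weights (inverse column norms, as in classical adaptive lasso), and the proximal-step formulation are all assumptions made for this example.

```python
import numpy as np


def adaptive_group_lasso_penalty(W, adaptive_weights):
    """Adaptive group lasso penalty over columns of W.

    W: first-layer weight matrix of shape (hidden_dim, num_inputs).
    Column j collects every weight through which input variable j
    enters the MLP, so penalizing its l2 norm treats the whole
    column as one group and can prune input j entirely.
    """
    col_norms = np.linalg.norm(W, axis=0)
    return float(np.sum(adaptive_weights * col_norms))


def prox_group_soft_threshold(W, adaptive_weights, step):
    """Proximal operator of the adaptive group lasso.

    Shrinks each column's norm by step * adaptive_weights[j] and sets
    a column exactly to zero when its norm falls below that threshold,
    which is what removes an edge in the learned structure.
    """
    col_norms = np.linalg.norm(W, axis=0)
    scale = np.maximum(0.0, 1.0 - step * adaptive_weights / np.maximum(col_norms, 1e-12))
    return W * scale
```

With adaptive weights set to inverse column norms from a pilot fit, a weakly used input (e.g. a constant edge carrying a small column norm) receives a large weight and is thresholded to exactly zero, while strongly used inputs are only mildly shrunk.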