Propagating gradients through differentiable simulators can improve the training of deep learning architectures. We study an example from quantum physics that, at first glance, seems not to benefit from such gradients. Our analysis shows that the problem is rooted in a mismatch between the specific form of the loss functions used in quantum physics and their gradients: the gradient can vanish even when the states being compared are not equal. We propose adding a scaling term to fix this problematic gradient flow and regain the benefits of gradient-based optimization. We demonstrate the potential of our method in two experiments on the Schrödinger equation, a prediction task and a control task.
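
The vanishing gradient for non-equal states can be illustrated with a minimal sketch. The abstract does not name the loss, so the infidelity 1 - |<target|psi>|^2 used below is an assumption, as are the two-level toy state and the helper names `state`, `infidelity`, and `grad`. The scaling term itself is not specified here and is not reproduced.

```python
import math

def state(theta):
    """Real two-level state (cos theta, sin theta); normalized for every theta."""
    return (math.cos(theta), math.sin(theta))

def infidelity(theta, target=(1.0, 0.0)):
    """Assumed loss: 1 - |<target|psi(theta)>|^2 (not taken from the text)."""
    psi = state(theta)
    overlap = target[0] * psi[0] + target[1] * psi[1]
    return 1.0 - overlap ** 2

def grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# Compare loss and gradient for equal, intermediate, and orthogonal states.
for theta in (0.0, math.pi / 4, math.pi / 2):
    loss = infidelity(theta)
    slope = grad(infidelity, theta)
    print(f"theta={theta:.4f}  loss={loss:.4f}  grad={slope:+.4f}")
```

In this toy setting the loss equals sin^2(theta), so its derivative is sin(2 theta). At theta = pi/2 the state is orthogonal to the target and the loss takes its maximum value, yet the derivative is zero, so a plain gradient step makes no progress exactly where the mismatch is largest.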