We present Marvel, a verification-based reinforcement learning framework that synthesizes safe programmatic controllers for environments with continuous state and action spaces. The key idea is the integration of program reasoning techniques into the reinforcement learning training loop. Marvel performs abstraction-based program verification to reason about a programmatic controller and its environment as a closed-loop system. Based on a novel verification-guided synthesis loop for training, Marvel minimizes the amount of safety violation in the system abstraction, which approximates the worst-case safety loss, using gradient-descent style optimization. Experimental results demonstrate the substantial benefits of leveraging verification feedback for the safe-by-construction synthesis of programmatic controllers.
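To make the verification-guided loop concrete, the following is a minimal, self-contained sketch, not Marvel's actual implementation: all names, the toy one-dimensional dynamics `x' = (1 - k) * x`, the interval abstraction, the safety bound, and the finite-difference gradient are assumptions chosen for illustration. The loop verifies the controller parameter `k` against an interval abstraction of the closed-loop system, then descends on the resulting worst-case safety loss until the abstraction certifies safety.

```python
# Illustrative sketch only; the dynamics, abstraction, and parameters
# below are assumptions for exposition, not Marvel's actual code.

def propagate_interval(k, lo=-1.0, hi=1.0, steps=10):
    """Interval abstraction of the toy closed loop x' = (1 - k) * x.

    Returns the worst-case |x| reached over the horizon, starting from
    any state in [lo, hi]."""
    worst = max(abs(lo), abs(hi))
    for _ in range(steps):
        a, b = (1 - k) * lo, (1 - k) * hi  # scale the interval
        lo, hi = min(a, b), max(a, b)
        worst = max(worst, abs(lo), abs(hi))
    return worst

def safety_loss(k, bound=1.1):
    """Worst-case safety violation in the abstraction: how far the
    abstract reachable set exceeds the safe region |x| <= bound."""
    return max(0.0, propagate_interval(k) - bound)

def verification_guided_descent(k=-0.1, lr=0.005, eps=1e-4, iters=200):
    """Gradient-descent style minimization of the abstract safety loss
    (central finite-difference gradient, for illustration only)."""
    for _ in range(iters):
        loss = safety_loss(k)
        if loss == 0.0:
            break  # the abstraction certifies the controller safe
        grad = (safety_loss(k + eps) - safety_loss(k - eps)) / (2 * eps)
        k -= lr * grad
    return k, safety_loss(k)
```

For instance, starting from an unsafe gain (`k = -0.1`, for which the abstract reachable set grows past the bound), a few descent steps drive the abstract violation to zero, at which point the controller is safe by construction with respect to the abstraction.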