We address the task of executing natural language instructions with a physical robotic agent. In contrast to existing work, we do not engineer formal representations of language meaning or of the robot environment. Instead, we learn to directly map raw observations and language to low-level continuous control of a quadcopter drone. We use an interpretable neural network model that mixes learned representations with differentiable geometric operations. For training, we introduce Supervised and Reinforcement Asynchronous Learning (SuReAL), a learning algorithm that combines supervised and reinforcement learning processes that constantly interact to learn robust reasoning from limited data. Our learning algorithm uses demonstrations and a plan-following intrinsic reward signal. Although we do not require any real-world autonomous flight during learning, our model works effectively both in simulation and in the real environment.
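The two interacting learning processes can be illustrated with a minimal sketch. This is not the paper's implementation: the toy one-dimensional policy, the `intrinsic_reward` function, and all parameter names are illustrative assumptions, showing only the asynchronous pattern of a supervised process and a reinforcement process sharing one model.

```python
import threading

def intrinsic_reward(position, plan):
    # Toy plan-following reward (an assumption, not the paper's reward):
    # negative distance to the next waypoint in the plan.
    return -abs(position - plan[0])

def supervised_worker(params, demos, steps, lock):
    # Supervised process: nudge the shared policy parameter toward
    # demonstrated actions.
    for _ in range(steps):
        for action in demos:
            with lock:
                params["w"] += 0.1 * (action - params["w"])

def rl_worker(params, plan, steps, lock, rewards):
    # Reinforcement process: roll out the current policy concurrently
    # and log the intrinsic plan-following reward at each step.
    position = 0.0
    for _ in range(steps):
        with lock:
            position += params["w"]  # act with the latest shared parameters
        rewards.append(intrinsic_reward(position, plan))

params = {"w": 0.0}          # shared policy parameter
lock = threading.Lock()
rewards = []
t_sup = threading.Thread(target=supervised_worker, args=(params, [1.0], 50, lock))
t_rl = threading.Thread(target=rl_worker, args=(params, [5.0], 50, lock, rewards))
t_sup.start(); t_rl.start()
t_sup.join(); t_rl.join()
```

Running both threads to completion leaves the shared parameter close to the demonstrated action while the rollout thread has accumulated one intrinsic reward per step; in the actual system, both processes would of course update a neural policy rather than a scalar.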