Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from small amounts of experience, several major challenges preclude their practicality. Current methods rely heavily on on-policy experience, limiting their sample efficiency, and lack mechanisms to reason about task uncertainty when identifying and learning new tasks, limiting their effectiveness in sparse reward problems. In this paper, we aim to address these challenges by developing an off-policy meta-RL algorithm based on online latent task inference. Our method can be interpreted as an implementation of online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience. This probabilistic interpretation also enables posterior sampling for structured exploration. Our method outperforms prior algorithms in asymptotic performance and sample efficiency on several meta-RL benchmarks.