In this talk, I will present several simple ideas, proposed a long time ago, for dealing with extremely large output spaces in language modeling. These include various types of hierarchical softmax, as well as other approaches that decompose the labels into smaller parts, such as sub-word language modeling.
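
To make the idea of decomposing a large output space concrete, here is a minimal sketch of one simple form of hierarchical softmax: a two-level, class-based factorization P(w | h) = P(class(w) | h) * P(w | class(w), h). It is an illustration under stated assumptions, not necessarily the variant covered in the talk. The class name `TwoLevelSoftmax`, the contiguous word-to-class assignment, and the random weights are all hypothetical choices made for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class TwoLevelSoftmax:
    """Hypothetical two-level softmax: first pick a class, then a word in it."""

    def __init__(self, vocab_size, num_classes, dim, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        # Assumption: words are assigned to classes in contiguous blocks,
        # so word w belongs to class w // class_size.
        self.class_size = math.ceil(vocab_size / num_classes)
        self.num_classes = math.ceil(vocab_size / self.class_size)
        # Random weights stand in for trained parameters.
        self.class_vecs = [[rng.gauss(0, 0.1) for _ in range(dim)]
                           for _ in range(self.num_classes)]
        self.word_vecs = [[rng.gauss(0, 0.1) for _ in range(dim)]
                          for _ in range(vocab_size)]

    def members(self, c):
        """Word ids that belong to class c."""
        start = c * self.class_size
        return range(start, min(start + self.class_size, self.vocab_size))

    def prob(self, hidden, word):
        """P(word | hidden) = P(class | hidden) * P(word | class, hidden)."""
        c = word // self.class_size
        # Level 1: normalize over the classes only.
        p_class = softmax([dot(hidden, v) for v in self.class_vecs])[c]
        # Level 2: normalize over the words inside the chosen class only.
        members = self.members(c)
        p_word = softmax([dot(hidden, self.word_vecs[w]) for w in members])
        return p_class * p_word[word - members.start]
```

Scoring one word touches roughly `num_classes + class_size` output vectors instead of all `vocab_size` of them, and the factored probabilities still sum to one over the vocabulary. With about sqrt(V) classes for a vocabulary of size V, the per-word cost drops from O(V) to roughly O(sqrt(V)).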