We consider the challenge of building feature detectors for high-level concepts from only unlabeled data. For example, we would like to know whether it is possible to learn a face detector using only unlabeled images downloaded from the Internet. To answer this question, we trained a 9-layer locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (10 million images, each 200x200 pixels). Contrary to what appears to be a widely held negative belief, our experimental results show that it is possible to obtain a face detector from unlabeled data alone. Control experiments show that the resulting feature detector is robust not only to translation but also to scaling and out-of-plane rotation. Recognition and visualization experiments further show that the same network is sensitive to other high-level concepts such as cat faces and human bodies.
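To make the training objective concrete, the sketch below shows a minimal single-layer sparse autoencoder loss (reconstruction error plus a KL-divergence sparsity penalty on the mean hidden activation) together with per-patch local contrast normalization. This is an illustrative simplification, not the paper's full 9-layer architecture; the layer sizes, sparsity target `rho`, and penalty weight `beta` are assumed values chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_contrast_normalize(patches, eps=1e-8):
    # Subtract each patch's mean and divide by its standard deviation,
    # so every patch has roughly zero mean and unit contrast.
    mu = patches.mean(axis=1, keepdims=True)
    centered = patches - mu
    sigma = centered.std(axis=1, keepdims=True)
    return centered / (sigma + eps)

def sparse_autoencoder_loss(W1, b1, W2, b2, x, rho=0.05, beta=3.0):
    # Forward pass: encode the input, then reconstruct it.
    h = sigmoid(x @ W1 + b1)          # hidden activations
    x_hat = sigmoid(h @ W2 + b2)      # reconstruction
    recon = 0.5 * np.mean(np.sum((x_hat - x) ** 2, axis=1))
    # Sparsity penalty: KL divergence between the target activation
    # rate rho and the empirical mean activation of each hidden unit.
    rho_hat = np.clip(h.mean(axis=0), 1e-6, 1 - 1e-6)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + beta * kl, h, x_hat

# Example: 16 random 8x8 patches, 25 hidden units (sizes are illustrative).
rng = np.random.default_rng(0)
x = local_contrast_normalize(rng.normal(size=(16, 64)))
W1 = rng.normal(scale=0.1, size=(64, 25))
W2 = rng.normal(scale=0.1, size=(25, 64))
loss, h, x_hat = sparse_autoencoder_loss(W1, np.zeros(25), W2, np.zeros(64), x)
print(loss, h.shape, x_hat.shape)
```

Minimizing this loss over many patches drives hidden units toward sparse, selective responses; stacking such layers with pooling between them is what allows higher layers to respond to whole objects rather than edges.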