Session
Computer Vision 1
Deep Predictive Coding Network for Object Recognition
Haiguang Wen · Kuan Han · Junxing Shi · Yizhen Zhang · Eugenio Culurciello · Zhongming Liu
Based on the predictive coding theory in neuroscience, we designed a bi-directional and recurrent neural net, namely deep predictive coding networks (PCN), that has feedforward, feedback, and recurrent connections. Feedback connections from a higher layer carry the prediction of its lower-layer representation; feedforward connections carry the prediction errors to the higher layer. Given image input, PCN runs recursive cycles of bottom-up and top-down computation to update its internal representations and reduce the difference between bottom-up input and top-down prediction at every layer. After multiple cycles of recursive updating, the representation is used for image classification. With benchmark datasets (CIFAR-10/100, SVHN, and MNIST), PCN was found to always outperform its feedforward-only counterpart (a model without any mechanism for recurrent dynamics), and its performance tended to improve given more cycles of computation over time. In short, PCN reuses a single architecture to recursively run bottom-up and top-down processes to refine its representation towards more accurate and definitive object recognition.
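The recursive update idea can be sketched with a toy two-layer scalar model (my own illustration with hypothetical weights `w1`, `w2`; the paper's PCN uses deep convolutional layers, not scalars):

```python
# Toy sketch of predictive-coding-style recursive updating.
# Each cycle: compute prediction errors bottom-up, then nudge each layer's
# representation to reduce the mismatch with the top-down prediction.

def pcn_cycle(x, r1, r2, w1=1.0, w2=1.0, lr=0.1):
    """One bottom-up/top-down cycle over a two-layer scalar 'network'."""
    err0 = x - w1 * r1        # prediction error at the input
    err1 = r1 - w2 * r2       # layer 2's error in predicting layer 1
    r1 = r1 + lr * (w1 * err0 - err1)   # pushed up by input error, down by top-down mismatch
    r2 = r2 + lr * (w2 * err1)          # higher layer absorbs the residual error
    return r1, r2

x, r1, r2 = 1.0, 0.0, 0.0
for _ in range(200):          # more cycles -> representations settle
    r1, r2 = pcn_cycle(x, r1, r2)
```

After enough cycles the representations converge so that every layer's prediction error shrinks toward zero, mirroring the abstract's claim that performance improves with more cycles of computation.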
Gradually Updated Neural Networks for Large-Scale Image Recognition
Siyuan Qiao · Zhishuai Zhang · Wei Shen · Bo Wang · Alan Yuille
Depth is one of the keys to the success of neural networks in large-scale image recognition. State-of-the-art network architectures usually increase depth by cascading convolutional layers or building blocks. In this paper, we present an alternative way to increase depth: we introduce computation orderings to the channels within convolutional layers or blocks, based on which we gradually compute the outputs in a channel-wise manner. The added orderings not only increase the depth and learning capacity of the networks without any additional computational cost, but also eliminate overlap singularities, so that the networks converge faster and perform better. Experiments show that networks based on our method achieve state-of-the-art performance on the CIFAR and ImageNet datasets.
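The channel-ordering idea can be illustrated with a minimal sketch (my own simplification, not the paper's architecture): channels are updated one at a time in place, so each channel's update already sees the freshly computed earlier channels, adding "virtual" depth without extra parameters:

```python
# Sketch of gradual channel-wise updating. `f(i, state)` is a stand-in for
# whatever per-channel computation the layer performs (hypothetical here).

def gradually_update(state, f):
    """Update channels in order; channel i's update sees the NEW values of
    channels 0..i-1, unlike a conventional simultaneous update."""
    for i in range(len(state)):
        state[i] = f(i, state)
    return state

out = gradually_update([1, 1, 1], lambda i, x: sum(x))
# A simultaneous update would give [3, 3, 3]; the ordered update gives
# [3, 5, 9], because later channels build on earlier updated ones.
```

The same number of function evaluations yields a deeper effective composition, which is the intuition behind "increasing depth without additional computation".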
Neural Inverse Rendering for General Reflectance Photometric Stereo
Tatsunori Taniai · Takanori Maehara
We present a novel convolutional neural network architecture for photometric stereo (Woodham, 1980), the problem of recovering 3D object surface normals from multiple images observed under varying illuminations. Despite its long history in computer vision, the problem still poses fundamental challenges for surfaces with unknown general reflectance properties (BRDFs). Leveraging deep neural networks to learn complicated reflectance models is promising, but studies in this direction are very limited, due to difficulties in acquiring accurate ground truth for training and in designing networks invariant to permutation of the input images. To address these challenges, we propose a physics-based unsupervised learning framework in which surface normals and BRDFs are predicted by the network and fed into the rendering equation to synthesize the observed images. The network weights are optimized during testing by minimizing the reconstruction loss between observed and synthesized images. Thus, our learning process requires neither ground-truth normals nor pre-training on external images. Our method is shown to achieve state-of-the-art performance on a challenging real-world scene benchmark.
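The analysis-by-synthesis loop can be sketched in a toy 2D Lambertian setting (my own illustration with made-up lights and parameters; the paper fits a deep network and general BRDFs, not two scalars): predict surface parameters, render, and minimize reconstruction error at test time.

```python
import math

# Toy test-time optimization: recover a surface angle `theta` and albedo
# `rho` purely by matching rendered intensities to observations.

def render(theta, rho, light):
    nx, ny = math.sin(theta), math.cos(theta)           # 2D unit normal
    return rho * max(0.0, nx * light[0] + ny * light[1])  # Lambertian shading

lights = [(0.0, 1.0), (0.6, 0.8), (-0.6, 0.8)]          # known directions
true_theta, true_rho = 0.3, 0.7
obs = [render(true_theta, true_rho, l) for l in lights]  # "observed" images

def loss(theta, rho):
    return sum((render(theta, rho, l) - o) ** 2 for l, o in zip(lights, obs))

theta, rho = 0.0, 1.0                                   # initial guess
eps, lr = 1e-4, 0.1
for _ in range(2000):
    g_t = (loss(theta + eps, rho) - loss(theta - eps, rho)) / (2 * eps)
    g_r = (loss(theta, rho + eps) - loss(theta, rho - eps)) / (2 * eps)
    theta, rho = theta - lr * g_t, rho - lr * g_r       # gradient descent
```

No ground-truth normals appear anywhere in the objective, which is the key property the abstract highlights: supervision comes entirely from the rendering equation.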
One-Shot Segmentation in Clutter
Claudio Michaelis · Matthias Bethge · Alexander Ecker
We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call {\it cluttered Omniglot}. Using a baseline architecture that combines a Siamese embedding for detection with a U-net for segmentation, we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solve the other. We therefore introduce {\it MaskNet}, an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter, and selects among the segmented objects. Our findings suggest that such image recognition models, based on an iterative refinement of object detection and foreground segmentation, may provide a way to deal with highly cluttered scenes.
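The "select among segmented objects" step can be sketched abstractly (my own illustration; `embed` and the vector similarity here are hypothetical stand-ins for MaskNet's learned Siamese embedding): score each segmentation proposal by its similarity to the instruction example and keep the best.

```python
# Toy proposal selection: pick the candidate mask whose embedded content
# is most similar (cosine) to the embedding of the instruction example.

def best_proposal(target_embed, proposals, embed):
    """proposals: candidate masked-out regions; embed: region -> vector."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    return max(proposals, key=lambda m: cosine(embed(m), target_embed))
```

Masking out background clutter before embedding is what makes the comparison meaningful in a cluttered scene; the detection and segmentation stages then feed back into each other, as the abstract argues.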
Active Testing: An Efficient and Robust Framework for Estimating Accuracy
Phuc Nguyen · Deva Ramanan · Charless Fowlkes
Much recent work on large-scale visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are found through manual inspection of results. In this work, we re-formulate the problem as one of active testing, and examine strategies for efficiently querying a user so as to obtain an accurate performance estimate with minimal vetting. We demonstrate the effectiveness of our proposed active testing framework on estimating two performance metrics, Precision@K and mean Average Precision, for two popular computer vision tasks, multilabel classification and instance segmentation, respectively. We further show that our approach significantly reduces human annotation effort and is more robust than alternative evaluation protocols.
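The core estimation idea can be sketched on synthetic data (my own illustration with a random vetting policy and made-up noise rates; the paper studies smarter querying strategies): vet only a small budget of items and combine vetted labels with the remaining noisy ones to estimate Precision@K.

```python
import random

# Toy active-testing estimate of Precision@K under label noise.
random.seed(0)
N, K, budget = 1000, 100, 50
true = [random.random() < 0.6 for _ in range(N)]                  # ground truth
noisy = [t if random.random() < 0.9 else not t for t in true]     # 10% flips
ranking = random.sample(range(N), N)                              # model's ranked predictions

top_k = ranking[:K]
vetted = set(random.sample(top_k, budget))   # query the user for `budget` items
# Estimate: exact labels where vetted, noisy labels everywhere else.
est = sum(true[i] if i in vetted else noisy[i] for i in top_k) / K
exact = sum(true[i] for i in top_k) / K      # full-vetting reference
```

Even this naive random-vetting estimator lands close to the fully-vetted value at half the annotation cost; the paper's contribution is choosing *which* items to vet so the estimate converges faster.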