Paper ID: 880
Title: Graying the black box: Understanding DQNs

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The purpose of this work is to examine Deep Q-networks (DQNs) in Atari and interpret the learned policies based on t-SNE maps. The domains for the experiments are three Atari games: Breakout, Pacman and Seaquest.

Clarity - Justification:
The paper is generally easy to read and nicely written. However, several missing details make the approach a bit difficult to follow. The first issue is the lack of a description of t-SNE. There appears to be an assumption that the clustered colors have an intuitive meaning, or that readers are familiar with this type of visualization. For example, Figure 2 is described as the t-SNE of Breakout, but it is not explained what the t-SNE is saying. What is being visualized? Further, what do the colors mean? You say that the maps are colored according to the value function, etc., but do not say which colors are high and which are low. I assumed that more red was high and more blue was low, but it was difficult to interpret. For those more used to this visualization technology this might be simpler, but I have not used such maps before.

Significance - Justification:
There are two ways this paper can be a significant contribution: as a technology for visualizing learning in general, and more specifically as scientific insight into learning DQNs on Atari. It is definitely useful to be able to visualize learning. The technology in this work, however, appears to be a mostly straightforward application of t-SNE. If this is not the case, it would be better to explain the visualization technology more clearly. Currently, neither t-SNE nor any technical challenges in creating the visualization are explained. Therefore, I assume that the significance lies in the scientific insight. Because the paper is hard to follow in terms of the meaning of the t-SNE maps, this contribution is currently limited. Understanding DQN in general with a visualization approach would be interesting, beyond Atari. Right now, some of the conclusions are a bit specific to Atari (e.g., reducing pixels as input, modifying the padding with zeros); it would be interesting to have more general conclusions about what the visualization indicates. With both improved clarity and slightly more general conclusions (or a better discussion of how similarly specific conclusions could be drawn in other domains), this paper would be much more significant.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Visualizing learned representations in reinforcement learning is a useful direction. There are some interesting conclusions in this work, and with the above-mentioned improvements it could become a useful standard in RL for understanding outcomes. The main issue is better explaining the technology. For some of the figures, the clusters could be better labeled, the colors explained, etc., to make the visualization understandable. One can read the conclusions based on your expert understanding of the visualization, but it would be useful for the reader to be able to interpret the visualization directly. Further, there are a few design decisions that are unclear. Why is PCA used as a pre-processing step? t-SNE is already a dimensionality reduction approach, so why use a linear dimensionality reduction approach as a pre-processing step? This seems strange, and is definitely not what was done in the original t-SNE paper.
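To make the question concrete, here is a minimal sketch of the pipeline as I understand it (Python/scikit-learn; the file name, layer size and parameter choices are my own assumptions, not the authors' code), with the PCA step in question marked:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    # Last-hidden-layer activations of the trained DQN over recorded game states
    # (hypothetical file; e.g. an (n_states, 512) array for the standard DQN architecture).
    activations = np.load("dqn_last_layer_activations.npy")

    # The step in question: a linear pre-reduction before the (nonlinear) t-SNE embedding.
    reduced = PCA(n_components=50).fit_transform(activations)

    # 2D t-SNE map, one point per visited state.
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(reduced)

Explaining why this PCA step is needed (and how its target dimensionality was chosen), or showing that the map is unchanged without it, would address this concern.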
Minor comments and typos:
- 038: the use of TD-Gammon here as an example is poorly structured; it is not a function approximator, though the sentence is structured that way. Also, being published in 1995, it may not be the best example of a growing interest.
- 184-185: maximizes -> maximize
- 379: annotaion -> annotation
- 381: threw -> through
- 521, 522: surface -> surfaces
- 843: maps -> map

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors perform a t-SNE analysis of the hidden-layer activations in a trained DQN for a couple of Atari 2600 games.

Clarity - Justification:
The paper is generally clear but would benefit from more careful writing.

Significance - Justification:
Although the analysis is interesting, it is not clear what impact it will have on the design of future algorithms. Also, how is the analysis different from the t-SNE analysis in the original DQN paper (Mnih et al., 2015)?

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The in-depth analysis of DQN's hidden-unit activations, and how they relate to the game-play, is interesting. I am a bit unsure how useful the analysis can be for the ICML community and how this qualifies for this conference, since there is little technical content in the paper.

In the vanilla DQN, the rewards are clipped to the [-1, 1] interval, so the magnitude of rewards does not matter much. Is that the version of DQN that was run? If yes, then that should affect the analysis. For example, this sentence about the bonus in Seaquest: "In this cluster, the bonus box is visible indicating that the agent learned to separate this situation from others, however we can see that the cluster has a low value estimate indicating that the agent did not learn the right value function yet."

"there seems to be a better way to model them, e.g., by setting the target to be zero if the next state is terminal." I did not understand that. Isn't that already the case?

"for example we suggest to train an agent that does not receive those pixels as input"

"One possibility is to learn a classifier from a states to clusters based on the t-SNE map and then learn a different control rule at each cluster."

I would have been happier to see these predictions/suggestions actually implemented and tested. That would reinforce the case that such analysis can be useful for actually changing the performance of DQN.
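For concreteness, the update I have in mind behind the clipping and terminal-state comments above is roughly the following (an illustrative numpy sketch with function and variable names of my own choosing, not the paper's code); if this is what was run, the bootstrap term is already dropped at terminal states:

    import numpy as np

    def dqn_targets(rewards, next_q_values, terminal, gamma=0.99):
        # One-step targets as in vanilla DQN (my reading of Mnih et al., 2015).
        # rewards: shape (batch,); next_q_values: shape (batch, n_actions), from the target network;
        # terminal: boolean array of shape (batch,), True where the next state ends the episode.
        clipped = np.clip(rewards, -1.0, 1.0)          # reward clipping to [-1, 1]
        bootstrap = gamma * next_q_values.max(axis=1)
        bootstrap[terminal] = 0.0                      # terminal transition: target is just the clipped reward
        return clipped + bootstrap

If the authors mean something different by "setting the target to be zero" (i.e., dropping the reward term as well), this should be stated explicitly.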
Minor comments/typos:
- Abstract: "debug and optimize of deep neural networks"
- "it's wide use"
- "network predications"
- "bellman" needs capitalization
- "(decide how too show the gui and detail all measure we use)" -> not sure this was meant to make it into the paper
- "Loosing" -> losing
- "black annotaion box"
- "investigated the affect" -> the effect

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
A method to analyze policies learned by deep Q-networks (DQNs) is presented. It consists of a 2D or 3D visualization of states obtained by applying t-SNE to activations in the last layer of the deep network estimating the Q value. Combined with state colorization based on the Q value and various state attributes, it provides insight into the logic used by the network to solve the task. In particular, some state clusters may be associated with more abstract actions ("options"), highlighting the hierarchical representation learned automatically by the network. The proposed methodology is demonstrated on three Atari 2600 games.

Clarity - Justification:
The overall presentation is clear, but it is sometimes a bit hard to follow (see below for details).

Significance - Justification:
Given the current interest in DQN and its variants, this is definitely an interesting topic to investigate. The approach presented here is well motivated, and the results are convincing: we do gain a better understanding of the policies learned by DQN on these three games, which is useful for discovering both strengths and weaknesses of the resulting agent.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
My main high-level criticism is that this remains largely a manual process: it requires defining and extracting meaningful state features for colorization, and visually analyzing game screenshots to understand the meaning of state clusters and identify the abstract strategies used by the agent. As we can see from the pictures, related states can be spread among multiple t-SNE clusters, whose boundaries may not always be obvious. I do believe this is still a useful tool to help analyze an agent's behavior, and it can speed up the process compared to watching lots of replays, but it is not (yet) a silver bullet (I guess this is why the title says "graying" the black box, and not "whitening" it!).

Smaller remarks:
- A lot of English typos; please proofread.
- l.212: gamma is in the wrong place in the equation.
- t-SNE is introduced in 3.2, whose title is "Deep Q Networks"; should it be in a new section 3.3, or moved to 4.1?
- l.280 "(value, Q, advantage)": what does this mean?
- l.282 "(decide how too show the gui and detail all measure we use)": to be removed ;)
- "The Jacobians image is presented above the input image itself": this is not clear when looking at Fig. 1, for two reasons: (1) it is called "Gradient image" in the UI, and (2) in Fig. 1 we do not see red dots like in Fig. 6, for instance, so it is not clear where the gradient is.
- Switching to 3D t-SNE instead of 2D is not motivated in the paper (what does it bring?) and makes it harder to see what is going on (for instance, in Fig. 3 we cannot compare clusters between the left and right figures).
- When coloring by estimated Q value (e.g., Fig. 4), please provide the colormap so that we can tell which values are low and which are high (see the small sketch after these remarks for what I have in mind).
- In Fig. 4 it is not clear why the same cluster numbers are used for multiple clusters (e.g., 1 & 2). Also, the text says that clusters "1-3" correspond to high oxygen levels, which reads as "1 to 3", but cluster 2 can have a low level, as seen in Fig. 4.
- Fig. 6 is hard to understand: personally, I cannot tell what is a diver and what is an enemy.
- Overall comment: it is difficult to follow the written analysis while constantly switching back and forth between text and images. It is obviously a hard problem, but the more figures can be placed on the same page as their analysis, the better (another approach would be to move some content into the captions).
- Fig. 8 comes before Fig. 7, which is a bit odd.
- 5.4 would be much more interesting if it were backed up with results, or at least observations, that support those ideas. How can you tell that "initial states are (...) assigned with wrong value predictions"? Why would your suggestion for initial and terminal state representations work better? In Fig. 12, how do we know that the transition between sub-manifolds is caused by the score change and not by the disappearance of one pinky object on the screen? (By the way, it is also not clear what "the outlined area" is.)
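On the colormap remark, what I would like to see is simply the color scale made explicit, e.g. along the lines of the following (a hypothetical matplotlib sketch with file names of my own invention, not the authors' tool):

    import numpy as np
    import matplotlib.pyplot as plt

    embedding = np.load("tsne_embedding.npy")  # hypothetical 2D t-SNE coordinates, shape (n_states, 2)
    values = np.load("state_values.npy")       # hypothetical per-state value estimates, e.g. max_a Q(s, a)

    sc = plt.scatter(embedding[:, 0], embedding[:, 1], c=values, cmap="jet", s=3)
    plt.colorbar(sc, label="estimated value (blue = low, red = high)")
    plt.title("t-SNE of last-layer activations, colored by value estimate")
    plt.show()

Even just adding such a colorbar to Figs. 2 and 4 would remove the guesswork about which colors correspond to high and low values.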
=====