Paper ID: 880
Title: Graying the black box: Understanding DQNs

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The purpose of this work is to examine Deep Q-networks (DQNs) in Atari and interpret the learned policies based on t-SNE maps. The domains for the experiments are three Atari games: Breakout, Pacman and Seaquest.

Clarity - Justification:
The paper is generally easy to read and nicely written. However, several missing details make the approach a bit difficult to follow. The first issue is the lack of a description of t-SNE. There appears to be an assumption that the clustered colors have an intuitive meaning, or that readers are familiar with this type of visualization. For example, Figure 2 is described as the t-SNE of Breakout, but it is not explained what the t-SNE is saying. What is being visualized? Further, what do the colors mean? You say that the maps are colored according to the value function, etc., but do not say which colors are high and which are low. I assumed that more red was high and more blue was low, but it was difficult to interpret. For those more used to this visualization technology this might be simpler, but I have not used such maps before.

Significance - Justification:
There are two ways this paper can be a significant contribution: as a technology for visualizing learning in general, and more specifically as scientific insight into learning DQNs on Atari. It is definitely useful to be able to visualize learning. The technology in this work, however, appears to be a mostly straightforward application of t-SNE. If this is not the case, it would be better to explain the visualization technology more clearly. Currently, neither t-SNE nor any technical challenges in creating the visualization are explained. Therefore, I assume that the significance lies in the scientific insight. Because the paper is hard to follow in terms of the meaning of the t-SNE maps, this contribution is currently limited. Understanding DQN in general with a visualization approach would be interesting, beyond Atari. Right now, some of the conclusions are a bit specific to Atari (e.g., reducing pixels as input, modifying the padding with zeros); it would be interesting to have more general conclusions about what the visualization indicates. With both improved clarity and slightly more general conclusions (or a better discussion of how similarly specific conclusions could be drawn in other domains), this paper would be much more significant.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Visualizing learned representations in reinforcement learning is a useful direction. There are some interesting conclusions in this work, and with the above-mentioned improvements it could become a useful standard in RL for understanding outcomes. The main issue is better explaining the technology. For some of the figures, the clusters could be better labeled, the colors explained, etc., to make the visualization understandable. One can read the conclusions based on your expert understanding of the visualization, but it would be useful for the reader to be able to interpret the visualization directly. Further, there are a few design decisions that are unclear. Why is PCA used as a pre-processing step? t-SNE is already a dimensionality reduction approach, so why use a linear dimensionality reduction approach as a pre-processing step? This seems strange, and is definitely not what was done in the original t-SNE paper.
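To make the question concrete, here is a minimal sketch of the pipeline as I understand it (Python/scikit-learn; the file name, layer size and parameter choices are my own assumptions, not the authors' code), with the PCA step in question marked:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    # Last-hidden-layer activations of the trained DQN over recorded game states
    # (hypothetical file; e.g. an (n_states, 512) array for the standard DQN architecture).
    activations = np.load("dqn_last_layer_activations.npy")

    # The step in question: a linear pre-reduction before the (nonlinear) t-SNE embedding.
    reduced = PCA(n_components=50).fit_transform(activations)

    # 2D t-SNE map, one point per visited state.
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(reduced)

Explaining why this PCA step is needed (and how its target dimensionality was chosen), or showing that the map is unchanged without it, would address this concern.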
Minor comments and typos:
- 038: the use of TD-Gammon here as an example is poorly structured; it is not a function approximator, though the sentence is structured that way. Also, being published in 1995, it may not be the best example of a growing interest.
- 184-185: maximizes -> maximize
- 379: annotaion -> annotation
- 381: threw -> through
- 521, 522: surface -> surfaces
- 843: maps -> map

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors perform a t-SNE analysis of the hidden-layer activations in a trained DQN for a couple of Atari 2600 games.

Clarity - Justification:
The paper is generally clear but would benefit from more careful writing.

Significance - Justification:
Although the analysis is interesting, it is not clear what impact it will have on the design of future algorithms. Also, how is the analysis different from the t-SNE analysis in the original DQN paper (Mnih et al., 2015)?

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The in-depth analysis of DQN's hidden-unit activations, and how they relate to the game-play, is interesting. I am a bit unsure how useful the analysis can be for the ICML community and how this qualifies for this conference, since there is little technical content in the paper.

In the vanilla DQN, the rewards are clipped to the [-1, 1] interval, so the magnitude of rewards does not matter much. Is that the version of DQN that was run? If yes, then that should affect the analysis. For example, this sentence about the bonus in Seaquest: "In this cluster, the bonus box is visible indicating that the agent learned to separate this situation from others, however we can see that the cluster has a low value estimate indicating that the agent did not learn the right value function yet."

"there seems to be a better way to model them, e.g., by setting the target to be zero if the next state is terminal." I did not understand that. Isn't that already the case?

"for example we suggest to train an agent that does not receive those pixels as input"

"One possibility is to learn a classifier from a states to clusters based on the t-SNE map and then learn a different control rule at each cluster."

I would have been happier to see these predictions/suggestions actually implemented and tested. That would reinforce the case that such analysis can be useful for actually changing the performance of DQN.
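For concreteness, the update I have in mind behind the clipping and terminal-state comments above is roughly the following (an illustrative numpy sketch with function and variable names of my own choosing, not the paper's code); if this is what was run, the bootstrap term is already dropped at terminal states:

    import numpy as np

    def dqn_targets(rewards, next_q_values, terminal, gamma=0.99):
        # One-step targets as in vanilla DQN (my reading of Mnih et al., 2015).
        # rewards: shape (batch,); next_q_values: shape (batch, n_actions), from the target network;
        # terminal: boolean array of shape (batch,), True where the next state ends the episode.
        clipped = np.clip(rewards, -1.0, 1.0)          # reward clipping to [-1, 1]
        bootstrap = gamma * next_q_values.max(axis=1)
        bootstrap[terminal] = 0.0                      # terminal transition: target is just the clipped reward
        return clipped + bootstrap

If the authors mean something different by "setting the target to be zero" (i.e., dropping the reward term as well), this should be stated explicitly.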
Minor comments/typos:
- Abstract: "debug and optimize of deep neural networks"
- "it's wide use"
- "network predications"
- "bellman" needs capitalization
- "(decide how too show the gui and detail all measure we use)" -> not sure this was meant to make it into the paper
- "Loosing" -> losing
- "black annotaion box"
- "investigated the affect" -> the effect

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
A method to analyze policies learned by deep Q-networks (DQNs) is presented. It consists of a 2D or 3D visualization of states obtained by applying t-SNE to activations in the last layer of the deep network estimating the Q value. Combined with state colorization based on the Q value and various state attributes, it provides insight into the logic used by the network to solve the task. In particular, some state clusters may be associated with more abstract actions ("options"), highlighting the hierarchical representation learned automatically by the network. The proposed methodology is demonstrated on three Atari 2600 games.

Clarity - Justification:
The overall presentation is clear, but it is sometimes a bit hard to follow (see below for details).

Significance - Justification:
Given the current interest in DQN and its variants, this is definitely an interesting topic to investigate. The approach presented here is well motivated, and the results are convincing: we do gain a better understanding of the policies learned by DQN on these three games, which is useful for discovering both strengths and weaknesses of the resulting agent.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
My main high-level criticism is that this remains largely a manual process: it requires defining and extracting meaningful state features for colorization, and visually analyzing game screenshots to understand the meaning of state clusters and identify the abstract strategies used by the agent. As we can see from the pictures, related states can be spread among multiple t-SNE clusters, whose boundaries may not always be obvious. I do believe this is still a useful tool to help analyze an agent's behavior, and it can speed up the process compared to watching lots of replays, but it is not (yet) a silver bullet (I guess this is why the title says "graying" the black box, and not "whitening" it!).

Smaller remarks:
- A lot of English typos; please proofread.
- l.212: gamma is in the wrong place in the equation.
- t-SNE is introduced in 3.2, whose title is "Deep Q Networks"; should it be in a new section 3.3, or moved to 4.1?
- l.280 "(value, Q, advantage)": what does this mean?
- l.282 "(decide how too show the gui and detail all measure we use)": to be removed ;)
- "The Jacobians image is presented above the input image itself": this is not clear when looking at Fig. 1, for two reasons: (1) it is called "Gradient image" in the UI, and (2) in Fig. 1 we do not see red dots like in Fig. 6, for instance, so it is not clear where the gradient is.
- Switching to 3D t-SNE instead of 2D is not motivated in the paper (what does it bring?) and makes it harder to see what is going on (for instance, in Fig. 3 we cannot compare clusters between the left and right figures).
- When coloring by estimated Q value (e.g., Fig. 4), please provide the colormap so that we can tell which values are low and which are high (see the small sketch after these remarks for what I have in mind).
- In Fig. 4 it is not clear why the same cluster numbers are used for multiple clusters (e.g., 1 & 2). Also, the text says that clusters "1-3" correspond to high oxygen levels, which reads as "1 to 3", but cluster 2 can have a low level, as seen in Fig. 4.
- Fig. 6 is hard to understand: personally, I cannot tell what is a diver and what is an enemy.
- Overall comment: it is difficult to follow the written analysis while constantly switching back and forth between text and images. It is obviously a hard problem, but the more figures can be placed on the same page as their analysis, the better (another approach would be to move some content into the captions).
- Fig. 8 comes before Fig. 7, which is a bit odd.
- 5.4 would be much more interesting if it were backed up with results, or at least observations, that support those ideas. How can you tell that "initial states are (...) assigned with wrong value predictions"? Why would your suggestion for initial and terminal state representations work better? In Fig. 12, how do we know that the transition between sub-manifolds is caused by the score change and not by the disappearance of one pinky object on the screen? (By the way, it is also not clear what "the outlined area" is.)
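On the colormap remark, what I would like to see is simply the color scale made explicit, e.g. along the lines of the following (a hypothetical matplotlib sketch with file names of my own invention, not the authors' tool):

    import numpy as np
    import matplotlib.pyplot as plt

    embedding = np.load("tsne_embedding.npy")  # hypothetical 2D t-SNE coordinates, shape (n_states, 2)
    values = np.load("state_values.npy")       # hypothetical per-state value estimates, e.g. max_a Q(s, a)

    sc = plt.scatter(embedding[:, 0], embedding[:, 1], c=values, cmap="jet", s=3)
    plt.colorbar(sc, label="estimated value (blue = low, red = high)")
    plt.title("t-SNE of last-layer activations, colored by value estimate")
    plt.show()

Even just adding such a colorbar to Figs. 2 and 4 would remove the guesswork about which colors correspond to high and low values.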
=====