We thank all the reviewers for their time and constructive feedback on the paper.
Regarding Reviewer 1's comments:
We agree that the computational costs of the different backup strategies are important and should be discussed in the paper. In practice, we find that the difference in computational cost among the backup strategies (MaxMCTS_gamma, MaxMCTS(lambda), pure Monte Carlo) is less than 10%, as graphed here: http://i.imgur.com/bx56Inm.png. For those experiments, all code is implemented in C++, and care has been taken to implement every backup strategy efficiently. We compare the average episode time across all IPC domains (over 1,000 trials). Since each IPC domain has 40 action-selection steps per trial, and 10,000 simulations are performed per selection step, these results reflect the cumulative time difference across 400,000 planning simulations per trial.

These results show that, as expected, Monte Carlo is typically slightly faster than MaxMCTS(lambda), since it does not need to compute the max term at every backpropagation step (Alg 2 Line 8). Although MaxMCTS_gamma is more complex, keeping track of the n-step returns in an array means that it performs little extra computation compared to MaxMCTS(lambda). The pseudocode in Algorithm 3 shows the efficient implementation of this approach. We can add these results to the paper, and believe that they will address Reviewer 1's helpful suggestion.
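To make the cost difference concrete, the following is a minimal C++ sketch of a lambda-weighted backup over one simulated trajectory. It is an illustration under stated assumptions, not the code used in the experiments and not a verbatim transcription of Algorithm 2: the update rule (mix the sampled return with the max over visited actions, weighted by lambda) and the names `Node`, `Step`, and `backupLambda` are assumptions introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-state statistics (illustrative names only).
struct Node {
    std::vector<double> q;  // running mean of backed-up returns, per action
    std::vector<int> n;     // visit count, per action
    explicit Node(std::size_t numActions) : q(numActions, 0.0), n(numActions, 0) {}
};

// One step of a simulated trajectory: the node visited, the action taken, the reward seen.
struct Step {
    Node* node;
    std::size_t action;
    double reward;
};

// Back up a trajectory from its last step to its first.
// Assumed rule: after updating Q(s, a) with the current return, the value passed
// further up is (1 - lambda) * max_a Q(s, a) + lambda * return.
void backupLambda(std::vector<Step>& trajectory, double lambda, double gamma) {
    assert(lambda >= 0.0 && lambda <= 1.0);
    double ret = 0.0;
    for (auto it = trajectory.rbegin(); it != trajectory.rend(); ++it) {
        Node& node = *it->node;
        const std::size_t a = it->action;
        ret = it->reward + gamma * ret;
        node.n[a] += 1;
        node.q[a] += (ret - node.q[a]) / node.n[a];  // incremental mean
        if (lambda < 1.0) {
            // The max term: skipped entirely when lambda == 1 (pure Monte Carlo),
            // which is where the small speed advantage comes from.
            double best = node.q[a];
            for (std::size_t b = 0; b < node.q.size(); ++b) {
                if (node.n[b] > 0) best = std::max(best, node.q[b]);
            }
            ret = (1.0 - lambda) * best + lambda * ret;
        }
    }
}
```

In this sketch the only extra work relative to the Monte Carlo backup is one pass over a node's actions per step, which is consistent with a small overall difference when backpropagation is a minority of planning time.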

Furthermore, we re-ran experiments using 12.5 seconds of planning time per action-selection step for a representative subset of IPC domains (instead of 10,000 planning simulations per action-selection step), with results available here: http://i.imgur.com/fwO29IU.png. Only off-policy approaches are evaluated, and MaxMCTS(lambda=1) uses the slightly faster Monte Carlo backup. As these results demonstrate, the shape of the curves remains the same as in Figure 1 of the submission, since there is relatively little difference in computation time between the backup strategies. It should be noted that backpropagation accounts for approximately 15-25% of the total planning time, depending on the domain.
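The two stopping criteria compared above (a fixed simulation count versus a wall-clock budget) can be sketched as follows. This is only an illustration: `planForCount`, `planForBudget`, and the `simulateOnce` callback are hypothetical names, and the experimental harness may be structured differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run a fixed number of simulations (e.g., 10,000 per action-selection step).
int planForCount(int numSimulations, const std::function<void()>& simulateOnce) {
    assert(numSimulations >= 0);
    for (int i = 0; i < numSimulations; ++i) {
        simulateOnce();
    }
    return numSimulations;
}

// Run simulations until a wall-clock budget expires (e.g., 12.5 seconds).
// Returns how many simulations completed, so a cheaper backup yields more of them.
int planForBudget(std::chrono::duration<double> budget,
                  const std::function<void()>& simulateOnce) {
    using Clock = std::chrono::steady_clock;
    const auto deadline =
        Clock::now() + std::chrono::duration_cast<Clock::duration>(budget);
    int completed = 0;
    while (Clock::now() < deadline) {
        simulateOnce();
        ++completed;
    }
    return completed;
}
```

Under a time budget, a backup strategy that is a few percent slower simply completes slightly fewer simulations per step, which is why the two protocols are expected to produce similarly shaped curves.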

Regarding Reviewers 1 and 2's comments about suitability as a general purpose planner:
We agree that MaxMCTS(lambda) with suboptimal lambda values can perform poorly, and this fact limits the applicability of the algorithm as a general purpose planner (similarly, regular UCT requires tuning the C parameter for good empirical performance). For this reason, on line 542 we recommend (based on our results) trying just MaxMCTS_gamma and MaxMCTS(lambda=0) in situations where the lambda value cannot be optimized. We are currently exploring principled ways to produce a single parameter-free approach that performs well in all types of domains. It is important to note that introducing any additional parameters (in an iterative or nested approach to remove lambda) defeats the purpose of removing the lambda parameter, and makes the algorithms more ad hoc.

Regarding Reviewer 2's comments:
- We acknowledge that other action selection strategies apart from UCB1 (such as KL-UCB) exist. In practical settings, any tuned UCB1-like action selection strategy is sufficient for the analysis in this paper, since the focus of the paper is on backup strategies.
- In our experiments, C in UCB1 is tuned via a grid/bin search (Line 595), i.e., many different C values were tried. Only the best-performing UCT variants (i.e., at particular values of C) are plotted in Figure 3 to keep the figure clear.
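
As a minimal illustration of the role that C plays in the points above, the sketch below implements a standard UCB1-style selection rule, Q(a) + C * sqrt(ln N / n(a)). The function name `selectUcb1` and its vector-based interface are hypothetical, and the exact form of the exploration term used in the experiments is an assumption here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pick the action maximizing q[a] + c * sqrt(ln(total visits) / n[a]).
// Any action that has never been tried is selected first.
std::size_t selectUcb1(const std::vector<double>& q,
                       const std::vector<int>& n,
                       double c) {
    assert(!q.empty() && q.size() == n.size());
    long total = 0;
    for (int count : n) total += count;

    std::size_t best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t a = 0; a < q.size(); ++a) {
        if (n[a] == 0) return a;  // untried action: explore it immediately
        const double bonus =
            c * std::sqrt(std::log(static_cast<double>(total)) / n[a]);
        const double score = q[a] + bonus;
        if (score > bestScore) {
            bestScore = score;
            best = a;
        }
    }
    return best;
}
```

With C = 0 the rule is purely greedy; larger values of C shift selection toward rarely tried actions, which is why a search over C is needed per domain.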

Regarding Reviewer 3's comments:
- getOrInitNode is the same as getNode, except it also initializes a node if one does not exist (when that action has not been taken before). This should have been explained better in the text.
- The uniform strategy presented on Line 199 performs slightly better than a purely random (1/K) strategy. These results were omitted from the paper due to space constraints.
- The experiments shown in Figure 3 demonstrate that the results carry over to more informed action selection, i.e., when UCT is used, in two domains. These experiments indicate that all backup strategies perform better when more informed action selection is used, but qualitatively, the relative effects of the backup strategies still apply.
- We will edit carefully for typos and consistent citation style prior to submitting the next version.
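
The distinction between getNode and getOrInitNode raised in the first point above can be sketched as follows. This is an assumption-laden illustration rather than the paper's implementation: keying the tree by an integer state identifier, the `TreeNode` layout, and the `numActions` parameter are all hypothetical choices made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical node layout (illustrative only).
struct TreeNode {
    std::vector<double> q;  // value estimate per action
    std::vector<int> n;     // visit count per action
    explicit TreeNode(std::size_t numActions)
        : q(numActions, 0.0), n(numActions, 0) {}
};

// Assumption: nodes are keyed by an integer state identifier.
using Tree = std::unordered_map<int, TreeNode>;

// Lookup only: returns nullptr when the node has not been created yet.
TreeNode* getNode(Tree& tree, int stateId) {
    auto it = tree.find(stateId);
    return it == tree.end() ? nullptr : &it->second;
}

// Lookup that also creates a zero-initialized node on first visit.
TreeNode& getOrInitNode(Tree& tree, int stateId, std::size_t numActions) {
    assert(numActions > 0);
    auto it = tree.find(stateId);
    if (it == tree.end()) {
        it = tree.emplace(stateId, TreeNode(numActions)).first;
    }
    return it->second;
}
```

In this sketch, getOrInitNode would be used while descending the tree during a simulation, and getNode during backup, when every node on the trajectory is already known to exist.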