We thank the reviewers for their thorough analysis. We plan to follow your suggestions on how to improve clarity and presentation.

R1 and R3 ask about the optimality of the metric used in the RL experiment. Our objective is not to propose an optimal metric for use in RL. We have shown just one possible choice, which is simple yet informative enough to result in better ascent directions. How to select optimal metrics for specific applications (such as reinforcement learning) is an interesting but separate question. Our goal was, by contrast, to introduce a concrete definition of a more informed update direction (a new natural gradient) and to empirically demonstrate that the resulting update directions are more efficient than previous natural gradients when a reasonable (even if sub-optimal) metric is used.

R1 and R3 suggest adding a learning curve. In our experiments we focused on showing that our ascent direction is more efficient than previous natural gradient directions when used for a single update step---even when we control for confounding factors such as step sizes, which also affect performance. We will extend this analysis and add a learning curve showing that the repeated use of energetic ascent directions results in convergence in fewer steps. We will also include pseudocode describing the specific instantiation of our energetic gradient idea in the RL setting.

We thank R4 for their suggestions on how to improve the discussion of Section 5 and Figure 2. We will incorporate these ideas into the paper, along with your other comments on how to improve clarity.

R3 asks about the use of a line search in the experiments. The question of what step size to use is orthogonal to the question of which update direction to use. By carefully selecting step sizes, it may be possible to make one method appear to outperform another, even if it is generally inferior. To avoid unfair comparisons of the quality of the Fisher and Energetic ascent directions, we allow each method to use its optimal step size, computed via a line search (a concrete sketch appears at the end of this response).

R3 asks why one would not use (exact or approximate) Newton's method. We apologize for not making this clearer---we will add a direct discussion of this topic to the paper. Our goal is not, in general, to estimate the Hessian with the EIM. If the Hessian is known or easily approximated, then one should absolutely use Newton's method rather than a natural gradient method (Fisher or Energetic). Natural gradient methods are useful when f (the objective function) is not known but p (the parametrization of the distribution) is known, since they correct for the parametrization of the distribution regardless of what f is. Although we discuss the similarity of the EIM to the Hessian in some settings, we do not propose that the EIM is an estimate of the Hessian in general. Even though the EIM might produce a worse update direction than Newton's method, it is significantly easier to estimate given prior knowledge about p (but not f).

R3 says that the example of Section 5 is unfair, since a Hessian is involved. As R3 describes, this is an example where the EIM works perfectly (it equals the Hessian). It is meant to demonstrate that the Energetic natural gradient can correct for more than the Fisher natural gradient (its main competitor when f is not known). In other words, the EIM provides a mechanism for leveraging knowledge about p when f is not known. See the comment below for how we plan to clarify this.
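For concreteness, the line-search protocol referenced above could be sketched as follows. This is a minimal illustrative sketch, not our experiment code; `objective`, `theta`, and the direction variables are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def value_at_best_step(objective, theta, direction, max_step=10.0):
    """Objective value reached by `direction` under its optimal step size,
    found by a bounded scalar line search."""
    result = minimize_scalar(
        lambda alpha: -objective(theta + alpha * direction),
        bounds=(0.0, max_step),
        method="bounded",
    )
    return -result.fun

# Hypothetical usage: compare the two ascent directions fairly by letting
# each use its own optimal step size on the same objective.
# fisher_dir    = np.linalg.solve(F, grad)  # Fisher natural gradient
# energetic_dir = np.linalg.solve(E, grad)  # Energetic natural gradient
# value_at_best_step(objective, theta, fisher_dir)
# value_at_best_step(objective, theta, energetic_dir)
```

Scoring each direction at its own optimal step size removes step-size choice as a confounding factor in the one-step comparison.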
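To preview the planned discussion of Newton's method versus natural gradient methods, the three candidate update directions can be written side by side as preconditioned gradients. The notation below is ours, chosen for this response, not quoted from the paper.

```latex
% Candidate update directions for improving f(\theta), written as
% preconditioned gradients (notation assumed for this response):
\begin{align*}
  \Delta\theta_{\text{Newton}}    &\propto -H(\theta)^{-1}\,\nabla_\theta f(\theta), \\
  \Delta\theta_{\text{Fisher}}    &\propto  F(\theta)^{-1}\,\nabla_\theta f(\theta), \\
  \Delta\theta_{\text{Energetic}} &\propto  E(\theta)^{-1}\,\nabla_\theta f(\theta).
\end{align*}
% Newton's method requires knowledge of f (through the Hessian H); the two
% natural gradient methods require knowledge only of p (through the Fisher
% information matrix F or the energetic information matrix E), which is
% why they remain applicable when f is unknown.
```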
Comments by R1 and R3 suggest that the paper would benefit from a longer discussion of the relationship between natural gradient methods (Fisher and Energetic) and Newton's method. We will include an example similar to that in Section 5, but where f is a more sophisticated function of p, so that the EIM does not equal the Hessian; this will more clearly show a situation where the quality of the update directions is ordered FIM < EIM < Hessian (with the latter not being known from prior information). In particular, we will clarify that if the Hessian is known, one should use Newton's method. However, if the Hessian is not known and one decides to use a natural gradient method, it may be advantageous to use the Energetic natural gradient instead of the standard Fisher natural gradient when a reasonable (even if sub-optimal) metric is available.

Finally, we address the list of direct questions posed by R1:

1. We will provide a learning curve in addition to the one-step analysis we already provided.
2. We will add pseudocode. We compute the sample gradient using REINFORCE, along with the sample EIM and FIM (see the sketch after this list).
3. We will add a discussion of this in general (not just for RL). Parameters with less influence on the PPM should be emphasized, since they may have to undergo large changes to significantly change the distribution.
4. So far we have only experimented with Mountain Car.
5. It makes the proof simpler. We will look into whether we can provide proofs for the infinite setting.
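As a preview of the pseudocode promised in answer 2, the sketch below illustrates one possible instantiation in Python. All helper callables are hypothetical placeholders, and the pairwise form of the sample EIM (a d-weighted average of cross outer products of score vectors, up to constants) is our illustrative reading of the energy-distance definition, not a verbatim reproduction of the paper's estimator.

```python
import numpy as np

def one_step_directions(sample_trajectories, grad_log_pi, trajectory_return,
                        trajectory_distance, n=100, ridge=1e-3):
    """Sketch: compute Fisher and Energetic natural gradient directions.

    Hypothetical placeholder callables:
      sample_trajectories(n)    -> list of n trajectories drawn from pi_theta
      grad_log_pi(tau)          -> score vector grad_theta log Pr(tau; theta)
      trajectory_return(tau)    -> return G(tau) used by REINFORCE
      trajectory_distance(a, b) -> the chosen metric d between trajectories
    """
    taus = sample_trajectories(n)
    psi = np.stack([grad_log_pi(tau) for tau in taus])      # (n, k) score vectors
    G = np.array([trajectory_return(tau) for tau in taus])  # (n,) returns

    # REINFORCE estimate of the vanilla policy gradient.
    grad = (G[:, None] * psi).mean(axis=0)

    # Sample FIM: average outer product of the score vectors.
    fim = psi.T @ psi / n

    # Sample EIM (assumed pairwise form, up to constants): the d-weighted
    # average of cross outer products of score vectors.
    dist = np.array([[trajectory_distance(a, b) for b in taus] for a in taus])
    eim = -(psi.T @ dist @ psi) / n ** 2

    # Precondition the gradient; a small ridge keeps the solves stable.
    eye = ridge * np.eye(psi.shape[1])
    fisher_dir = np.linalg.solve(fim + eye, grad)
    energetic_dir = np.linalg.solve(eim + eye, grad)
    return fisher_dir, energetic_dir
```

Computing both directions from the same sampled trajectories, with the same ridge term, keeps the one-step comparison between the Fisher and Energetic directions controlled.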