Dear authors, thank you for your response and feedback. We are pleased that the reviewers found this work interesting and recognize the importance and impact of energy and sustainability. We believe that ML in general and RL in particular are promising directions for electrical systems management and planning, and should play an increasing role as more renewable generation is introduced to the power grid in the coming years. We agree with Reviewer 1 that this application has received little attention from the machine learning community, and we hope our work will help interest other researchers.

Reviewer 1 mentioned the simulation environment, which is a substantial part of this work. This is an important comment, as we indeed discussed at length how much attention it should receive in the paper. Due to length limitations, we decided to provide its details online and to share the code for further research and expansion. Our simulation uses the commonly used Matpower for the sole purpose of solving the power flows; many other solvers could be used instead.

Reviewer 3 discusses the relevance of the paper to the ML and RL community. We see this work as highly relevant for the following reasons. First, it brings an important real-world problem to RL, which is thirsty for well-formalized large-scale applications rather than the common toy benchmarks. Second, the model we formulated was carefully designed with feedback from power system operators as part of a major effort. Our model differs from the common hierarchical models in that it describes two *separate* decision makers that operate on *different* state spaces and temporal resolutions. It can be utilized for other real-world problems with a hierarchical structure, where it is difficult to assess the implications of slow time-scale actions on fast time-scale rewards.
Third, although our method does utilize common tools from RL and optimization, the interleaved aspect of improving a policy in one MDP based on the value estimation of the other is, to our knowledge, novel. Our experimental results show comparisons to heuristics commonly used in industry by system operators today. The complexity of the problem renders current learning algorithms impractical. Our hope is that our work will serve as a benchmark for other researchers to apply more advanced methods.

Reviewer 4 mentioned the GARPUR and GREDOR projects. We are well aware of these two projects, which are indeed very relevant to this work. We will add the needed references to the paper. The reviewer also discusses the scalability issue. As explained by the reviewer, the immense complexity of real-life power systems makes scalability an important and difficult issue. We are putting much effort into tackling it in ongoing and future work. We believe that ML and RL methods are a promising direction for managing the tractability of long-term planning horizons, although they bring scalability challenges of their own.