Paper ID: 1003
Title: Hierarchical Decision Making In Electricity Grid Management

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The problem the authors are concerned with is day-scale management of the electricity grid in the face of complex intra-day dynamics at the individual grid operators. In particular, the actions available to the grid management system are to decide which generators to turn on and off at the start of each day. Once this decision is made, individual grid node operators manage their nodes to satisfy demand in the face of fluctuating inputs (wind) and outputs (demand). Solving this problem exactly is intractable because of the nonlinearities of the power transfer dynamics and the large number of dimensions in the problem. A further complication is that the value function for the day-ahead (DA) policy is only available through execution of the real-time (RT) policy, which in turn incurs contingency costs based on power supply and demand. To address these problems, the authors pose the goal as maximizing reward in a hierarchical MDP with two parts: one for DA planning and one that models RT planning. By iteratively learning policies in both MDPs, they are able to outperform previous heuristic grid planning approaches.

Clarity - Justification:
The formulation as a two-level MDP is presented well.

Significance - Justification:
The experimental results do show improvement from using the algorithm over competing heuristics. However, from a reinforcement learning perspective the algorithm does not seem like a significantly novel contribution; the approach uses the cross-entropy method for DA planning with a reinforcement learning subroutine for RT value function estimation (a minimal sketch of this two-level loop is given after this review). The authors mention competing approaches that should have been compared against. The comparisons in Figure 9 are to heuristic policies that are not learned/optimized.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper clearly deals with an important problem, and from the perspective of an application paper it seems worthy. The contribution from an ML perspective, however, is thin. It is not clear that a typical ICML attendee would find this work interesting. Presenting the model and algorithmic approach in a more general framework that captures general hierarchical MDPs would definitely make the paper stronger and more relevant to ICML. Planning in MDPs defined over multiple time-scales can be challenging and could be very interesting, and the energy management problem could be a nice motivation/benchmark. Overall it seems below the bar as an RL paper, but might be interesting as an energy management (applied) paper.
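To make the two-level structure described in this review concrete, here is a minimal sketch of such a loop: a cross-entropy search over binary DA commitment vectors, where each candidate is scored by rolling out an RT policy. Everything here is a hypothetical stand-in, not the authors' implementation: rt_rollout_cost replaces the paper's grid simulator and learned RT policy with an invented toy cost model, and all hyperparameters are illustrative.

# Hypothetical sketch (not the authors' code): cross-entropy (CE) search
# over binary day-ahead commitment vectors, each scored by a simulated
# real-time (RT) rollout. The toy cost model below is invented for
# illustration only.
import numpy as np

rng = np.random.default_rng(0)
N_GEN = 10  # generators to commit on/off at the start of the day

def rt_rollout_cost(commitment):
    """Stand-in for one day of RT operation under a DA commitment:
    fuel cost for running units plus a contingency penalty for shortfall."""
    demand = 6.0 + rng.normal(0.0, 0.5)   # fluctuating net demand
    capacity = commitment.sum()           # one unit of power per committed generator
    shortfall = max(0.0, demand - capacity)
    return 1.0 * capacity + 10.0 * shortfall

def cross_entropy_da(n_iters=30, pop=100, elite_frac=0.1):
    """CE over {0,1}^N_GEN: sample commitments from independent Bernoullis,
    keep the lowest-cost elite fraction, refit the Bernoulli parameters."""
    p = np.full(N_GEN, 0.5)               # sampling distribution parameters
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(n_iters):
        samples = (rng.random((pop, N_GEN)) < p).astype(float)
        costs = np.array([rt_rollout_cost(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elite]]
        p = 0.7 * p + 0.3 * elites.mean(axis=0)   # smoothed elite refit
    return (p > 0.5).astype(int)

print("commitment:", cross_entropy_da())

In the paper the inner evaluation is far more involved (a learned RT policy executed in a grid simulator), but the outer CE loop would have this standard sample/elite/refit structure.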
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces a reinforcement learning method for managing power generation in an electric grid.

Clarity - Justification:
The paper is clear.

Significance - Justification:
The paper approaches an issue in electric grid management from a machine learning perspective -- an area in which there have been few such attempts.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is interesting, and applicable to a problem area that typically gets little attention from the machine learning community. The growing share of renewable/green energy production makes power generation more difficult to manage, so further work in this area will enable cheaper deployment of green power. The simulation environment is explained only briefly, despite being presented as one of the paper's four new contributions, and the text suggests that what is presented is a tweaked version of an existing simulator.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
In this paper, the authors address the problem of day-ahead planning of a power system. This is a rather difficult problem, since the optimal day-ahead planning strategy depends on the (optimal) real-time strategy used. To solve this problem, the authors present an approximate policy iteration scheme. The algorithm works as follows: for a given day-ahead (DA) policy, the value function of the optimal real-time strategy is computed using a TD(0) algorithm. This value function is afterwards used to carry out a policy improvement step (a toy sketch of this scheme is given after the reviews). The approach is tested on a rather simple power system and the results are encouraging.

Clarity - Justification:
The paper is very nicely written.

Significance - Justification:
This paper is rather interesting and well written. I particularly like the problem formulation, which could be valuable to the power systems community. Their algorithm seems to work well on their benchmark, even if I am pretty sure that it will never be able to address the complexity of a real-life power system, mostly because it will not scale well with the size of the action space. The authors should have a look at other work in the power systems literature that shares strong similarities with theirs. I would advise them, for example, to look at what is done in the GARPUR project, where similar decomposition approaches are used. They may also want to look at what has been done in the GREDOR project, where the optimization is carried out over the whole decision chain (regulation strategies / investment strategies / day-ahead strategies / real-time control strategies) rather than only the last two steps (see www.gredor.be). That would help them better position their work with respect to the existing literature.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
See above.

=====
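For concreteness, a toy sketch of the approximate policy iteration scheme Review #3 summarizes, under heavy simplifying assumptions: RT states are reduced to the hours of one day with an invented cost model, the DA decision is collapsed to a single commitment level, and the improvement step simply enumerates the handful of candidate commitments (feasible only because this toy action space is tiny; the paper's DA action space is far larger). All names and constants are hypothetical, not taken from the paper.

# Toy sketch of TD(0)-based approximate policy iteration: evaluate a
# fixed DA commitment with TD(0), then improve greedily over candidates.
# All dynamics and constants below are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
HOURS = 24                    # RT stages within one day
COMMIT_LEVELS = range(9)      # candidate DA commitments (units on)

def rt_step_cost(capacity):
    """One RT transition: stochastic net demand; cost is fuel plus a
    contingency penalty for unmet demand."""
    demand = 6.0 + rng.normal(0.0, 0.5)
    return 1.0 * capacity + 10.0 * max(0.0, demand - capacity)

def td0_value(capacity, episodes=200, alpha=0.1):
    """TD(0) estimate of the expected cost-to-go of each RT stage under a
    fixed DA commitment (gamma = 1; V[HOURS] = 0 ends the day)."""
    V = np.zeros(HOURS + 1)
    for _ in range(episodes):
        for h in range(HOURS):
            cost = rt_step_cost(capacity)
            V[h] += alpha * (cost + V[h + 1] - V[h])  # TD(0) update
    return V

# Policy improvement over DA actions: score each candidate commitment by
# its TD(0)-estimated start-of-day value and act greedily.
day_values = {k: td0_value(k)[0] for k in COMMIT_LEVELS}
print({k: round(v, 1) for k, v in day_values.items()})
print("greedy DA commitment:", min(day_values, key=day_values.get))

Iterating this evaluate-then-improve cycle, with a parametric DA policy in place of the enumeration, would give the approximate policy iteration loop the review describes.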