We thank the reviewers for their comments about our paper. See specific responses below.
R1:
The product in (4) takes value 0 when X* is not a valid Pareto set, i.e., there are some other points in X that are "better" than those in X*. The factor between brackets in (4) takes value 0 if x' has any objective that is better than the corresponding objective in x*.
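As an illustration of the product described above, the following is a minimal sketch and not the code used in the paper. It assumes all objectives are minimized and that "better" refers to the standard Pareto dominance relation; the exact form of the factor is the one given in Eq. (4). The function names and the objective values are hypothetical.

```python
def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (minimization assumed)."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better


def pareto_indicator(f_star, f_all):
    """Product of 0/1 factors over all pairs (x*, x').

    A factor is 0 when x' dominates x*, so the product is 1 only if no
    point in X dominates any point of the proposed Pareto set X*.
    """
    product = 1
    for fs in f_star:
        for fx in f_all:
            product *= 0 if dominates(fx, fs) else 1
    return product


# Hypothetical objective values for four points (two objectives each).
f_all = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]

valid = pareto_indicator([(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)], f_all)
invalid = pareto_indicator([(3.0, 3.0)], f_all)  # (2, 2) dominates (3, 3)
```

Here `valid` is 1 because nothing in `f_all` dominates the three proposed points, while `invalid` is 0 because a single dominated point zeroes the whole product.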
We will cite Feliot et al. However, according to our experiments, PESMO gives better results than EHI, which is computed exactly. The approximation of Feliot et al., although faster than exact EHI, is expected to give worse results than exact EHI.
All our code will be made available.
R3:
While we have not developed theory for our method, entropy reduction can elegantly tackle multi-objective problems with improved results over existing methods. Our method is the only one that allows for a decoupled evaluation, with practical advantages, as shown by our experiments. We agree that theoretical results are important but they are, in general, difficult to obtain. On the other hand, practitioners are currently using related methods (also lacking theory) on a daily basis. If we can provide a significantly better and more general method, then we think this is very worthwhile. We hope that the development of useful methods will justify the development of better theory over time; this would be preferable to theory being developed for the less effective methods that exist today.
We will improve Section 2.1 as indicated.
We run EP until convergence to approximate the factors that do not depend on the candidate point x. There are (|X|-1)*|X*| of these factors. The factors that are refined only once are the ones that depend on x. There are |X*| of these factors. Preliminary experiments in which we also refine the latter factors several times do not show significant improvements. In the paper and the supplementary material we compare our EP approximation of the acquisition function with a more accurate estimate obtained via Monte Carlo sampling. We show that the EP approximation is accurate in terms of the location of the global maximizer.
R4:
The improvements obtained are typically larger than one standard deviation. When the number of objectives is large (see the right column of Fig. 2), the improvements of PESMO_dec are very large: it achieves similar results to the other methods with only half of the evaluations. See also the first row of Table 2.
R5:
We will correct Eq. (4).
A decoupled acquisition means having a different acquisition function per objective. These acquisition functions can identify, at each iteration, the objective that is expected to be most useful to evaluate, and at which location. It does not mean optimizing each objective separately.
In PESMO_dec some objectives (the most difficult ones) are evaluated more often. See Fig. 3 and Fig. 5. This is a practical advantage of our method. The other methods always evaluate all objectives at the same input location at each iteration.
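The selection step of a decoupled acquisition can be sketched as follows. This is a hedged illustration only: the two acquisition functions are toy placeholders rather than the PESMO acquisition, the candidate grid is arbitrary, and `select_decoupled` is a hypothetical name.

```python
def select_decoupled(acquisitions, candidates):
    """Return the (objective index, location) pair with the largest value.

    `acquisitions` holds one acquisition function per objective, so the
    maximization runs jointly over objectives and input locations.
    """
    pairs = ((k, x) for k in range(len(acquisitions)) for x in candidates)
    return max(pairs, key=lambda kx: acquisitions[kx[0]](kx[1]))


# Toy per-objective acquisition functions (placeholders, not PESMO's).
acquisitions = [
    lambda x: 1.0 - (x - 0.2) ** 2,  # objective 0, peaks at x = 0.2
    lambda x: 1.5 - (x - 0.8) ** 2,  # objective 1, peaks at x = 0.8
]
candidates = [i / 10 for i in range(11)]  # grid on [0, 1]

objective, location = select_decoupled(acquisitions, candidates)
```

With these placeholders the step picks objective 1 at location 0.8, so only that objective would be evaluated at this iteration. An objective whose acquisition is repeatedly the largest is evaluated more often.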
R6:
The multi-objective extension is not trivial. It is not straightforward which factors to use for p(X*|f), and it is not clear a priori whether EP can actually be used to approximate them. Unlike the single-objective case, the total number of factors is huge, namely |X| * |X*|, which would be expected to lead to an unbearable cost of O(K(|X|*|X*|)^3). The key point is that the number of latent variables (f_1(x_1), f_2(x_1), etc.) is much smaller: |X| for each of the K objectives, where X = {x_i}_0^n U {x*_j}_0^m U {x}. This gives a tractable cost of O(K|X|^3). Furthermore, in the multi-objective case we also need to work with a set of potentially infinite size (the Pareto set), while in the single-objective case we work with just a point (the global minimum). Working with a set is more difficult.
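To give a sense of scale for these counts, the following sketch evaluates them for hypothetical sizes (50 observed points, 10 Pareto set points, 4 objectives). These numbers are illustrative assumptions, not values from our experiments, and `cubic_cost` only evaluates the expression inside the O(.) notation, ignoring constants.

```python
def ep_sizes(n_observed, n_pareto):
    """Return |X| and the two factor counts for the given set sizes."""
    size_x = n_observed + n_pareto + 1       # observations, Pareto points, candidate x
    independent = (size_x - 1) * n_pareto    # factors that do not depend on x
    dependent = n_pareto                     # factors that depend on x
    return size_x, independent, dependent


def cubic_cost(n_objectives, size):
    """Expression inside O(K * size^3), constants ignored."""
    return n_objectives * size ** 3


size_x, independent, dependent = ep_sizes(n_observed=50, n_pareto=10)
total_factors = independent + dependent      # equals |X| * |X*|

naive = cubic_cost(4, total_factors)         # cost scaling with the number of factors
tractable = cubic_cost(4, size_x)            # cost scaling with the latent variables
ratio = naive // tractable                   # equals |X*|^3
```

For these sizes there are 610 factors but only 61 latent variables per objective, so the two expressions differ by a factor of |X*|^3 = 1000.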
We believe our paper is an important contribution to the field. We show state-of-the-art results for multi-objective optimization. Furthermore, our approach allows for a decoupled evaluation setting, unlike any existing method.
A decoupled evaluation has practical advantages. It provides large gains when the number of objectives is large. Fig. 2 (right column) shows that it gives similar results to the other methods with only half of the evaluations. See also the first row of Table 2.
The advantage of PESMO over SMSego is that (i) it allows for a decoupled evaluation while SMSego does not, and (ii) it gives better results, as shown by our experiments. PESMO may obtain better results in the coupled setting because it follows a less greedy approach than SMSego, which chooses the next point as the one that is expected to improve the hypervolume the most. This may limit the exploration of the space of potential solutions in SMSego.
Table 1 shows that SMSego is on average 17 seconds faster than PESMO in finding the next point at which to evaluate the objectives. If the evaluation of each objective is very expensive, e.g., it takes 1 hour, that difference is negligible: 17 seconds is less than 0.5% of a single one-hour evaluation.