We would like to thank the reviewers for the time they spent reading our work and offering constructive criticism. The importance of modelling correlations among multiple objectives seems to have been well appreciated, along with our theoretically sound Bayesian formulation and approximation. Below we address each of your concerns in turn.

R2: In this paper we experiment with L=2 and L=3 objectives and stop at n=100 samples, because in all experiments it was clear which methods performed best after this many samples; we will state this in the experiments section. Thank you for the pointer to the rank-based approach to MOO. We have recently implemented a benchmark evolutionary MOO algorithm, NSGA-II, which our CEIPV method outperforms on each task considered by statistically significant margins; we will include these plots in a camera-ready version. We did not consider the log-Laplacian approximation, but we believe its form would be very similar to the one we propose in this work: the integrand in the expectation would be approximated with a Gaussian, which is essentially what our method does. The rocket function we consider does indeed have a non-convex Pareto frontier, because one objective has a strong discontinuity. Other functions we consider may also have non-convex Pareto frontiers, but we did not exhaustively find the true frontiers for the expensive experiments. The fact that our method does not struggle with non-convex frontiers, whilst linearization methods do, is something we will emphasize in the discussion.

R3: Thank you for referring us to the Feliot et al. (2015) paper, which covers a lot of prior work in the MOO space. In Remark 1, the authors say, “A variety of alternative approaches have been proposed to extend the EI criterion to the multi-objective case… these approaches are heuristic extensions of the EI criterion, in the sense that none of them emerges from a proper Bayesian formulation”.
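For concreteness, the notion of a (possibly non-convex) Pareto frontier referred to above can be recovered from a set of objective values by a simple dominance filter. The following is a minimal illustrative sketch (not our actual implementation; the function name and maximization convention are assumptions for this example):

```python
import numpy as np

def pareto_mask(Y):
    """Boolean mask of non-dominated rows of Y (maximization convention).

    A point is on the Pareto frontier if no other point is >= in every
    objective and strictly > in at least one. No convexity is assumed,
    so non-convex frontiers are recovered as-is.
    """
    Y = np.asarray(Y, dtype=float)
    mask = np.ones(Y.shape[0], dtype=bool)
    for i in range(Y.shape[0]):
        # rows that weakly dominate Y[i] in all objectives and strictly in one
        dominated = np.all(Y >= Y[i], axis=1) & np.any(Y > Y[i], axis=1)
        if dominated.any():
            mask[i] = False
    return mask
```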
We believe that our approach is one of the few that results from a theoretically sound Bayesian framework, and we will change our claim to reflect this. You are correct that we must be careful when using EI in the noisy setting. Whilst we do not use Augmented EI as suggested by Picheny et al. (2012), we use a method that they suggest at the beginning of Section 3.3, namely, setting y_{max} to the value of the function mean given by the correlated GP model. This will be made clearer in our experiments section. We certainly plan to publicise our code on acceptance of the paper.

R4: While the extension we propose may not be drastic, it is an important one: in real applications, objectives are often naturally negatively correlated, an idea completely overlooked by all other multi-objective optimization methods known to us. We feel it is important to acknowledge the significance of our contribution in being able to model and use these dependencies for optimization. You are correct that we use a lot of space to describe the problem, introduce notation, and motivate the idea of hypervolume; this is a consequence of a fairly tight page allowance when there is a lot to explain. Our view was that being thorough with notation and definitions was crucial for readers to understand the quality measure and the acquisition function. Based on your suggestions, in a final version we would shorten the Introduction and move the synthetic function definitions to the supplementary material. This will leave more space for explanations of the quality of the approximation, its relation to EP, and why it is accurate enough for our task. We will also add more text about the correlated GP priors. We have recently tested NSGA-II on our tasks, and CEIPV consistently outperforms it. PESM is very new and was put online far into our own experiments. The vlmop3 and rocket functions have 3 objectives, not 2.
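The plug-in variant of EI mentioned above (incumbent taken from the GP posterior mean rather than from the noisy observations) can be sketched as follows. This is an illustrative, self-contained sketch under a maximization convention; the function name and arguments are assumptions for this example, not the paper's code:

```python
import numpy as np
from scipy.stats import norm

def ei_plugin(mu_cand, sigma_cand, mu_obs):
    """Expected improvement with a plug-in incumbent for noisy observations.

    Instead of the (noisy) best observed value, the incumbent y_max is the
    largest posterior mean over already-evaluated inputs, following the
    variant suggested at the start of Sec. 3.3 of Picheny et al. (2012).
    mu_cand, sigma_cand: GP posterior mean/std at the candidate point.
    mu_obs: GP posterior means at the evaluated inputs.
    """
    y_max = np.max(mu_obs)                    # incumbent from GP means, not raw y
    sigma_cand = np.maximum(sigma_cand, 1e-12)  # avoid division by zero
    z = (mu_cand - y_max) / sigma_cand
    return (mu_cand - y_max) * norm.cdf(z) + sigma_cand * norm.pdf(z)
```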
Our method is implemented for any integer number of objectives greater than 1. The relative scales of f_1, …, f_L affect the hypervolume measure. In practice, after the initial 5 function evaluations at random input locations, we apply an affine transformation to each f_l so that the minimum and maximum initial observations are 0 and 1 respectively; this gives each objective an equal scale in expectation. For the neural network application, we combined memory and training time into a single objective because each of these terms was actually an input to the function, making it strange for each to also be a separate objective. The derivative of the acquisition function was used for gradient-based optimization, which is needed to decide where to evaluate the objectives next. We will add a sentence in the experiments section to make this explicit.
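The per-objective affine rescaling described above is straightforward; a minimal sketch (the function name and the degenerate-column guard are assumptions for this illustration, not our exact code):

```python
import numpy as np

def rescale_objectives(Y_init):
    """Fit an affine map per objective from the initial observations.

    Y_init: (n_init, L) array of the first function evaluations
    (n_init = 5 in our experiments). Each column is mapped so its
    initial min -> 0 and initial max -> 1; the returned transform is
    then applied to all subsequent observations of that objective.
    """
    lo = Y_init.min(axis=0)
    hi = Y_init.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return lambda Y: (Y - lo) / scale
```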