Review 1

Thanks for the constructive comments. As in the original automatic statistician framework (Lloyd et al., 2014), our system generates an individual report for each time series selected from the multiple ones. That is, the "relational + individual" part of the covariance kernel expression is mapped to a single time series. More precisely, once the best kernel has been found, the kernel expression is converted into a sum of multiplicative terms (of base kernels), and each section of a report explains the base kernels in one multiplicative term. For example, CW(SE + CW(WN + SE, WN), C) is explained as follows (where Z denotes the zero kernel):

- CW(SE, Z): A smooth function. This function applies until 06 Jun 2015 and from 16 Jul 2015 onwards.
- CW(Z, C): A constant. This function applies from 06 Jun 2015 until 16 Jul 2015.
- CW(CW(SE, Z), Z): A smooth function. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015, and from 09 Oct 2015 onwards.
- CW(CW(WN, Z), Z): Uncorrelated noise. This function applies until 06 Jun 2015, from 16 Jul 2015 until 26 Sep 2015, and from 09 Oct 2015 onwards.
- CW(CW(Z, WN), Z): Uncorrelated noise. This function applies from 26 Sep 2015 until 09 Oct 2015.
- CW(CW(Z, SE), Z): A very smooth function. This function applies from 26 Sep 2015 until 09 Oct 2015.

To provide details of the learned kernels, we will include the individual reports as supplementary data files.

Review 2

We agree that a more explicit title would convey our contributions better. Adding a new kernel would make our model more expressive. Besides the spectral mixture (SM) kernel, we also developed several new kernels (e.g., a new change-window kernel whose inner kernel represents the structure shared within a specific period). Unfortunately, these new kernels were not effective enough: not all of them were selected during the search procedure.
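To illustrate the general change-window idea, the following is a minimal numpy sketch (not the paper's implementation): CW(k_in, k_out) applies k_in inside a window [a, b] and k_out outside it, with sigmoidal transitions. The window endpoints, steepness, and the SE/constant base kernels below are illustrative choices.

```python
import numpy as np

def se(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (SE) base kernel."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def const(x1, x2, variance=1.0):
    """Constant (C) base kernel."""
    return variance * np.ones((len(x1), len(x2)))

def cw(k_in, k_out, a, b, steepness=1.0):
    """Change-window kernel CW(k_in, k_out): k_in applies inside the
    window [a, b], k_out outside, with sigmoidal transitions."""
    def window(x):
        sig = lambda z: 1.0 / (1.0 + np.exp(-z))
        return sig((x - a) / steepness) * sig((b - x) / steepness)
    def k(x1, x2):
        w1, w2 = window(x1), window(x2)
        return (w1[:, None] * w2[None, :] * k_in(x1, x2)
                + (1 - w1)[:, None] * (1 - w2)[None, :] * k_out(x1, x2))
    return k

# Example: constant structure inside the window [2, 4], SE structure outside.
x = np.linspace(0.0, 6.0, 7)
K = cw(const, se, a=2.0, b=4.0)(x, x)
```

Because each term is an elementwise product of a positive semidefinite matrix with a rank-one outer product of window weights, the result remains a valid covariance matrix.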
However, we are still actively seeking new base kernels to model various changes, such as the exponential growth the reviewer suggested.

Regarding the number of parameters in SRKL: we used an SM kernel with 10 components (3 x 10 parameters per series). The total number of parameters (3 x 10 x the number of time series) is therefore considerably larger than that of the shared kernels, as reflected in the high BIC scores of SRKL.

Our current model and the automatically generated reports may not receive immediate attention from the financial industry. However, we would like to emphasize that an automatic statistician system could impact the financial industry in the future by automatically analyzing hidden changes (such as long-term and short-term changes) and then generating human-readable reports. Note that an automatic statistician system operating on a single time series (e.g., Lloyd et al., 2014) may not be effective enough to analyze financial data when multiple (inter-correlated) time series are given. In this paper, we show that exploiting shared structure can greatly improve the predictive performance of the automatic statistician system.

Thanks for the constructive comments and suggestions. We will address all the minor issues in a revised version.

Review 3

Thanks for the constructive comments. We agree that the spectral mixture (SM) kernel degrades the interpretability of the results, since the SM kernel is not easily explained in human-readable form. As discussed in Figure 5, we observe that the SM kernel plays an important role in the early phase of the search procedure. However, as the search gets deeper, the majority of the data is explained by the (interpretable) shared kernels, and the SM kernel fits the residuals of the individual time series (not yet explained by the shared kernels). We believe that our new finding (a way to improve the automatic statistician system) is non-trivial.
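The parameter count mentioned above follows from the form of the SM kernel (Wilson and Adams, 2013): each of the Q components contributes a weight, a spectral mean, and a spectral variance, so one SM kernel per series gives 3Q x (number of series) parameters in total. A minimal numpy sketch, with illustrative random hyperparameter values:

```python
import numpy as np

def sm_kernel(x1, x2, weights, means, variances):
    """Spectral mixture kernel (Wilson & Adams, 2013):
    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)."""
    tau = x1[:, None] - x2[None, :]
    k = np.zeros_like(tau, dtype=float)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k

Q = 10                              # number of mixture components
rng = np.random.default_rng(0)      # illustrative random hyperparameters
weights, means, variances = rng.uniform(0.1, 1.0, (3, Q))

params_per_series = 3 * Q           # weight, mean, variance per component
n_series = 19                       # the 19 time series used in the paper
total_params = params_per_series * n_series
print(total_params)  # 570: grows linearly with the number of series
```

At tau = 0 every component reduces to its weight, so the kernel's diagonal equals the sum of the weights.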
Note that the original automatic statistician system (Lloyd et al., 2014) already outperformed existing methods, including multiple kernel learning (e.g., Bach, Lanckriet, and Jordan, 2004), change-point modeling (Garnett et al., 2010; Saatci, Turner, and Rasmussen, 2010; Fox and Dunson, 2013), and SM kernels (Wilson and Adams, 2013).

Due to the space limit, we could not provide the RMSE results for individual time series. In the paper, we report the aggregated RMSE over 19 time series (9 stocks, 6 housing markets, and 4 currency exchange rates). As the reviewer suggests, we examined the RMSE of each individual time series: our model (SRKL) achieves a lower RMSE than the automatic statistician (CKL) on 16 of the 19 time series (84.2%), with an average RMSE reduction of 40.1% across the 19 series. The individual results are thus consistent with the aggregated results already included in the paper. To clarify this, we will report individual RMSEs and error bars in a revised version.
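For concreteness, the per-series comparison described above can be computed as follows. This is a sketch on synthetic data (the paper's 19 series and model predictions are not reproduced here); the targets and predictions are hypothetical stand-ins.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Synthetic stand-in for the 19 time series: count the series on which
# SRKL improves on CKL, and average the per-series RMSE reduction.
rng = np.random.default_rng(1)
n_series, n_points = 19, 50
reductions = []
wins = 0
for _ in range(n_series):
    y = rng.normal(size=n_points)                       # hypothetical targets
    pred_ckl = y + rng.normal(scale=0.5, size=n_points)  # noisier predictor
    pred_srkl = y + rng.normal(scale=0.3, size=n_points)
    e_ckl, e_srkl = rmse(y, pred_ckl), rmse(y, pred_srkl)
    wins += int(e_srkl < e_ckl)
    reductions.append(1.0 - e_srkl / e_ckl)

print(f"SRKL better on {wins}/{n_series} series; "
      f"mean RMSE reduction {100 * np.mean(reductions):.1f}%")
```

The "reduction" here is 1 - RMSE_SRKL / RMSE_CKL per series, averaged over all series, matching the summary statistic quoted above.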