We would first like to thank the reviewers for their positive support of our article and for their thorough and helpful remarks.
We apologize for not linking the reference to the full version of the article, which we could not do because of the anonymity instructions. The full article (40 pages long) contains the proofs of all our results (which involve advanced random matrix considerations that would not fit in the present submission) as well as more extensive simulations.
Regarding the main remarks on our results:
* On our choice to develop random matrix notions before delving into the subject: we feel that, without any notion of random matrix theory, the article would have looked like a series of unsupported claims. It is also important to explain why the results take implicit forms at the outset and become explicit in some simple scenarios; this is at the core of random matrix methods. The strategy we opted for naturally reduces the number of simulation trials shown, while still only skimming over the random matrix framework (in particular over the important classical notion of deterministic equivalents).
The simulation results shown are typical (in fact the most illustrative) of those obtained in our simulation campaign. We took the Mackey-Glass example as a common thread for coherence.
A full treatment (both with respect to theory and simulations) is accessible in the aforementioned full version of the article.
* Although our most generic results take implicit forms, they are nonetheless quite interpretable (in fact, they are objects familiar to random matrix specialists). More importantly, for the random matrices W of utmost interest, these results become explicit, simple, and quite telling.
* Our claim to bring quantitative results where past works are mostly qualitative is admittedly clumsy. A more precise statement is that we provide, for the first time, a quantitative end-to-end performance analysis, in place of information-theoretic bounds or characterizations of intermediate quantities (such as memory curves).
* The analysis of normal versus non-normal matrices is indeed not as thorough as in the full version of the article (where the treatment is more substantial), which leaves an impression of haziness on this aspect of the article. In the full article, it is shown in particular that the memory curve decays at a much higher rate for non-normal than for normal matrices.
* We also apologize for not acknowledging more of the recent works found in the growing ESN literature. We shall remedy this in the revised version.
* Finally, we do agree with Reviewer 2 that a change in title might be appropriate to stress that we study ESNs and not RNNs, as the current title might be misleading.
We shall update our submission appropriately to address these very legitimate remarks.
We would like to conclude by emphasizing, as Reviewers 3 and 2 appropriately pointed out, that this work is a first step toward bringing together random matrix theory (widely used today in wireless communications and signal processing, yet only taking its first steps in machine learning) and neural network analysis. Many aspects are still naive, and likely depart from the standard approach to understanding neural networks, but we firmly believe that there might be much to gain from this union once it matures.
This, in essence, is the very contribution of the article.
Thank you again for your time and help in improving our contribution.
The authors.