We thank the reviewers for their valuable comments. Below we provide an itemized response to the major issues raised in the reviews.

Reviewer 4:

>> Previous work on non-negative Gaussian variables

Thank you for pointing us to the nonnegative Boltzmann machine (Downs, MacKay, and Lee, 1999) and the rectified Gaussian distribution (Socci, Lee, and Seung, 1997). We note that these previous models impose non-negativity on the observed variables and do not use latent variables; they are models for non-negative data. By contrast, the TGGM proposed here imposes non-negativity on the latent variables, which are integrated out to induce nonlinear relations among the observed variables. The TGGM does not impose non-negativity on the observed variables and is therefore not limited to non-negative data. Moreover, while the previous models induce multi-modality by relaxing positive definiteness to co-positivity, the TGGM induces multi-modality through marginalization of the non-negative latent variables. Positive definiteness appears necessary in the TGGM, since it involves both non-negative and real-valued variables. We will reference these two models and point out their relations to the TGGM.

>> Computational complexity and scalability

The per-iteration complexity of the maximum-likelihood estimator (MLE) is O((ni+n)*m*N + m^2*N*T1), and that of back-propagation (BP) is O((ni+n)*m*N), where ni, m, and n are the dimensions of x, h, and y, respectively, N is the mini-batch size, and T1 is the number of VB cycles. Thus the MLE scales quadratically with the number of latent variables in the worst case, and so does BP when ni+n is on the same order of magnitude as m. Moreover, the MLE is roughly T1 times more expensive than BP: the cost ratio is 1 + m*T1/(ni+n), which is about 1 + T1 when ni+n is comparable to m (a brief numerical sketch is appended at the end of this response). Our experiments use T1 between 5 and 10. We will further investigate scalability in future work.

Reviewer 5:

>> Generality of the TGGM and its performance guarantees

The TGGM is a general framework for machine learning; we focus on regression and classification in this paper due to the page limit. Other learning tasks and performance guarantees will be investigated in separate papers.

>> Computational complexity

Please see the response to Reviewer 4.

>> The ratio \phi(a)/\Phi(a)

Thank you for pointing out the relation to Gaussian hazard functions. We will exploit this relation for fast computation in the future (a short numerical sketch is appended at the end of this response).

Reviewer 7:

Thank you for the valuable suggestions regarding future work. It would be exciting to investigate these interesting topics.
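As a supplement to the complexity discussion in the response to Reviewer 4, here is a rough back-of-the-envelope sketch of the per-iteration cost comparison. All dimension values below are hypothetical placeholders chosen only for illustration, not experimental settings; only T1 is taken from the 5-10 range quoted above.

```python
# Back-of-the-envelope per-iteration costs, using the complexity
# expressions quoted in the response to Reviewer 4. The dimensions
# (ni, n, m, N) are illustrative assumptions, not values from the paper.
ni, n, m, N, T1 = 100, 10, 100, 64, 8  # hypothetical sizes; T1 in the 5-10 range

mle_cost = (ni + n) * m * N + m**2 * N * T1  # O((ni+n)*m*N + m^2*N*T1)
bp_cost = (ni + n) * m * N                   # O((ni+n)*m*N)

# With ni+n on the order of m, the ratio is 1 + m*T1/(ni+n) ~ 1 + T1,
# consistent with the MLE being roughly T1 times more expensive than BP.
print(f"MLE/BP cost ratio: {mle_cost / bp_cost:.2f}")  # ~8.27 here
```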
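Regarding the ratio \phi(a)/\Phi(a) raised by Reviewer 5, below is a minimal sketch of one standard way to evaluate it stably, working in log space via SciPy's norm.logpdf and norm.logcdf. This is an illustrative implementation choice on our part, relying on the fact (noted by the reviewer) that the ratio equals the Gaussian hazard function evaluated at -a.

```python
import numpy as np
from scipy.stats import norm

def phi_over_Phi(a):
    """Evaluate phi(a)/Phi(a) stably in log space.

    Computing phi(a) and Phi(a) separately underflows for large
    negative a; subtracting the log quantities avoids this. The ratio
    equals the Gaussian hazard function evaluated at -a.
    """
    return np.exp(norm.logpdf(a) - norm.logcdf(a))

# As a -> -inf the ratio approaches -a; for large positive a it decays to 0.
print(phi_over_Phi(np.array([-30.0, 0.0, 5.0])))  # ~[30.03, 0.798, 1.49e-06]
```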