Algorithms for learning distributions over weight vectors, such as AROW, were recently shown empirically to achieve state-of-the-art performance on various problems, with strong theoretical guarantees. Extending these algorithms to matrix models poses a challenge, since the number of free parameters in the covariance of the distribution scales as $n^4$ with the dimension $n$ of the matrix. We describe, analyze and experiment with two new algorithms for learning distributions over matrix models. Our first algorithm maintains a diagonal covariance over the parameters and can handle large covariance matrices. The second algorithm factors the covariance, capturing some inter-feature correlation while keeping the number of parameters linear in the size of the original matrix. We analyze the diagonal algorithm in the mistake-bound model and show the superior precision of our approach over other algorithms on two tasks: retrieving similar images and ranking similar documents. The second algorithm is shown to attain a faster convergence rate.
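The parameter counts behind these scaling claims can be checked with a short sketch. An $n \times n$ matrix model has $d = n^2$ weights, so a full covariance has $d^2 = n^4$ entries, a diagonal covariance has $d = n^2$, and a factored covariance (illustrated here with a Kronecker product $A \otimes B$ of two $n \times n$ factors, which is one standard factorization but an assumption, not necessarily the paper's construction) has $2n^2$ parameters, i.e. linear in the size of the original matrix:

```python
# Parameter counts for covariance representations of an n x n matrix model.
n = 50
d = n * n                 # number of weights in the matrix model

full_cov_params = d * d   # full covariance: scales as n^4
diag_cov_params = d       # diagonal covariance: n^2 parameters
# Hypothetical Kronecker factorization C ~ A (kron) B with n x n factors:
kron_cov_params = 2 * n * n

assert full_cov_params == n ** 4
assert diag_cov_params == n ** 2
assert kron_cov_params == 2 * n ** 2   # linear in d = n^2
```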