Poster
Optimal Transfer Learning for Missing Not-at-Random Matrix Completion
Akhil Jalan · Yassir Jedra · Arya Mazumdar · Soumendu Sundar Mukherjee · Purnamrita Sarkar
West Exhibition Hall B2-B3 #W-815
The problem that prompted our research is matrix completion: given a noisy and incomplete matrix as input, recover the full matrix as output. Matrix completion arises in many application areas; our motivation comes from missing data in biological settings such as gene sequencing, metabolic network construction, and companion diagnostics. In these settings, entire rows and columns of the data matrix can be missing, which renders traditional matrix completion algorithms ineffective.

We formulate this kind of matrix completion problem as a transfer learning problem, in which we have access to a source matrix P as well as a target matrix Q. The matrix Q typically has more observational noise and more missing entries, for example when Q is the metabolic network of a rarely studied species while P is that of a more commonly studied species. We then present optimal estimation algorithms for Q in two settings: active sampling (where we choose which entries of Q to observe) and passive sampling (where a set of entries of Q has already been observed). The algorithms are optimal in the sense that they achieve the best possible estimation error given the amount of data and the underlying data distribution.

This research matters because it can directly contribute to applied studies in biostatistics and bioinformatics, as we demonstrate in our own experiments. Additionally, we make progress on transfer learning, which is an important area of machine learning in its own right.
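To make the passive-sampling formulation concrete, here is a minimal illustrative sketch, not the authors' estimator: a well-observed source matrix P shares latent low-rank structure with a noisier, sparsely observed target matrix Q, and the subspace estimated from P is reused to impute Q. All dimensions, ranks, noise levels, and the simple least-squares transfer step are assumptions chosen for illustration.

    # Minimal sketch of transfer learning for matrix completion (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, m, r = 200, 150, 5                       # rows, columns, shared latent rank

    U = rng.normal(size=(n, r)) / np.sqrt(n)    # shared left factors
    P = U @ rng.normal(size=(r, m))             # source: fully observed, low noise
    Q_true = U @ rng.normal(size=(r, m))        # target: same column space as P
    P_obs = P + 0.01 * rng.normal(size=(n, m))

    # Passive sampling: each target entry observed independently with prob. 0.1,
    # with heavier noise; some columns may end up with very few observations.
    mask = rng.random((n, m)) < 0.1
    Q_obs = np.where(mask, Q_true + 0.5 * rng.normal(size=(n, m)), np.nan)

    # Transfer step: estimate the shared column space from the source ...
    U_hat = np.linalg.svd(P_obs, full_matrices=False)[0][:, :r]

    # ... then, column by column, fit the observed target entries onto that
    # subspace by least squares and fill in the missing ones.
    Q_hat = np.zeros((n, m))
    for j in range(m):
        idx = mask[:, j]
        if idx.sum() >= r:                      # enough observations to fit
            coef, *_ = np.linalg.lstsq(U_hat[idx], Q_obs[idx, j], rcond=None)
            Q_hat[:, j] = U_hat @ coef
        # else: column left at zero; a better method would borrow more from P

    err = np.linalg.norm(Q_hat - Q_true) / np.linalg.norm(Q_true)
    print(f"relative estimation error: {err:.3f}")

The sketch shows why a related source matrix helps: the subspace is learned from the densely observed P, so only a handful of target observations per column are needed to recover Q, even though Q alone would be too sparse and noisy to complete.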