Abstract: Principal components analysis (PCA) is a workhorse method for uncovering latent low-rank signals in noisy high-dimensional data and is used throughout signal processing, machine learning, and data science. But what happens when the data are heterogeneous, as is common in modern datasets? This talk presents recent progress on understanding (and improving) PCA in settings with heterogeneous noise, i.e., settings of heterogeneous quality where some samples are noisier than others. Such heterogeneity frequently arises in modern datasets from genomics, medical imaging, astronomy, and radar, to name just a few areas. A natural and common way to handle this heterogeneous quality is a weighted variant of PCA that gives noisier samples less weight. Here we uncover a surprising fact: the standard choice of inverse noise variance weighting is in fact suboptimal! Using techniques from random matrix theory and variational analysis, we derive optimal weights in the large-dimensional limit. The optimal weights depend not only on the noise variances but also on the signal variances, and we conclude with a discussion of techniques for estimating both directly from data.
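To make the weighted-PCA setting concrete, the following is a minimal NumPy sketch (not the talk's method or its optimal weights): it generates rank-one data where samples fall into two noise-quality groups, then extracts the top component from a weighted sample covariance using the standard inverse noise variance weights that the talk identifies as suboptimal. All parameter values and the `weighted_pca` helper are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heterogeneous data: rank-one signal plus per-sample noise.
d, n = 50, 200                               # ambient dimension, sample count
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                       # true principal component
theta = 2.0                                  # signal standard deviation (assumed)
noise_std = rng.choice([0.5, 2.0], size=n)   # two noise-quality groups (assumed)

scores = theta * rng.standard_normal(n)
Y = np.outer(u, scores) + noise_std * rng.standard_normal((d, n))

def weighted_pca(Y, w):
    """Top eigenvector of the weighted sample covariance sum_i w_i y_i y_i^T."""
    C = (Y * w) @ Y.T / w.sum()              # broadcast w over columns (samples)
    vals, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    return vecs[:, -1]

# Standard choice: inverse noise variance weights (suboptimal per the talk).
u_hat = weighted_pca(Y, 1.0 / noise_std**2)
print(abs(u_hat @ u))                        # alignment with truth (1 = perfect)
```

Plugging in different weight vectors `w` (e.g. uniform weights, or weights that also account for the signal variance) and comparing the resulting alignments is a quick way to see that the choice of weighting materially affects recovery.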
Bio: David Hong is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Delaware, where he is also a Resident Faculty of the Data Science Institute. Previously, he was an NSF Postdoctoral Research Fellow in the Department of Statistics and Data Science at the University of Pennsylvania. He completed his PhD in the Department of Electrical Engineering and Computer Science at the University of Michigan, where he was an NSF Graduate Research Fellow. He also spent a summer as a Data Science Graduate Intern at Sandia National Labs.