Multimodal representation learning via maximization of local mutual information

Ruizhi (Ray) Liao
Postdoctoral Associate, MIT Computer Science & Artificial Intelligence Lab

SENSE.nano 2021
Monday, October 25
Session 2: Physiological Monitoring
3:35 PM – 3:50 PM EDT

Abstract
Liao proposes and demonstrates a representation learning approach based on maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Liao's method trains image and text encoders by encouraging the resulting representations to exhibit high local mutual information. He makes use of recent advances in mutual information estimation with neural network discriminators. Liao argues that the sum of local mutual information is typically a lower bound on the global mutual information. His experimental results on downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning.
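The abstract does not name a specific estimator or discriminator architecture. As a rough illustration only, the sketch below assumes an InfoNCE-style lower bound with a hypothetical bilinear discriminator (`weight`) that scores each local image feature against a text embedding, and it simplifies the setting by using a single text vector per report rather than local text features. The function name `local_infonce_bound` and the array shapes are assumptions, not Liao's implementation.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x_max = np.max(x, axis=axis, keepdims=True)
    shifted = x - x_max
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


def local_infonce_bound(img_local, txt, weight):
    """InfoNCE-style lower bound on mutual information, one value per image region.

    img_local: (N, R, D) local image features for N image-text pairs, R regions each
    txt:       (N, D)    text embeddings (simplified: one vector per report)
    weight:    (D, D)    parameters of a hypothetical bilinear discriminator
    Returns an array of shape (R,); each entry is at most log(N).
    """
    n = img_local.shape[0]
    # scores[i, j, r] = discriminator score for region r of image i vs. text j
    scores = np.einsum("ird,de,je->ijr", img_local, weight, txt)
    # For each image region, classify which of the N texts is its true partner.
    log_probs = log_softmax(scores, axis=1)
    # Matching pairs lie on the diagonal i == j, giving shape (N, R).
    positive = log_probs[np.arange(n), np.arange(n), :]
    # InfoNCE: I(X; Y) >= log N + E[log softmax score of the positive pair]
    return positive.mean(axis=0) + np.log(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(16, 4, 8))   # 16 pairs, 4 regions, 8-dim features
    txt = rng.normal(size=(16, 8))
    w = rng.normal(size=(8, 8))
    per_region = local_infonce_bound(img, txt, w)
    # A training objective could sum (or average) these local estimates and
    # maximize them with respect to the encoders and the discriminator.
    print(per_region, per_region.sum())
```

Because each per-region estimate is bounded above by log N, a larger batch of negatives can loosen that ceiling; this is a known property of InfoNCE, and it is one reason other discriminator-based estimators might be preferred in practice.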

Biography
Liao earned his PhD in computer science at MIT in August 2021, advised by Prof. Polina Golland. He studies machine learning and develops computational tools driven by clinical problems. Liao is excited about ubiquitous computing and its potential to advance health care. His PhD research was supported by a Merrill Lynch Fellowship and a Siebel Fellowship.