Mengyu Yang

Machine Learning PhD Student at Georgia Tech

About Me

Hi! I’m a Machine Learning PhD student at Georgia Tech working with James Hays. I received my BASc in Engineering Science with Honours from the University of Toronto, where I specialized in Machine Intelligence. I interned at Dolby in Summer 2025, working on multimodal representation learning with spatial audio, and at Google Research in Fall 2023, working on audio-visual sound source localization.

Here’s my CV.

Research Interests

I’m interested in multimodal learning grounded in computer vision, working mainly at the intersection of vision, audio, and language.

Topics include:

  • Representation learning
  • Video understanding
  • Audiovisual localization

Publications

(2025). Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions. ICCV 2025.

Project

(2023). The Un-Kidnappable Robot: Acoustic Localization of Sneaking People. ICRA 2024.

PDF Project

(2021). TriBERT: Human-centric Audio-visual Representation Learning. NeurIPS 2021.

PDF Code

(2020). Mask-Guided Discovery of Semantic Manifolds in Generative Models. NeurIPS 2020 Creativity Workshop.

PDF Code Slides

(2020). Musical Speech: A Transformer-based Composition Tool. NeurIPS 2020 Demonstration Track.

Project