Mengyu Yang

Machine Learning PhD Student at Georgia Tech

About Me

Hi! I’m a Machine Learning PhD student at Georgia Tech working with James Hays. I received my BASc in Engineering Science with Honours from the University of Toronto, where I specialized in Machine Intelligence. I interned at Dolby in Summer 2025, working on multimodal representation learning for spatial audio, and at Google Research in Fall 2023, working on audio-visual sound source localization.

I’m excited to intern at Adobe in Summer 2026!

Here’s my CV.

Research Interests

I’m interested in multimodal learning grounded in computer vision, mainly working at the intersection of vision, audio, and language.

Topics include:

  • Representation learning
  • Video understanding
  • Audio-visual localization

Publications

(2025). Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions. ICCV 2025.

Project

(2023). The Un-Kidnappable Robot: Acoustic Localization of Sneaking People. ICRA 2024.

PDF Project

(2021). TriBERT: Human-centric Audio-visual Representation Learning. NeurIPS 2021.

PDF Code

(2020). Mask-Guided Discovery of Semantic Manifolds in Generative Models. NeurIPS 2020 Creativity Workshop.

PDF Code Slides

(2020). Musical Speech: A Transformer-based Composition Tool. NeurIPS 2020 Demonstration Track.

Project