Mengyu Yang

Machine Learning PhD Student at Georgia Tech

About Me

Hi! I’m a Machine Learning PhD student at Georgia Tech working with James Hays. I received my BASc in Engineering Science with Honours from the University of Toronto, where I specialized in Machine Intelligence. I interned at Dolby in Summer 2025, working on multimodal representation learning with spatial audio, and at Google Research in Fall 2023, working on audio-visual sound source localization.

Here’s my CV.

Research Interests

I’m interested in multimodal learning grounded in computer vision, working mainly at the intersection of vision, audio, and language.

Topics include:

  • Representation learning
  • Video understanding
  • Audiovisual localization

Publications

(2025). Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions. ICCV 2025.

Project

(2023). The Un-Kidnappable Robot: Acoustic Localization of Sneaking People. ICRA 2024.

PDF Project

(2021). TriBERT: Human-centric Audio-visual Representation Learning. NeurIPS 2021.

PDF Code

(2020). Mask-Guided Discovery of Semantic Manifolds in Generative Models. NeurIPS 2020 Creativity Workshop.

PDF Code Slides

(2020). Musical Speech: A Transformer-based Composition Tool. NeurIPS 2020 Demonstration Track.

Project