Mengyu Yang

Machine Learning PhD Student at Georgia Tech

About Me

Hi! I’m a Machine Learning PhD student at Georgia Tech working with James Hays. I received my BASc in Engineering Science with Honours from the University of Toronto, where I specialized in Machine Intelligence. I interned at Google Research in Fall 2023, working on audio-visual sound source localization.

I’ll be interning at Dolby during Summer 2025!

Here’s my CV.

Research Interests

My interests lie at the intersection of computer vision and machine learning. My goal is to build models capable of understanding the visual world through multi-modal data.

Topics include:

Multi-modal learning (most recently audio-visual learning)
Representation learning
Video understanding

Publications

Mengyu Yang, Yiming Chen, Haozheng Pei, Siddhant Agarwal, Arun Balajee Vasudevan, James Hays (2025). Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions. ICCV 2025 (Coming soon!).

Mengyu Yang, Patrick Grady, Samarth Brahmbhatt, Arun Balajee Vasudevan, Charles C. Kemp, James Hays (2023). The Un-Kidnappable Robot: Acoustic Localization of Sneaking People. ICRA 2024.

PDF Project

Tanzila Rahman, Mengyu Yang, Leonid Sigal (2021). TriBERT: Human-centric Audio-visual Representation Learning. NeurIPS 2021.

PDF Code

Bryan Wang, Mengyu Yang, Tovi Grossman (2021). Soloist: Generating Mixed-Initiative Tutorials from Existing Guitar Instructional Videos Through Audio Processing. CHI 2021.

PDF Video

Mengyu Yang, David Rokeby, Xavier Snelgrove (2020). Mask-Guided Discovery of Semantic Manifolds in Generative Models. NeurIPS 2020 Creativity Workshop.

PDF Code Slides

Jason d’Eon, Sri Harsha Dumpala, Chandramouli Shama Sastry, Daniel Oore, Mengyu Yang, Sageev Oore (2020). Musical Speech: A Transformer-based Composition Tool. NeurIPS 2020 Demonstration Track.

Project

Projects

Building a Dataset for Music Analysis and Conditional Generation

My undergraduate thesis on deep learning for music generation and analysis.