Learning to Listen: ICML 2026 Workshop on Machine Learning for Audio
Abstract
Machine learning for audio has seen heightened interest over the past year, driven by audio language models and multimodal foundation models for understanding and generating speech, music, and audio events, as well as by rising demand for low-latency voice agents and real-time transcription. We propose the Machine Learning for Audio workshop at ICML 2026 to provide a dedicated forum for audio researchers and practitioners to exchange ideas, share tools and benchmarks, form collaborations, and engage in timely ethical discussion around generative audio and audio foundation models. The workshop will cover topics including generative synthesis, enhancement and denoising, datasets and augmentation, classification, transcription, source separation, and multimodal problems. We will solicit extended abstracts of up to 4 pages (approximately 30 accepted) and host a poster/demo session for live presentations. The program will feature invited talks from leading academic and industry researchers spanning speech, music, and general audio ML. Additionally, the organizers will release several refreshed audio datasets alongside the workshop for use in contributed work.