Department of Computer Science and Technology

Date: Tuesday, 30 January 2024, 13:00 to 14:00
Speaker: Catalina Cangea (Google DeepMind)
Venue: Lecture Theatre 2, Computer Laboratory, William Gates Building

This talk supports the R255 Advanced Topics in Machine Learning course module on Multimodal Learning and provides a bird’s-eye view of the rapidly evolving text-audio landscape, with a focus on music as a primary example of audio data. I will first present the types of tasks that exist in this space, then discuss data curation challenges, and follow with an overview of some existing retrieval and generation methods, including a quick primer on diffusion models. Finally, I will describe current evaluation metrics and their limitations.

You can also join us on Zoom: https://cam-ac-uk.zoom.us/j/92041617729

Seminar series: Artificial Intelligence Research Group Talks
