Text-and-audio methods

Date:

Tuesday, 30 January, 2024 - 13:00 to 14:00

Speaker:

Catalina Cangea (Google DeepMind)

Venue:

Lecture Theatre 2, Computer Laboratory, William Gates Building

This talk supports the R255 Advanced Topics in Machine Learning course module on Multimodal Learning and provides a bird’s-eye view of the rapidly evolving text-audio landscape, with a focus on music as a primary example of audio data. I will first present the types of tasks that exist in this space, then discuss data curation challenges, and follow with an overview of some existing retrieval and generation methods, including a quick primer on diffusion models. Finally, I will describe current evaluation metrics and their limitations.

You can also join us on Zoom: https://cam-ac-uk.zoom.us/j/92041617729

Seminar series:

Artificial Intelligence Research Group Talks


