
Department of Computer Science and Technology

Date: Tuesday, 26 November 2024, 14:00 to 15:00
Speaker: Nicola Cancedda, Meta's Fundamental AI Research (FAIR) team
Venue: Computer Lab, LT1

Large Language Models (LLMs) have reshaped the AI landscape and invited themselves into dinner-table conversation, yet we do not really understand how they work. Propelled by this realization, the field of AI interpretability is enjoying a revival.
In this talk I will introduce some fundamental interpretability concepts and discuss how insights from studying the internal activations of models led us to develop a training framework that significantly increases the robustness of LLMs to 'jailbreaking' attacks. I will also illustrate some explorations of the internal workings of transformer-based autoregressive LLMs that unexpectedly led to an explanation of 'attention sinking', a mechanism necessary for their proper functioning. Finally, I will offer my perspective on interesting future directions.

Nicola Cancedda is a researcher with Meta's Fundamental AI Research (FAIR) team. His current focus is on better understanding how Large Language Models realize complex behaviors in order to make them more capable, safer, and more efficient. He is an alumnus of the University of Rome "La Sapienza" and has held applied and fundamental research and management positions at Meta, Xerox, and Microsoft, pushing the state of the art in Machine Learning, Machine Translation, and Natural Language Processing, and leading the transfer of research results to large-scale production environments.

Seminar series: Cambridge ML Systems Seminar Series
