
Department of Computer Science and Technology

Date: Tuesday, 26 November 2024, 14:00 to 15:00
Speaker: Nicola Cancedda, Meta's Fundamental AI Research (FAIR) team
Venue: Computer Lab, LT1

Large Language Models (LLMs) have reshaped the AI landscape and invited themselves into dinner-table conversation, yet we do not really understand how they work. Propelled by this realization, the field of AI interpretability is enjoying a revival.
In this talk I will introduce some fundamental interpretability concepts and discuss how insights from studying the internal activations of models led us to develop a training framework that significantly increases the robustness of LLMs to 'jailbreaking' attacks. I will also illustrate some explorations of the internal workings of transformer-based autoregressive LLMs that unexpectedly led to an explanation of 'attention sinking', a mechanism necessary for their proper functioning. Finally, I will offer my perspective on interesting future directions.

Nicola Cancedda is a researcher with Meta's Fundamental AI Research (FAIR) team. His current focus is on better understanding how Large Language Models realize complex behaviors in order to make them more capable, safer, and more efficient. He is an alumnus of the University of Rome "La Sapienza" and has held applied and fundamental research and management positions at Meta, Xerox, and Microsoft, pushing the state of the art in Machine Learning, Machine Translation, and Natural Language Processing, and leading the transfer of research results to large-scale production environments.

Seminar series: Cambridge ML Systems Seminar Series
