
Department of Computer Science and Technology

Date: 
Tuesday, 12 November, 2024 - 13:00 to 14:00
Speaker: 
Pietro Lesci (University of Cambridge)
Venue: 
Lecture Theatre 2, Computer Laboratory, William Gates Building

When training language models, choices such as the random seed for data ordering or the token vocabulary size significantly influence model behaviour. Answering counterfactual questions like "How would the model perform if this instance were excluded from training?" is computationally expensive, as it requires re-training the model. Once these training configurations are set, they are effectively fixed: modifying the experimental conditions incurs high computational costs, which creates the setting of a "natural experiment". Econometric techniques for estimating causal effects from observational studies let us analyse the impact of these choices without requiring full experimental control or repeated model training. In this talk, I will present our paper, _Causal Estimation of Memorisation Profiles_ (Best Paper Award at ACL 2024), which introduces a novel method based on the difference-in-differences technique from econometrics to estimate memorisation without re-training the model.
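To give a flavour of the econometric idea mentioned above, here is a minimal sketch of a difference-in-differences estimate. The function and the toy numbers are illustrative assumptions, not the paper's actual method or data: the estimator subtracts the change observed in a "control" group (which absorbs shared trends) from the change observed in a "treated" group.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Hypothetical DiD sketch: the treatment effect is the change in the
    treated group's mean outcome minus the change in the control group's,
    so trends common to both groups cancel out."""
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Toy numbers (made up for illustration): both groups drift upward by 1.0
# over time, but the treated group gains an extra 0.5 attributable to
# the treatment.
effect = difference_in_differences(
    treated_pre=[2.0, 2.2, 1.8],
    treated_post=[3.5, 3.7, 3.3],
    control_pre=[1.0, 1.2, 0.8],
    control_post=[2.0, 2.2, 1.8],
)
```

Here the shared upward drift of 1.0 cancels, leaving an estimated effect of 0.5. The talk's contribution is applying this style of reasoning to memorisation during language-model training, where re-running the "experiment" is prohibitively costly.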

"You can also join us on Zoom":https://cam-ac-uk.zoom.us/j/83400335522?pwd=LkjYvMOvVpMbabOV1MVTm8QU6DrGN7.1

Seminar series: 
Artificial Intelligence Research Group Talks
