Department of Computer Science and Technology

Date: 
Friday, 6 February, 2026 - 16:00 to 17:00
Speaker: 
Jing Huang (Stanford University)
Venue: 
Online only. Here is the Google Meet link: https://meet.google.com/cru-hcuo-rhu

Memorization in LLMs has long been perceived as undesirable, associated with privacy risks, copyright concerns, and wasted capacity. In this talk, I argue for a complementary perspective: memorization is an intrinsic property of LLMs that can be leveraged to build a better LLM ecosystem. I first present two frameworks to rigorously study counterfactual memorization of a training run. I then demonstrate how memorization dynamics can be exploited to establish model and text provenance. Together, these results suggest a new perspective: rather than focusing on suppressing memorization, we should aim to understand and harness it. Doing so opens new avenues for provenance, tracing downstream impacts, and policies around intellectual property and integrity in the LLM ecosystem.

**Speaker Bio:** Jing Huang is a PhD candidate in the Stanford NLP Group, advised by Prof. Christopher Potts and Dr. Diyi Yang. Jing's research focuses on understanding what makes neural network models generalize well by studying the causal mechanisms that connect model behaviors, internal representations, and training data.

Seminar series: 
NLIP Seminar Series
