The tradeoff governing efficient language model architectures

Date:

Friday, 14 June, 2024 - 16:00 to 17:00

Speaker:

Sabri Eyuboglu, Stanford University

Venue:

Zoom link: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09

Recent work has proposed alternative language model architectures (e.g. RWKV, Mamba, Hyena) that are dramatically faster than attention (e.g. 25x higher throughput). However, it’s unclear how switching to these new architectures might affect the behavior of language models when scaled up. In this talk, we’ll discuss our recent work studying the fundamental tradeoffs that govern autoregressive language models. In particular, we’ll focus on language model recall, the ability to ground generations in information seen in-context, which is critical for in-context learning and copying. We show with theory and experiments that all autoregressive architectures obey a fundamental tradeoff: the less memory the model consumes during inference, the worse it is at recall. This tradeoff matters because memory consumption dictates language model throughput in practice. We propose a simple architecture called Based that combines linear attention and sliding window attention. By varying Based’s window size and linear attention feature dimension, we can dial the model’s memory consumption and traverse the Pareto frontier of the recall-memory tradeoff curve, recovering the full quality of attention on one end and the efficiency of the fastest attention alternatives on the other.
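To make the memory side of this tradeoff concrete, the following minimal sketch counts how many values a single attention head would need to keep around during generation under three schemes. It is an illustration only, not the Based implementation: the function names are hypothetical, the counts are per head and per layer in elements rather than bytes, and the hybrid total simply adds the two component states, which is an assumption made for simplicity.

```python
def full_attention_state(seq_len: int, head_dim: int) -> int:
    """Key/value cache entries for standard attention: grows with sequence length."""
    return 2 * seq_len * head_dim


def sliding_window_state(seq_len: int, head_dim: int, window: int) -> int:
    """Key/value cache entries when only the last `window` tokens are kept."""
    return 2 * min(seq_len, window) * head_dim


def linear_attention_state(head_dim: int, feature_dim: int) -> int:
    """Fixed-size recurrent state: a feature_dim x head_dim matrix plus a
    feature_dim normalizer vector. Independent of sequence length."""
    return feature_dim * head_dim + feature_dim


def hybrid_state(seq_len: int, head_dim: int, window: int, feature_dim: int) -> int:
    """Assumed total for a sliding-window plus linear-attention combination
    (hypothetical accounting: the two states are simply summed)."""
    return (sliding_window_state(seq_len, head_dim, window)
            + linear_attention_state(head_dim, feature_dim))


if __name__ == "__main__":
    seq_len, head_dim = 2048, 64
    print("full attention:", full_attention_state(seq_len, head_dim))
    # Larger window / feature_dim means more memory, which the talk argues buys recall.
    for window, feature_dim in [(16, 64), (64, 256), (128, 1024)]:
        print(window, feature_dim, hybrid_state(seq_len, head_dim, window, feature_dim))
```

Under these assumptions, the full-attention count grows linearly with the sequence length, while the hybrid count is fixed once the window size and feature dimension are chosen; those two parameters are the dials the abstract refers to.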

Bio:

I'm a fourth-year CS PhD student in the Stanford Machine Learning Group, advised by Chris Ré and James Zou. I am supported by the National Science Foundation GRFP.
I like to develop a detailed understanding of how machine learning models work and when they fail by exploring the unstructured data on which they are trained and formalizing sub-tasks with synthetics. Most recently, I've been working on understanding how neural network building blocks affect the quality and efficiency of foundation models. I also like to build tools that leverage large, pre-trained models to facilitate the analysis and management of unstructured training and validation datasets. I'm motivated by challenges that arise when trying to apply machine learning in safety-critical settings like medicine and the sciences. Previously, I was a machine learning research intern at Flatiron Health. I completed my undergrad and master's at Stanford, where I worked with Jure Leskovec's SNAP Group and the AIMI Center.

Seminar series:

NLIP Seminar Series
