Department of Computer Science and Technology

Date: Friday, 25 October, 2024 - 12:00 to 13:00
Speaker: Zebulon Youra Goriely (University of Cambridge)
Venue: Zoom link: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09

The statistical properties of language, and how they may be used in language processing and language acquisition, have been studied for many decades. Recently, large language models have demonstrated striking language-learning capabilities, providing evidence for the “richness” of the linguistic stimulus, but they are often trained on data that seems cognitively implausible in terms of both quantity (thousands of human lifetimes) and quality (written text, internet sources). For these models to help us study language, we must think far more carefully about the plausibility of the input: using phonemes instead of letters, using spoken sources, and reducing the quantity. We must then determine whether the architectures we use are suitable at this scale and for this input representation. Such models can then give us valuable analytical insights into the statistical properties and learnability of language, as well as practical benefits for tasks associated with language modelling and language understanding.

*Speaker Biography*

Zebulon Goriely is a fourth-year PhD student working on Transformer Language Models and Child Language Acquisition, supervised by Professor Paula Buttery.

Seminar series: NLIP Seminar Series