Department of Computer Science and Technology

Date: Friday, 22 May 2026, 12:00 to 13:00
Speaker: Israel Mason-Williams (Imperial/KCL)
Venue: FW26, hybrid (in person and online). Google Meet link: https://meet.google.com/cru-hcuo-rhu

Abstract: Neural networks have shown remarkable performance across data domains, especially as compute budgets increase. However, fundamental questions about how neural networks process information, share representations and traverse loss landscapes remain unresolved. In this work, we quantify the functional impact of distribution matching, facilitated by knowledge-sharing mechanisms such as knowledge distillation, under student-teacher optimisation strategies. Our empirical evaluation across modalities, architectures and extensive hyperparameter settings shows that the functional impact of distribution matching is far more nuanced than the current literature suggests. We unveil a fundamental property of negative asymmetric transfer that underpins logit-matching optimisation, calling for logit matching to be reappraised primarily as a form of regularisation rather than as a beneficial or consistent knowledge-transfer mechanism. Following this, we explore geometric properties of neural networks and how regularisation strategies modify the internal representations of models and the minima found at the end of training. Through a function-centric lens, we provide empirical evidence, from synthetic tasks to high-dimensional datasets, that minima geometry represents decision-boundary structure and that generalised preferences for flat minima need to be reconsidered. As a result, we can decouple the relationship between minima geometry, generalisation and memorisation to understand how different inductive biases and regularisation strategies improve performance on different data distributions.

Seminar series: NLIP Seminar Series

Upcoming seminars