
Department of Computer Science and Technology

Date: Friday, 15 May, 2026 - 12:00 to 13:00
Speaker: Arduin Findeis (University of Cambridge)
Venue: FW26, hybrid (in person and online). Google Meet link: https://meet.google.com/cru-hcuo-rhu

AI evolves rapidly: top models are superseded by the next generation within months, if not weeks. Benchmarks see similarly rapid turnover. Once released, many benchmarks become "solved" within months or at most a few years and can no longer measure the frontier of AI. New benchmarks quickly take their place. Yet even though benchmarks and models change constantly, some fundamental issues of evaluation remain surprisingly enduring. In this talk, I will draw on my experience evaluating numerous AI systems over the last seven years to identify persistent challenges of evaluation. I will discuss why attempts to solve these open problems have largely fallen short and what we can learn from them. Finally, I will outline my perspective on the most important future directions for evaluation research and why I consider advancing evaluation vital to making future AI systems both useful and safe.


Speaker Bio: Arduin Findeis is a PhD candidate in the Department of Computer Science & Technology, supervised by Prof Robert Mullins, working on the evaluation of AI systems: which systems are “better” or “worse” for human outcomes.

Seminar series: NLIP Seminar Series
