
Department of Computer Science and Technology

Date: Thursday, 1 May, 2025 - 15:00 to 16:00
Speaker: Smita Vijayakumar, Systems Research Group, Cambridge University Computer Laboratory
Venue: Computer Lab, FW11 and Online (MS Teams link below)

"Join on MS Teams":https://teams.microsoft.com/l/meetup-join/19%3ameeting_NGI3NThhZGMtNDFlNS00ZTJhLWJlYWUtYzAyYWIzZGMwODY4%40thread.v2/0?context=%7b%22Tid%22%3a%2249a50445-bdfa-4b79-ade3-547b4f3986e9%22%2c%22Oid%22%3a%22c74ff4ca-98fe-4b28-9889-e119acc12f30%22%7d

The growing demand for data centre resources and the slower evolution of their hardware have led to clusters operating at high utilisation. In this talk, I will examine how current schedulers perform under such conditions. I will discuss how centralised schedulers struggle to scale under high load because of the significant network traffic caused by continuously transferring up-to-date node data. Conversely, distributed schedulers scale well but lack a global cluster view, leading to suboptimal task allocations. As a result, existing schedulers impose wait times up to three times longer on tail tasks, which increases job completion times.

I will then introduce our work on decentralised scheduling, focusing on performance, scalability, and load balancing. These schedulers have been largely under-explored due to their design complexity. However, we demonstrate that Murmuration, our job-aware decentralised scheduler, achieves high performance under both normal and high load despite its simple approach using approximate load information. It reduces communication overhead between nodes and schedulers while still achieving balanced cluster load distribution. By the end of this talk, I hope to convince you that decentralised schedulers with approximate knowledge strike the right balance between performance and scalability, making them a practical solution for today’s highly utilised data centres.

Bio: Smita Vijayakumar recently completed her PhD in the Department of Computer Science and Technology at the University of Cambridge, under the supervision of Evangelia Kalyvianaki. As part of her thesis, she developed a novel decentralised scheduling framework to reduce tail task latencies in highly utilised data centres. She has over twelve years of industry experience in networking, cloud computing, and distributed systems. She also holds an MS from The Ohio State University, where her work investigated cloud resource allocation to bottleneck stages for processing streaming applications. Her research has been published in top-tier conferences and as a book. She has also been actively involved in mentoring, teaching, and community leadership, including founding Women Who Go, India.

Seminar series: Systems Research Group Seminar
