Department of Computer Science and Technology

The Association for Computing Machinery (ACM) – the world's largest association of computing professionals – has honoured a researcher here for her work on the efficient use of cloud resources for data stream processing.

Eva Kalyvianaki – an Associate Professor here – co-authored in 2013 'Integrating scale out and fault tolerance in stream processing using operator state management', a pioneering paper in the field.

Ten years on, it has received the Test of Time Award 2023 from SIGMOD, the ACM’s Special Interest Group on the Management of Data.

This award recognises the conference paper from 10 years previously that has had the most impact over the intervening decade on research, products and methodology.

The paper – co-authored by Eva when she was a postdoctoral researcher at Imperial College London with her then colleagues Raul Castro Fernandez (the first author), Matteo Migliavacca and Peter Pietzuch – addressed challenges in scaling data stream processing systems to large numbers of cloud-hosted machines.

At the time, data processing was moving from the analysis of online data streams on a single machine to the online processing of real-time data streams on virtual machines.

With data increasing in both size and speed, new kinds of data stream processing systems were being designed to scale out to machines hosted in the cloud. But though the cloud in theory offers unlimited resources, it also brings challenges, and failures among virtual machines are a common problem.

So the paper's authors set out to address how to scale out real-time data processing to dozens of virtual machines on demand in a way that was both efficient and robust to machine failures.

"This was relatively easy to do for data processing where the operator states are well-known, but much harder where they are arbitrary," Eva says.

"So we defined state as a first class citizen, treating it the same way as any other kind of data structure, as an entity that could be copied and migrated from one machine to another.

"Once we had that, we defined APIs on the state, and then we could use the same APIs to deal with scalability and fault tolerance, meaning you could do well-defined operations over the state and the APIs would copy, checkpoint, split and merge it."

The innovative techniques they developed in managing operator state in stream processing systems enabled seamless scalability and robust fault tolerance and performed better than existing systems at the time. And their work informed the design of subsequent, state-of-the-art industrial data processing systems.
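
To give a flavour of what treating state as a first-class citizen can look like, here is a minimal sketch in Python. It is an illustration only, not the API from the paper: the class OperatorState, its method names and the partition helper are all hypothetical, and the state is assumed to be a simple dictionary of per-key counts. It shows how the operations Eva mentions (copy or checkpoint, split and merge) could be expressed over one data structure.

```python
import copy
import zlib
from dataclasses import dataclass, field


def partition(key: str, parts: int) -> int:
    """Hypothetical helper: map a key to a shard index deterministically."""
    return zlib.crc32(key.encode("utf-8")) % parts


@dataclass
class OperatorState:
    """Hypothetical keyed operator state, e.g. a running count per word."""
    entries: dict = field(default_factory=dict)

    def checkpoint(self) -> "OperatorState":
        """Return an independent copy that could be stored on another machine."""
        return OperatorState(copy.deepcopy(self.entries))

    def split(self, parts: int) -> list:
        """Partition the state by key into `parts` new states (scale out)."""
        shards = [OperatorState() for _ in range(parts)]
        for key, value in self.entries.items():
            shards[partition(key, parts)].entries[key] = value
        return shards

    @staticmethod
    def merge(shards: list) -> "OperatorState":
        """Combine shards with disjoint keys back into one state."""
        merged = OperatorState()
        for shard in shards:
            merged.entries.update(shard.entries)
        return merged


# Example: back up the state, split it across two machines, then recombine.
state = OperatorState({"cloud": 3, "stream": 5, "state": 2})
backup = state.checkpoint()          # used for recovery after a failure
shards = state.split(2)              # used when scaling out to two machines
state.entries["cloud"] += 1          # later updates do not touch the backup
restored = OperatorState.merge(shards)
```

In this sketch the same small set of operations serves both purposes: a checkpoint supports recovery after a machine failure, while split and merge support moving state when the number of machines changes.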

Using research to inform teaching
Ten years on, the paper has another use – as a teaching aid in a course here for third-year undergraduates on the fundamentals of cloud computing. The course includes a lecture on the use of the cloud in analysing online real-time data streams.

"I want our students to gain the confidence to address problems and arrive at the solution by themselves," Eva says. "Because I was part of this research, I can tell them what we did behind the scenes to get to our solution – including what worked and what didn't work. I hope that helps them work out the steps and processes they need to go through to get to the answer."


Published by Rachel Gardner on Monday 17th July 2023