
Submitted by Rachel Gardner on Fri, 15/01/2021 - 12:33
At a time when concerns about data sharing and data privacy have soared, students here have had a rare opportunity to experiment with Federated Learning – a technique for training machine learning models across devices without sharing the raw data they hold.
The term Federated Learning (FL) was first coined in 2016. Since then, worries among both individuals and businesses about data sharing and data privacy – such as the concerns seen this month over WhatsApp sharing users’ data with its parent company Facebook – have fuelled interest in this area.
There is now an array of technology to enable devices and organisations to share information about the data they hold in order to learn a shared prediction model without exchanging the raw data itself.
In response to this increased interest, a decision was made to add FL to a Machine Learning Systems module on our MPhil in Advanced Computer Science course.
"We want to teach our students about Federated Learning and about how we can make it more practical," says Senior Lecturer Dr Nic Lane, who teaches the module.
And this is an unusual step, he adds. "Although Federated Learning is becoming more important, it's still a fairly niche area so it's not yet being widely taught."
The power of Federated Learning
Typically, in order to train a neural network to solve a specific task, data is collected from thousands of users and sent to a data centre for processing using graphics processing units (GPUs). This is the norm today for tasks like speech recognition or image understanding that are common in everyday devices like smartphones.
"A good example of the power of Federated Learning is the 'autocomplete' function on a smartphone keyboard," explains Dr Pedro Porto Buarque De Gusmao, a postdoctoral researcher working with Nic Lane.
"Text that we type into our phone might otherwise be sent to a data centre to train a model that will predict the next word to be typed in a sentence."
And this brings a clear issue over privacy, he adds. "Users might not want to share the content of their texts with a third party. So our main goal with Federated Learning is to keep data local and to use the collective power of millions of mobile devices together to perform distributed model training, without raw data ever leaving the phone. Already companies like Google are using FL in their smartphone keyboard technology."
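To make the idea concrete, here is a minimal sketch of one common way to do this, federated averaging, written in plain Python. It is an illustration only, not the code used in the lab and not Flower's API: the tiny linear model, the three simulated clients and every function name are assumptions made for the example. Each client trains on its own data and sends back only model weights, which the server averages.

```python
import random

def local_train(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of gradient descent on one client's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)

def federated_average(updates):
    """Average client weights, weighted by how many examples each client holds."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)

def run_rounds(clients, rounds=50):
    """Repeat: send the global model out, train locally, average the results."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [(local_train(global_weights, data), len(data)) for data in clients]
        # Only weights and example counts reach the server, never the raw data.
        global_weights = federated_average(updates)
    return global_weights

def make_client(n, rng):
    """Hypothetical private dataset: n noisy samples of y = 3x + 1."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
clients = [make_client(n, rng) for n in (20, 50, 30)]
w, b = run_rounds(clients)
print(f"learned w={w:.2f}, b={b:.2f}")
```

The server in this sketch never sees an `(x, y)` pair, yet the shared model should end up close to the underlying relationship that all three clients observe.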
But helping students to learn about the possibilities of Federated Learning and how to improve it as a technique is not entirely straightforward.
"There are not yet many software learning libraries or frameworks available for teaching it in ways that will give students a hands-on experience," says Nic, who also heads the Machine Learning Systems Lab here.
Overcoming obstacles to teach FL
This obstacle was overcome through a new collaboration between the Department and Flower, an open source Federated Learning framework co-founded by Daniel J Beutel, a former Master's student of Nic's.
Following their lectures on the subject, a new lab was created for the students using Flower software and virtual machines rented from Amazon Web Services. This gave them the chance to implement what they had learned in theory with some practical, hands-on experience.
"They could explore some FL scenarios, get a feel for how things work and what the major trade-offs are in the design space of the algorithms," Nic says.
"It helped give them a more nuts-and-bolts view of how things are actually done – because, while you can read a description on a slide, until you actually see the code you don’t realise exactly how it’s working."
The practical lab also let them encounter some of the common issues occurring in FL.
Lab offers 'real world' flavour of the topic
"For example, when you are working with multiple devices and the data is not evenly distributed across all of them, that can create a skew or bias. The machine learning algorithms we have today struggle with that situation," Nic explains.
"Also, when working with multiple devices, you may find they are not all even in computer power or the speed of their connection – and some may drop out altogether. Unlike in a data centre, when you are working with a distributed set of devices, you can’t assume you’ve got control over all these conditions."
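Both issues Nic describes can be simulated in a few lines. The sketch below is an assumption-laden illustration rather than the lab's code: the toy dataset, the helper names and the 40% dropout probability are all made up for the example. It splits a labelled dataset so that each client sees only one or two classes, then simulates a round in which some clients fail to report back.

```python
import random
from collections import Counter

def label_skew_partition(examples, num_clients):
    """Sort by label and cut into contiguous shards, so each client sees few classes."""
    ordered = sorted(examples, key=lambda ex: ex[1])
    shard = len(ordered) // num_clients
    return [ordered[i * shard:(i + 1) * shard] for i in range(num_clients)]

def surviving_clients(client_ids, dropout_prob, rng):
    """Return the clients that finish the round; each drops out independently."""
    return [cid for cid in client_ids if rng.random() >= dropout_prob]

rng = random.Random(42)

# Hypothetical dataset: 200 examples, 50 for each of four class labels.
examples = [(rng.random(), label) for label in range(4) for _ in range(50)]
partitions = label_skew_partition(examples, num_clients=5)

for cid, part in enumerate(partitions):
    counts = dict(Counter(label for _, label in part))
    print(f"client {cid}: label counts {counts}")

# Simulate one round in which each client has a 40% chance of dropping out.
survivors = surviving_clients(range(len(partitions)), dropout_prob=0.4, rng=rng)
seen = sorted({label for cid in survivors for _, label in partitions[cid]})
print(f"clients reporting back: {survivors}; labels covered this round: {seen}")
```

When the data is skewed like this, losing even one client can remove an entire class from a training round, which is part of why the two problems are harder together than apart.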
This 'real world' flavour of the lab was highly valuable and was enhanced by the involvement of the Flower developers.
"This included comparing the impact of different solutions not only in terms of final result (accuracy), but also taking into account other real-world aspects such as communications costs, for example," Pedro explains.
"Another good thing about using Flower," he adds, "is the fact that it can be deployed on distributed systems. The same is not true for most frameworks as they can only run simulations on a single computer. Since Federated Learning is all about distributed training over a large number of clients, the ability to run experiments over multiple virtual machines allowed students to run realistic examples within the allotted lab time."
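The communication cost Pedro mentions can be estimated with simple arithmetic. The following is a rough sketch under stated assumptions, not a measurement from the lab: it assumes every selected client downloads the full model and uploads a full-size update each round, with 32-bit parameters and no compression. The model size and round counts are hypothetical.

```python
def round_traffic_bytes(num_params, clients_per_round, bytes_per_param=4):
    """Bytes moved in one round: one model download plus one upload per client."""
    return 2 * num_params * bytes_per_param * clients_per_round

def total_traffic_gb(num_params, clients_per_round, rounds, bytes_per_param=4):
    """Total traffic over a whole training run, in gigabytes (10**9 bytes)."""
    per_round = round_traffic_bytes(num_params, clients_per_round, bytes_per_param)
    return per_round * rounds / 1e9

# Hypothetical settings for a one-million-parameter model with 10 clients per round.
many_short_rounds = total_traffic_gb(1_000_000, clients_per_round=10, rounds=100)
few_long_rounds = total_traffic_gb(1_000_000, clients_per_round=10, rounds=25)

print(f"100 rounds: {many_short_rounds:.1f} GB")
print(f" 25 rounds: {few_long_rounds:.1f} GB")
```

Doing more local training per round so that fewer rounds are needed cuts traffic in direct proportion, but whether accuracy holds up is exactly the kind of trade-off that has to be measured by experiment.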
A positive experience for students
Feedback so far suggests that the students found the lab a positive experience. "Machine Learning Systems focuses on an exciting new discipline at the intersection of machine learning and real-world engineering challenges and I enrolled because I wanted to get hands-on experience with new methods like FL," says student Luke Guerdan.
After using Flower for the research project associated with this module, he liked it so much that "I am now planning to use Flower for my MPhil thesis."
Meanwhile, the developers behind Flower also benefitted from their involvement in running the lab.
"It resulted in some good learnings for the Flower project which will help to improve future versions," says Daniel J Beutel.
 
      