Department of Computer Science and Technology

Image: Raoul-Gabriel Urma, of Cambridge Spark, and the students on the Schmidt Data for Science Course

Our first-ever Schmidt Data for Science Residency Programme is underway – and is being enthusiastically received by the 27 PhDs and postdocs selected to participate.

The first cohort of students (pictured above, with teacher Raoul-Gabriel Urma) are all from disciplines beyond computer science. They are spending five weeks in this virtual instructor-led accelerated masterclass, learning to apply data analysis to their own datasets and problems.

The course, which is being delivered by Cambridge Spark, "is the most well-structured teaching experience I’ve come across so far for fundamental and more advanced data science concepts," says one participant, Chemistry PhD student Roxine Staats.

"Having attempted to self-teach, and enrol in other less in-depth courses covering these concepts," she adds, "I can say that the Schmidt Programme so far has been peerless in its guidance and 'bootcamp'-style approach to teaching, and building on, fundamental skills."

This course is part of the new Accelerate Programme for Scientific Discovery, which is being funded by a generous donation to the University from Schmidt Futures. The programme – led by Professor Neil Lawrence, DeepMind Professor of Machine Learning here – is designed to equip researchers in science disciplines outside computer science with the skills they need to use machine learning in their research.

"Machine learning and AI are increasingly part of our day-to-day lives, but they aren’t being used as effectively as they could be, due in part to major gaps of understanding between different research disciplines," says Neil.

"This programme will help us to close these gaps by training physicists, biologists, chemists and other scientists in the latest machine learning techniques, giving them the skills they need while accelerating the excellent research already taking place here at the University."

The first cohort includes researchers from chemistry, biochemistry, physics, engineering, medicine, veterinary medicine and psychology.

Roxine Staats is researching neurodegenerative diseases (such as Parkinson’s and Alzheimer’s) for her PhD in the Department of Chemistry. In her studies, she is analysing the effects of potential drugs for neurodegenerative disorders and looking at ways of rationally designing compounds to hinder the molecular mechanisms that kill patients’ brain cells.

"I've been fascinated for a while by the potential of machine learning and data science tools, but I didn’t understand how they could best be applied to my own data," she says.

"I'm particularly excited to have a one-on-one mentor who is helping me to tailor what I’m learning on the course to the data I’ve collected during my PhD. Having an experienced guide take the time to understand the nature of my datasets and then walk me through suitable applications, analyses and outputs is an outstanding feature of the programme."

Ryan Geiser is also researching neurodegenerative diseases for a PhD co-supervised by the Department of Chemistry and Institute of Public Health. He is analysing European population-based studies of more than 20,000 people to examine the relationships between the use of different drugs and subjects’ dementia or cognitive status. The aim is to identify drugs that could be repurposed to treat Alzheimer's disease, or that contribute to the risk of developing it, and this requires significant statistical analyses of the data.

He applied to take the course to help him with this work and he’s really enjoying it so far. "It's a fun and interactive programme that is guiding me through the art of machine learning and opening up new ways for me to analyse my epidemiology data," he says. "I’m applying the skills I've learned earlier in the week to my PhD project while being supervised by a computer science mentor."

Caroline Watson is studying for her PhD in the University’s Department of Oncology, where she analyses blood sequencing data to try to determine which blood cell mutations are associated with the highest risk of developing leukaemia.

"I'm loving the course so far," she says. "I started using the programming language Python about two years ago and, so far, have gained much of my knowledge from programming forums, and trial and error. This course is teaching me much better – and much more efficient – ways of viewing and manipulating data, and these will be incredibly useful for the large datasets I need to work with. I’m already putting it into practice!"

Lia Chappell, a fellow participant, is a molecular biologist at the Wellcome Sanger Institute. She applied, she says, because she has heard a great deal about the potential of machine learning and wants to find out more about how it could be harnessed.

"Leaders pushing the edge of my field (single cell genomics)," she explains, "are beginning to talk at conferences about using machine learning to help us understand large multi-dimensional datasets that represent collections of cells from patients, tissues and organs.

"I'm hoping to be able to understand much better what machine learning can and can't do, and use those insights to help explore our existing datasets and inform how we should focus our efforts as we build new technologies in the wet lab. Hopefully, we can push the boundaries of the experimental science and the data science at the same time!"

And like other students, she has tried other courses but is particularly impressed by this one.

"I'm finding the instruction here much more engaging than many courses I've been to in person. The screen-sharing with a voice lets you see what the instructor is typing. Parts of it look like magic, in terms of how fast it is to find an answer in the data. And it's reassuring and helpful to see how to fix small bugs in the code as they crop up.

"Overall," she adds, "I'm really happy to have found out about the course and excited to have been chosen to take part. I'll be recommending this course to friends and colleagues when the opportunity to apply comes up again."

Published by Rachel Gardner on Wednesday 29th July 2020