- PhD Candidate
I am a PhD candidate at the Department of Computer Science and Technology of the University of Cambridge, focusing on Machine Learning and Artificial Intelligence. With my supervisor, Prof. Cecilia Mascolo, I am investigating the automation of the ML pipeline for sparse, unlabelled, and out-of-distribution data. My interests span uncertainty estimation, AutoML, Active Learning, and addressing distribution shifts in time series data with AI, and I am always open to exploring promising new topics.
Prior to starting my PhD studies, I completed an MRes in Sensor Technologies and Applications and an MPhil in Advanced Computer Science, both from the University of Cambridge. I also hold a BSc in Computer Science from University College London (UCL).
Research
- Artificial Intelligence
- Machine Learning
- Deep Learning
- Mobile Systems
- Time Series Data
- Real-World Data
- Active Learning
- AutoML
- Semi-Supervised Learning
Teaching
- Thesis Research (MPhil ACS, Part II)
- Mobile Health (MPhil ACS, Part III)
- Mobile Health (Part II)
- Object-Oriented Programming (Part IA)
Publications
ACM Transactions on Computing for Healthcare, vol. 6, no. 3. 2025.
Abstract: Machine Learning models typically assume that time series are regularly spaced; however, this is often unrealistic in healthcare, where missing data recordings are common. In this context, uncertainty estimates play a pivotal role, as they can enable confident and non-confident predictions to be distinguished. We propose SQUIREDL, a novel uncertainty-aware sequence-to-sequence prediction method for sparse healthcare time series. Specifically, we enhance the state-of-the-art evidential regression framework, widely used for uncertainty estimation, to handle missing data. Following data imputation with an Akima spline-based method, we modify the loss function of evidential regression by assigning different weights to imputed and observed data points, to offer more reliable uncertainty estimates. Additionally, we examine a variety of metrics for assessing the success of uncertainty estimations on sequence-to-sequence predictions, providing a reliable way to evaluate the models in a medical setting. Our proposal is demonstrated in two clinical applications. In continuous glucose monitoring, we use sequence-to-sequence prediction to obtain the hypoglycaemia risk from glucose sensor readings. Our approach captures the ground truth risk values 30% more accurately, bringing consistent improvements in both uncertainty-aware and accuracy-based metrics. Similarly, in COVID-19 hospital admissions data, we achieve a 22% improvement in the accuracy of uncertainty-aware predictions, enabling better resource planning.
In Proceedings of the 47th IEEE Engineering in Medicine & Biology Conference (EMBC), 2025. Copenhagen, Denmark.
Abstract: Sensor-generated time series hold immense potential across the healthcare domain, yet present challenges in labelling due to their sequential nature, which requires consideration of context and temporal dependencies. Recognising the costly nature of data labelling and that domain experts may have limited technical expertise in model optimisation, we introduce an approach to automate machine learning model training for medical time series, enhancing analysis efficiency. Our proposal first operates at the data input level via adaptive data acquisition, facilitating the selection of highly informative samples for labelling. Further, it works at the model level, through dynamic model refinement to optimise the model on-the-fly by progressively exploring the possible hyperparameter options and choosing the best combination at each acquisition step, and through an automatic learning phase to maximise the usage of any unlabelled samples. This results in a robust learning strategy that continuously refines the model with expanding data and human expertise. Demonstrated on EEG, ECG, and IMU health signal classification, our method outperforms baselines and the current state-of-the-art, while reducing reliance on human input for model tuning. SALTS enhances the applicability of machine learning to healthcare time series, maximising the information gained through each human annotation step in an automated way.
7th UK Mobile, Wearable & Ubiquitous Systems Research Symposium (MobiUK), 2025. Edinburgh, Scotland.
Abstract: Foundation models (FMs), including large language models and time series-specific models, have shown promise in the mobile and wearable data domain, particularly for analysing ECG and EEG biosignals. However, their performance on out-of-distribution (OOD) time series, especially when collected from diverse users and devices, remains under-explored. While task-specific models excel in in-domain tasks, they often struggle with generalisation, particularly when data quality varies across different mobile and wearable devices. This work evaluates the robustness of FMs on OOD time series from diverse sensor technologies, highlighting their strengths and limitations in mobile and wearable real-world applications.
In Proceedings of the 45th IEEE Engineering in Medicine & Biology Conference (EMBC), 2023. Sydney, Australia.
Abstract: Supervised machine learning (ML) is revolutionising healthcare, but the acquisition of reliable labels for signals harvested from medical sensors is usually challenging, manual, and costly. Active learning can assist in establishing labels on-the-fly by querying the user only for the most uncertain, and thus informative, samples. However, current approaches rely on naive data selection algorithms, which still require many iterations to achieve the desired accuracy. To this end, we introduce a novel framework that exploits data augmentation for estimating the uncertainty introduced by sensor signals. Our experiments on classifying medical signals show that our framework selects informative samples that are up to 50% more diverse. Sample diversity is a key indicator of uncertainty, and our framework can capture this diversity better than previous solutions as it picks unlabelled samples with a higher average point distance during the first queries compared to the baselines, which pick samples that are closer together. Through our experiments, we show that augmentation-based uncertainty makes better decisions, as the more informative signals are labelled first and the learner is able to train on samples with more diverse features earlier on, thus enabling the potential expansion of ML in more real-life healthcare use cases.
7th UK Mobile, Wearable & Ubiquitous Systems Research Symposium (MobiUK), 2023. Lancaster, England.
Abstract: Machine Learning (ML) models for sequence-to-sequence tasks, which predict one series from another, expect continuous time series, but missing points and inconsistencies are common in mobile and wearable data. Additionally, models are often not integrated with uncertainty-aware solutions: uncertainty estimations are crucial, as they can distinguish confident from non-confident predictions. We propose uncertainty-aware sequence-to-sequence prediction on sparse time series. We enhance the state-of-the-art evidential regression with time series interpolation and modify its loss function for irregular series, tuning it to assign different weights to different types of points, as required by distinct uncertainty meanings varying per task and requirement. We also propose novel metrics for assessing the success of uncertainty estimations on sequence-to-sequence predictions, offering a robust way to assess uncertainty given by ML models, as opposed to accuracy-focused metrics.

