*Abstract*
Machine learning rests on three pillars: algorithms, hardware, and data. In close-range forest monitoring, the first two have already seen major advances: processing has shifted from classical methods to neural networks, and measurement from manual tools like tape measures to LiDAR-based laser scanning. Together, these advances have made forest monitoring faster and more accurate.
However, data remains a bottleneck. High-quality, annotated forest datasets are scarce and costly to produce, and their size still falls short of the scale required for robust machine learning. Meanwhile, the rise of graphics engines, together with the success of synthetic data in domains like self-driving and robotics, raises a question: can forest monitoring benefit from a similar approach? The key challenge is whether models trained on synthetic forest environments can learn representations that generalise to real-world data.
In this talk, I’ll focus on the task of instance segmentation of individual trees—a core bottleneck in many field applications. I’ll present my current progress in generating synthetic forest plots and point cloud data using Unreal Engine, and evaluate their performance against a state-of-the-art model trained on a leading real-world dataset. I’ll also discuss upcoming directions and experimental plans. Time permitting, I’ll give a live demo of my synthetic data pipeline, showing how we can go from video games to ML-ready datasets.
This is a work-in-progress talk, and I look forward to feedback and discussion.
*Bio*
Yihang She is a second-year PhD student in Computer Science at the University of Cambridge. His research focuses on advancing computer vision in the novel context of forest monitoring, spanning both close-range and satellite-based observations.