
This project speeds up geospatial data processing by applying modern lightweight compression techniques to data stored in GeoTIFF files, addressing the growing gap between CPU throughput and RAM bandwidth. We mitigate poor cache locality and the resulting memory bottlenecks by exploiting the superscalar execution and SIMD (Single Instruction, Multiple Data) instructions of modern CPUs. Because compression and decompression are SIMD-optimised, data can remain compressed in RAM and sit closer to the CPU caches, which speeds up access and relieves memory pressure. Using multi-objective Pareto optimisation, the project identifies the compression schemes that offer the best trade-offs between space and time efficiency. This approach achieves speedups of up to 29% relative to uncompressed pipelines, rising to 78% when aggregations are fused into the decompression process. These speedups make it faster to manage and analyse large datasets, which is crucial for addressing environmental challenges such as climate change and ecosystem conservation. The project also comprehensively benchmarks and ranks compression algorithms on geospatial data, providing a resource for optimising geospatial pipelines and directing future research.
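As a rough illustration of the Pareto selection step mentioned above, the sketch below keeps only those schemes that no other scheme beats on both objectives. The `Scheme` type, the scheme names, and all numbers are hypothetical placeholders chosen for the example; they are not measurements or identifiers from the project.

```python
from typing import List, NamedTuple


class Scheme(NamedTuple):
    """Hypothetical benchmark record; both metrics are 'lower is better'."""
    name: str
    size_ratio: float      # compressed size / original size
    decode_seconds: float  # time to decompress the test raster


def dominates(a: Scheme, b: Scheme) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = (a.size_ratio <= b.size_ratio
                and a.decode_seconds <= b.decode_seconds)
    strictly_better = (a.size_ratio < b.size_ratio
                       or a.decode_seconds < b.decode_seconds)
    return no_worse and strictly_better


def pareto_front(schemes: List[Scheme]) -> List[Scheme]:
    """Return the schemes that are not dominated by any other scheme."""
    return [s for s in schemes
            if not any(dominates(other, s) for other in schemes)]


# Illustrative values only (assumed, not benchmark results).
candidates = [
    Scheme("scheme_a", 0.40, 1.20),  # smallest output, slower decode
    Scheme("scheme_b", 0.55, 0.70),  # larger output, fastest decode
    Scheme("scheme_c", 0.60, 1.30),  # worse than scheme_a on both axes
    Scheme("scheme_d", 1.00, 0.90),  # worse than scheme_b on both axes
]

front = pareto_front(candidates)
print([s.name for s in front])
```

Here `scheme_a` and `scheme_b` both survive because each wins on one objective, which is why a Pareto analysis yields a set of candidate schemes rather than a single winner.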
Code and full report link.