
Department of Computer Science and Technology


Janus: Statically-Driven and Profile-Guided Automatic Dynamic Binary Parallelisation

18 February 2019

One of the themes of my research has been, and continues to be, the exploitation of parallelism in its many forms. I’ve looked into data-level parallelism by improving the performance of SLP by, for example, reducing the number of instructions that are vectorised and (spoiler alert for a future publication) I have a PhD student working on speculative vectorisation. With Sam Ainsworth, formerly my PhD student and now a postdoc, I have published research that exploits memory-level parallelism within the compiler, within the architecture, and in both together with a programmable prefetcher. We’ve also looked into taking advantage of parallelism for error detection. However, the first work I did in this area, and the kind of work...



An Event-Triggered Programmable Prefetcher for Irregular Workloads

28 March 2018

Over the last few years my PhD student, Sam Ainsworth, and I have been looking into data prefetching, especially for applications containing irregular memory accesses. We published a paper at ICS 2016 about a specialised hardware prefetcher that optimises breadth-first traversals on graphs in the commonly-used compressed sparse-row format, which I previously blogged about. We also published a paper at CGO on automatic software-prefetch generation, more generally for indirect memory accesses (blog post). At ASPLOS this year, we marry the two ideas together and generalise even further, creating a programmable prefetcher, using an event-driven programming model, that is capable of fetching in data for many types of memory access, complete...



Comparison of AArch64 Dynamic Binary Modification Tools

12 July 2017

Over the past few years I’ve become increasingly interested in dynamic binary modification (DBM) tools, so much so that I supervise a PhD student who is trying to parallelise binaries using one, and am just starting work on a grant that continues and extends this work. On Intel’s architecture, Pin is probably the most famous tool, and the one that I had most experience of in the past. (As an aside, Pin is a dynamic binary instrumentation tool, but I’m going to use modification instead of instrumentation throughout this post, since modification subsumes instrumentation and I’m more interested in optimisation than just analysis.) However, it’s closed source and only targets Intel’s ISAs. Another option is ...



Student Research Competition at EuroLLVM 2017

29 March 2017

My student, Sam Ainsworth, has won first prize in the student research competition at EuroLLVM 2017. This work was previously published at CGO 2017 and I’ve blogged about it too. Below is a copy of his poster, or download it here. Well done, Sam!



Software Prefetching for Indirect Memory Accesses

23 February 2017

I’ve always considered software prefetching to be something of a black art. There have been times in the past when I’ve looked at my code, noticed a load is causing problems and tried inserting one or more software prefetches to alleviate the issue. Mostly this hasn’t worked, although I’ve never been sure why. In fact, even when it has worked I haven’t been totally sure why it has, usually because it’s involved a lot of trial and error in trying out different options before I hit on improved performance.

Now it turns out that most of the time I was probably trying to prefetch the wrong things. Trying to prefetch linked data structures, which are those that involve pointer chasing (like a linked list),...



My Year 2016

23 December 2016

I started blogging in October 2015 with the aim of publicising my group’s research a little more, having a space to write about topics and work that weren’t going to be published, and delving into our research results in more detail than possible in a page-constrained article. A year on, I wanted to look back and see how things had gone in my first year as a lecturer, but the events of this October overtook me. Now, at the end of the calendar year when everything is calmer, it seems like a good opportunity to summarise the last twelve months and point to blog posts written and events that I didn’t find time to talk about. Given the season I’ll try not to make it...



The Lynx Queue

9 August 2016

This post is about my group’s second ICS paper from June this year, which describes a new single producer / single consumer (SP/SC) software queue that we developed for frequent inter-core communication. It’s faster than existing implementations and we call it Lynx. It’s available on my group’s data page.
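For readers unfamiliar with SP/SC queues, the sketch below shows a conventional ring-buffer design of the kind such work is usually compared against. It is not Lynx and does not reflect how Lynx is implemented; the names `spsc_queue`, `spsc_init`, `spsc_push`, `spsc_pop` and the capacity `SPSC_CAPACITY` are hypothetical, chosen only for illustration. It assumes exactly one producer thread and one consumer thread, and uses C11 atomics for the two indices.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two so indices can wrap with a mask. */
#define SPSC_CAPACITY 1024u

typedef struct {
    uint64_t slots[SPSC_CAPACITY];
    _Atomic size_t head; /* next slot to read; written only by the consumer */
    _Atomic size_t tail; /* next slot to write; written only by the producer */
} spsc_queue;

static void spsc_init(spsc_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: returns false (and stores nothing) if the queue is full. */
static bool spsc_push(spsc_queue *q, uint64_t value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == SPSC_CAPACITY) {
        return false;
    }
    q->slots[tail & (SPSC_CAPACITY - 1)] = value;
    /* Release: the slot write must be visible before the new tail is. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false (and leaves *out untouched) if empty. */
static bool spsc_pop(spsc_queue *q, uint64_t *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = q->slots[head & (SPSC_CAPACITY - 1)];
    /* Release: the slot read must complete before the slot is handed back. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

In this baseline every push and pop loads both indices and compares them, which is overhead paid on each inter-core transfer; that per-operation cost is what makes the performance of such queues matter when communication is frequent.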

Initially, we didn’t set out to create a new queue. We were experimenting with transient error detection techniques in software. Transient, or soft, errors are faults that occur sporadically within a microprocessor, causing a data value or instruction to change. They are the result of strikes to the chip from cosmic rays (or usually the secondary particles they excite) or alpha particles from...



Hardware Graph Prefetchers

3 June 2016

This week sees the publication of two papers from my research group at ICS 2016 and so, in this post, I’d like to look a little more into one of these schemes: the graph prefetcher that my student, Sam, has developed.

Graph workloads are important in a number of domains, and becoming increasingly so. You only have to look at the numerous social media applications to see examples of graph-based data (e.g. in a network of people, each person is a vertex and the edges represent links to friends). But graph representations are also significant in less publicly-visible application areas, such as those in scientific computing or “big data” analytics. However, efficient processing of graph workloads is often...



Minute Madness on Program Parallelisation

25 May 2016

Today was the annual Wheeler lecture at the Computer Laboratory, and before the main event, a talk by Andrew Herbert, there was a Minute Madness where people from across the Lab, ranging from MPhil students through to professors, talked for one minute about their research with a single slide as a prop. My slide and something approximating the words I used are below.

“Hello! My group works on ways of making applications go faster, through a technique called program parallelisation.

If you look on the left of the slide, the red wavy arrow represents a regular sequential application with a single thread of execution within it. This means that instructions execute one...



Addressing Temporal Memory Safety

26 February 2020

Our upcoming Oakland paper was released onto the internet recently, despite the publication date actually being in May when the conference is held (the IEEE Symposium on Security and Privacy, to give its official name). So now seemed like a good time to talk about some of the security work we've been doing, in particular our research into schemes for temporal memory safety.