skip to content

Department of Computer Science and Technology

In my last post I discussed the Janus automatic binary parallelisation tool that my postdoc, Kevin, has developed.  At VEE earlier this year we had another paper on Janus, this time extending it to extract other forms of parallelism—automatic vectorisation for data-level parallelism and software prefetching for memory-level parallelism.  We show how these schemes are applied to binaries in the context of Janus (with a neat trick for dealing with bounds-checking code when inserting prefetches to arrays) and evaluate them together.  I’m not aware of any other work that tries to extract all three forms of parallelism at once.  However, what I liked best about this paper was not the techniques, nor the results, but the fact that the two passes for Janus were written by undergraduates working with my group.

The first technique on prefetching was written by Márton Erdős over the summer.  Márton worked as a researcher in my group as a UROP student (undergraduate research opportunities programme), a scheme run by the university with funding from a variety of sources, such as industry, research councils and departments.  Márton’s funding came from the EPSRC, for which I am very grateful.

The great thing about the UROP scheme is that it gave me a chance to work with an excellent undergraduate student who I likely wouldn’t have worked with otherwise, and it gave him a chance to get some research experience that he may not have otherwise got.

Márton came and worked with my group for ten weeks, adapting the algorithm that Sam Ainsworth and I developed for our CGO 2017 paper so that it works well in the binary modification environment that Janus presents. He contributed a fair amount of work to the static analyser and a neat optimisation trick that I mentioned earlier. Essentially when there is one array indexed by a second, of the form A[B[i]], then we prefetch B[i+64] and later, when we estimate that this data is in the cache, we prefetch A[B[i+32]].  For this second prefetch we have to ensure that B[i+32] is within the bounds of the A array by guarding it with a check, otherwise the program could experience an error that isn’t there without the prefetching.  Márton’s optimisation is to elide these checks by placing a series of nops after each load that could fail and installing a custom signal handler to catch the seg fault and check whether these nops exist or not.

Vectorisation was implemented by George Wort as part of his final year research project.  He basically took a suite of benchmarks (the test suite for vectorising compilers (TSVC)) and set about creating the analysis and transformations in Janus to make that happen. Being in Janus, that meant leveraging the static analyses we already had as much as possible, and building on the existing rewrite rules that had been implemented for thread-level parallelisation.

Both students wrote a fair bit of code, interacted with the rest of the group and generally got a taste of university-style research. That they also both managed to successfully produce working implementations of their respective schemes and could then contribute to a conference publication shows the calibre and hard work put in by the students I was lucky enough to work with.

Of course there could have been pitfalls and I was careful to think about what they could be in advance. Choice of project is particularly important, especially thinking about it in conjunction with the amount of time available for the work, to avoid something too challenging and ultimately turning the students off the idea of research. Also, I thought about how the students would interact with the team and me, in particular how much time I would need to devote to them and when; George had to write a dissertation based on his research so it was important that I scheduled time to give quick feedback on drafts.  With these things thought out in advance, both projects progressed smoothly (or as smoothly as possible for research!).

In summary then, both projects were a great experience for students and me alike, with some great work completed by the end of them, and I urge anyone who hasn’t involved undergraduates in their research yet to give it a try.

Tag: 
Students
Vectorisation
Prefetching
Parallelisation