
Department of Computer Science and Technology


In Praise of Undergraduate Research

8 August 2019

In my last post I discussed the Janus automatic binary parallelisation tool that my postdoc, Kevin, has developed. At VEE earlier this year we had another paper on Janus, this time extending it to extract two other forms of parallelism: automatic vectorisation for data-level parallelism and software prefetching for memory-level parallelism. We show how these schemes are applied to binaries in the context of Janus (with a neat trick for dealing with bounds-checking code when inserting prefetches to arrays) and evaluate them together. I’m not aware of any other work that tries to extract all three forms of parallelism at once. However, what I liked best about this paper was not the techniques, nor the results, but the fact that the two passes...



Boosting Performance By Limiting Vectorisation

27 October 2015

It sounds a bit counter-intuitive, but boosting application performance by limiting the amount of vectorisation carried out is essentially what my postdoc, Vasileios Porpodas, and I have done in our latest paper on automatic vectorisation. We call it TSLP, or Throttled SLP, because it limits the amount of scalar code that the standard SLP algorithm converts into vectors.
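To make the idea of SLP concrete, here is a minimal sketch in C of the kind of straight-line scalar code an SLP vectoriser looks for. It is not taken from the paper: the function names `add4_scalar` and `scaled_gather4` are hypothetical, and whether a given compiler actually vectorises either function depends on its cost model and target. The first function has four isomorphic, independent statements on adjacent elements, which pack naturally into one vector operation. The second has the same shape of arithmetic but reads its inputs from non-adjacent locations, so a vector version would first have to gather the scalars into a vector. That is one plausible example of a case where converting less code could turn out cheaper.

```c
#include <stddef.h>

/* Four isomorphic, independent statements on adjacent memory.
 * An SLP vectoriser could pack these into one vector load pair,
 * one vector add and one vector store. */
void add4_scalar(const float *a, const float *b, float *out) {
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}

/* Same arithmetic shape, but the inputs are read through an index
 * array and so need not be adjacent. A vector version would need
 * extra instructions to collect the four scalars into one vector
 * (an assumption made for illustration, not a result from the paper). */
void scaled_gather4(const float *a, const size_t idx[4], float *out) {
    out[0] = a[idx[0]] * 2.0f;
    out[1] = a[idx[1]] * 2.0f;
    out[2] = a[idx[2]] * 2.0f;
    out[3] = a[idx[3]] * 2.0f;
}
```

With Clang, optimisation remarks such as `-Rpass=slp-vectorizer` can report which groups of statements the SLP pass chose to convert.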

The actual paper is available here. Vasileios presented it at PACT last week and will be at the LLVM Developers’ Meeting this week, so I thought it might be interesting to expand on one of the examples we give at the end, showing the source code and how it is actually vectorised with SLP and TSLP. The kernel is compute-rhs, which is a...