Part II, Part III and ACS Projects
The following project suggestions are starting points for possible Part II, Part III and ACS projects. ACS and Part III projects require more of a research emphasis to be successful, whereas Part II projects might focus more on computer engineering and reproducibility of results, though more research-oriented projects are possible and can attract top marks if successful.
Please contact the proposer(s) by email if you are interested in any of the projects below. In addition, some of the projects from previous years may still be suitable and interesting. Please remember, these are just starting points that suggest possible directions for the research. Check back here over the coming weeks for more projects. We would also be happy to consider any project ideas of your own.
Project suggestions for 2024/25
Here are the project suggestions for this year. Feel free to contact us about other project ideas you have too.
Bit-manipulation Instructions Extension for Toooba
Contact: Jonathan Woodruff or Peter Rugg
RiscyOO (currently called Toooba) is a parameterisable superscalar, out-of-order RISC-V implementation in Bluespec SystemVerilog. Toooba supports many common extensions for RISC-V, including floating point and compressed instructions, but does not support the new bit-manipulation extension, “B”, which includes Zba (address computation), Zbb (basic bit manipulation), and Zbs (single-bit instructions). This set of extensions is part of recent standard RISC-V profiles (e.g. RVA22), and compilers support them. This project would add these extensions to Toooba, adding test generators to QCVEngine in the TestRIG framework to verify correctness. This project would then explore the performance improvement due to various instruction subsets using compiler flags, analysis of disassembled code, dynamic instruction count, and cycle time for executing benchmarks. As an extension, this project would build for FPGA and execute large-scale SPEC benchmarks using the architecture group's existing continuous integration infrastructure, analysing timing and area costs for the extension.
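To give a flavour of what the three sub-extensions compute, here is a minimal Python sketch of the architectural semantics of a few representative instructions. It assumes XLEN = 64 and unsigned register values; the function names simply mirror the instruction mnemonics and are not part of Toooba, QCVEngine or TestRIG.

```python
XLEN = 64                      # assumption: RV64
MASK = (1 << XLEN) - 1         # keeps results within XLEN bits


def sh1add(rs1, rs2):
    """Zba: shift rs1 left by one and add rs2 (address computation)."""
    return (rs2 + (rs1 << 1)) & MASK


def andn(rs1, rs2):
    """Zbb: bitwise AND of rs1 with the complement of rs2."""
    return rs1 & ~rs2 & MASK


def clz(rs1):
    """Zbb: count leading zero bits; returns XLEN when rs1 is zero."""
    return XLEN - (rs1 & MASK).bit_length()


def ctz(rs1):
    """Zbb: count trailing zero bits; returns XLEN when rs1 is zero."""
    rs1 &= MASK
    if rs1 == 0:
        return XLEN
    return (rs1 & -rs1).bit_length() - 1


def cpop(rs1):
    """Zbb: count the set bits in rs1."""
    return bin(rs1 & MASK).count("1")


def bset(rs1, rs2):
    """Zbs: set the bit of rs1 selected by the low log2(XLEN) bits of rs2."""
    return (rs1 | (1 << (rs2 & (XLEN - 1)))) & MASK


def bext(rs1, rs2):
    """Zbs: extract the bit of rs1 selected by the low log2(XLEN) bits of rs2."""
    return (rs1 >> (rs2 & (XLEN - 1))) & 1
```

A model of this kind is only a convenience for reasoning about expected results; the ratified specification remains the authority on the exact behaviour.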
Cache Zeroing Extension for Toooba
Contact: Jonathan Woodruff or Peter Rugg
RiscyOO (currently called Toooba) is a parameterisable superscalar, out-of-order RISC-V implementation in Bluespec SystemVerilog. The CHERI research group is using Toooba for security-extension research. However, Toooba lacks support for the new cbo.zero instruction, which zeros an entire cache block/line. This instruction is very useful for enforcing security primitives, such as zeroing heap allocations on free, or zeroing the stack before return. This project would implement cbo.zero for CHERI-Toooba, plumbing the special memory operation into the cache where an entire line can be written with zeros in a single cycle, taking care to update appropriate state in the load/store queues and store buffer, if necessary. Testing would be done with the TestRIG framework. This project would then perform a thorough evaluation of the performance improvement for various state zeroing protections with the new instruction. As an extension, this project may evaluate on FPGA with large-scale applications, or may explore further hardware zero-cache-line optimisations, such as storing zeroed cache lines more efficiently in cache.
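The effect of cbo.zero on memory can be sketched in a few lines of Python. This is an illustrative model only: the 64-byte block size is an assumption (the real size is implementation-defined), and cbo_zero and zeroing_ops are hypothetical helper names, not part of any existing tool.

```python
BLOCK_BYTES = 64  # assumption: cache block size in bytes


def cbo_zero(memory, addr, block_bytes=BLOCK_BYTES):
    """Zero the naturally aligned block that contains addr.

    The address need not itself be aligned; the whole enclosing block
    is written. Returns the base address of the zeroed block.
    """
    assert block_bytes & (block_bytes - 1) == 0, "block size must be a power of two"
    base = addr & ~(block_bytes - 1)
    memory[base:base + block_bytes] = bytes(block_bytes)
    return base


def zeroing_ops(n_bytes, block_bytes=BLOCK_BYTES, store_bytes=8):
    """Instruction counts for zeroing an aligned region whose size is a
    multiple of the block size: (ordinary stores, cbo.zero instructions)."""
    assert n_bytes % block_bytes == 0
    return n_bytes // store_bytes, n_bytes // block_bytes
```

For example, under these assumptions zeroing_ops(4096) compares the 8-byte stores needed to clear a 4 KiB page against the number of cbo.zero instructions; this static count is the kind of first-order estimate that the proposed evaluation would replace with measured cycle counts.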
Parameterising a superscalar, out-of-order core down to a scalar, in-order pipeline
Contact: Jonathan Woodruff or Peter Rugg
RiscyOO (currently called Toooba) is a parameterisable superscalar, out-of-order RISC-V implementation in Bluespec SystemVerilog. This project would extend the Toooba project with custom modules and further parameterisation of current modules to allow a reasonably efficient single-issue, in-order core. This would greatly extend the usable range of implementations that can be produced from the single Toooba code base, aiming to test the hypothesis that, with proper engineering, it may be possible to maintain a single, open-source processor design to meet a wide range of performance/area targets. Recent Konata visualisation support in Toooba will enable visualisation of the pipeline performance. Performance would be evaluated in simulated MiBench benchmarks, as well as CoreMark. An extension would measure area and timing on FPGA.
Parameterising a superscalar, out-of-order core down to a dual-issue, in-order pipeline
Contact: Jonathan Woodruff or Peter Rugg
RiscyOO (currently called Toooba) is a parameterisable superscalar, out-of-order RISC-V implementation in Bluespec SystemVerilog. This project would extend the Toooba project with custom modules and further parameterisation of current modules to allow a reasonably efficient dual-issue, in-order core. This would extend the usable range of implementations that can be produced from the single Toooba code base, aiming to test the hypothesis that, with proper engineering, it may be possible to maintain a single, open-source processor design to meet a wide range of performance/area targets. Recent Konata visualisation support in Toooba will enable visualisation of the pipeline performance. Performance would be evaluated in simulated MiBench benchmarks, as well as CoreMark. An extension would measure area and timing on FPGA.
Extending a superscalar, out-of-order core to allow multiple memory requests per cycle
Contact: Jonathan Woodruff or Peter Rugg
RiscyOO (currently called Toooba) is a parameterisable superscalar, out-of-order RISC-V implementation in Bluespec SystemVerilog. Toooba currently only allows a single memory pipeline, though the numbers of integer and floating-point pipelines are parameterisable. This project would extend the load/store queue to have a vector of interfaces to allow multiple memory pipelines to execute in the same cycle. This project would also bank the L1 cache so that multiple loads could execute per cycle, perhaps only to interleaved subsets of cache lines. This would relieve a major bottleneck, and dramatically extend a single parameterised design further into high-performance configurations. Recent Konata visualisation support in Toooba will enable visualisation of the pipeline performance. Performance would be evaluated in simulated MiBench benchmarks, as well as CoreMark. An extension would measure timing and area on FPGA.
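One simple way to picture line-interleaved banking is sketched below in Python. It assumes 64-byte cache lines and a bank chosen by the low bits of the line index; bank_of and issue_same_cycle are hypothetical names for illustration, and a real design might choose a different mapping or allow two accesses to the same line.

```python
LINE_BYTES = 64  # assumption: cache line size in bytes


def bank_of(addr, n_banks, line_bytes=LINE_BYTES):
    """Map an address to a bank by interleaving consecutive cache lines."""
    return (addr // line_bytes) % n_banks


def issue_same_cycle(addrs, n_banks):
    """Greedily select, in program order, the accesses that can issue
    together because each targets a different bank."""
    used_banks = set()
    issued = []
    for addr in addrs:
        bank = bank_of(addr, n_banks)
        if bank not in used_banks:
            used_banks.add(bank)
            issued.append(addr)
    return issued
```

With two banks, accesses to addresses 0 and 64 fall in different banks and can proceed together, while a third access to address 128 conflicts with the first and would wait for a later cycle.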
Perceptron Branch Predictor for Toooba
Contact: Jonathan Woodruff or Peter Rugg
RiscyOO (currently called Toooba) is a parameterisable superscalar, out-of-order RISC-V implementation in Bluespec SystemVerilog. Toooba already supports a small suite of branch predictors, including GSelect, GShare, and a tournament predictor. This project would develop a modern “perceptron” predictor, based on published literature and open publications. This project would develop a branch predictor module in Bluespec SystemVerilog, and then integrate the hardware simulation into the ChampSim framework in order to study behaviour and performance in comparison to state-of-the-art simulated branch predictors. This project would then integrate the new perceptron branch predictor into Toooba and measure performance improvement in simulation, including MiBench benchmarks and CoreMark. An extension would synthesise for FPGA, evaluating area and timing, and performance on SPEC benchmarks.
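For orientation, the basic perceptron scheme from the literature can be sketched in Python as follows. This is a behavioural sketch under assumed parameters (table size, history length, 8-bit weights, and the commonly quoted training threshold of floor(1.93h + 14)); the class and its method names are illustrative and do not correspond to any Toooba or ChampSim interface.

```python
class PerceptronPredictor:
    """Minimal global-history perceptron branch predictor (sketch)."""

    def __init__(self, n_entries=64, history_len=16, weight_bits=8):
        self.n_entries = n_entries
        self.theta = int(1.93 * history_len + 14)   # training threshold
        self.w_max = (1 << (weight_bits - 1)) - 1   # saturating weight bounds
        self.w_min = -(1 << (weight_bits - 1))
        # One weight vector per entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(n_entries)]
        # Global history, oldest first: +1 for taken, -1 for not taken.
        self.history = [-1] * history_len

    def _row(self, pc):
        return self.weights[(pc >> 2) % self.n_entries]

    def _output(self, pc):
        w = self._row(pc)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def _clamp(self, value):
        return max(self.w_min, min(self.w_max, value))

    def predict(self, pc):
        """Predict taken when the dot product is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a misprediction or a low-confidence output, then
        shift the outcome into the global history."""
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self._row(pc)
            w[0] = self._clamp(w[0] + t)
            for i, xi in enumerate(self.history, start=1):
                w[i] = self._clamp(w[i] + t * xi)
        self.history = self.history[1:] + [t]
```

A hardware version would replace the multiplications with conditional add/subtract and would need to pipeline the summation; the sketch only captures the prediction and training rules.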
TAGE Branch Predictor for Toooba
Contact: Jonathan Woodruff or Peter Rugg
RiscyOO (currently called Toooba) is a parameterisable superscalar, out-of-order RISC-V implementation in Bluespec SystemVerilog. Toooba already supports a small suite of branch predictors, including GSelect, GShare, and a tournament predictor. This project would develop a modern “TAGE” predictor, based on published literature and open publications. This project would develop a branch predictor module in Bluespec SystemVerilog, and then integrate the hardware simulation into the ChampSim framework in order to study behaviour and performance in comparison to state-of-the-art simulated branch predictors. This project would then integrate the new TAGE branch predictor into Toooba and measure performance improvement in simulation, including MiBench benchmarks and CoreMark. An extension would synthesise for FPGA, evaluating area and timing, and performance on SPEC benchmarks.
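Two ideas at the heart of TAGE are that the tagged tables use history lengths forming a geometric series, and that the prediction comes from the matching table with the longest history. The Python sketch below illustrates only those two ideas; the function names and the example parameters are assumptions for illustration, and a full predictor also needs tag hashing, usefulness counters and an allocation policy.

```python
def geometric_history_lengths(n_tables, min_len, max_len):
    """History length for each tagged table, growing geometrically
    from min_len to max_len (rounded to the nearest integer)."""
    ratio = (max_len / min_len) ** (1.0 / (n_tables - 1))
    return [int(min_len * ratio ** i + 0.5) for i in range(n_tables)]


def provider_index(tag_hits):
    """Index of the hitting table with the longest history, or None when
    no tagged table hits and the base predictor must be used.

    tag_hits is ordered from the shortest history to the longest.
    """
    for i in range(len(tag_hits) - 1, -1, -1):
        if tag_hits[i]:
            return i
    return None
```

For instance, four tables spanning history lengths 5 to 40 would use lengths that double from one table to the next.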
Instruction Fusion Limit Study
Contact: Jonathan Woodruff
Conventional microarchitectural wisdom has led us to believe that very wide processor pipelines are exponentially more power hungry than simple, scalar pipelines. However, recent mobile processors have been much wider than their power-hungry desktop counterparts, and yet much more power efficient. One technique used here is large-scale instruction fusion. That is, treating a sequence of instructions as a single super-instruction microarchitecturally that can read multiple registers and write multiple registers. For example, one arithmetic pipeline could be capable of executing a series of 8 arbitrary simple integer operations which read up to 4 operands between them, and write up to 3 results. All intermediate results would be passed internally to this pipeline without occupying forwarding paths or a physical register, saving power and increasing performance.
This project would perform a limit study for instruction fusion for the RISC-V architecture. Both static RISC-V binaries and dynamic RISC-V instruction traces would be analysed to determine the opportunities for fusion in these programs. For example, how many instructions can be eliminated if a processor is capable of arbitrary integer instruction fusion of various granularities? How many more if arbitrary arithmetic can be fused with a trailing memory operation? How many more if known-contiguous memory operations could be fused into wider memory operations? How many if simple integer and multi-cycle integer (e.g. MUL) could be fused? How many if integer and floating point could be fused? Etc. As an extension, this project would look at the interplay between fused instructions, which must execute in dependency order and may be multi-cycle, and a parallel, out-of-order execution in a more traditional superscalar pipeline.
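As a very small starting point for this kind of analysis, the following Python sketch counts adjacent pairs of simple integer instructions in which the second consumes the result of the first. The trace format (mnemonic, destination register, tuple of source registers), the SIMPLE_INT set and the greedy pairing policy are all assumptions made for illustration; a real study would parse disassembly or simulator traces and consider longer groups and the operand limits discussed above.

```python
# Assumption: a small, illustrative subset of single-cycle integer operations.
SIMPLE_INT = {"add", "addi", "sub", "and", "or", "xor", "slli", "srli"}


def count_fusible_pairs(trace):
    """Greedily fuse each simple integer instruction with its immediate
    successor when the successor reads its destination register.

    Each trace entry is (mnemonic, dest_reg, source_regs). Returns the
    number of fused pairs, i.e. the number of instructions eliminated.
    """
    fused = 0
    i = 0
    while i + 1 < len(trace):
        op0, rd0, _ = trace[i]
        op1, _, srcs1 = trace[i + 1]
        dependent = rd0 != "x0" and rd0 in srcs1
        if op0 in SIMPLE_INT and op1 in SIMPLE_INT and dependent:
            fused += 1
            i += 2          # both instructions are consumed by the fused op
        else:
            i += 1
    return fused
```

Adding a memory operation to the permitted second slot, or widening the group beyond two instructions, gives the successive relaxations that the questions above describe.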
Memory Renaming Limit Study
Contact: Jonathan Woodruff
x86 processors have long supported limited “memory renaming” to accelerate stack operations. In the decode stage of the pipeline, it can be known that memory operations will alias even if the full address is not known. For example, if a store at an immediate offset of 32 from the stack pointer is followed several instructions later by a load from the same offset, and if the stack pointer is not modified in between, it can be known that the loaded value will be the same value that was stored, and the pipeline may simply assume the original physical register holds the value that will be loaded, breaking the dependency through memory. Stated more clearly, the value in the sp[32] memory location is renamed in the pipeline to a physical register. This idea can be further generalised by tracking immediate pointer arithmetic in decode to identify aliasing memory locations even as pointers in registers are changing.
This project would perform a study of how applicable this technique is to RISC-V programs. Both static RISC-V binaries and dynamic RISC-V instruction traces would be analysed to determine the opportunities for memory renaming in these programs. For example, how many load values can be statically known from previous stores given various instruction windows? As an extension, this project could look at address aliasing prediction, to explore to what extent aliasing addresses can be predicted without perfect knowledge, potentially leading to flushes if a register value was forwarded in error.
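A first-cut version of the trace analysis might look like the Python sketch below. It counts loads whose (base register, immediate offset) pair matches an earlier store within a given instruction window, with no intervening write to the base register. The trace format and the function name are assumptions for illustration, and the model is deliberately optimistic: it ignores access sizes and assumes stores through other base registers never alias the tracked location.

```python
def count_renamable_loads(trace, window):
    """Count loads whose value is statically known from an earlier store.

    Each trace entry is (kind, reg, base, offset), where kind is "store",
    "load" or "alu"; reg is the data register for a store and the
    destination register otherwise; base/offset are None for "alu".
    """
    known = {}  # (base, offset) -> index of the most recent matching store
    hits = 0
    for i, (kind, reg, base, offset) in enumerate(trace):
        if kind == "store":
            known[(base, offset)] = i
        elif kind == "load":
            j = known.get((base, offset))
            if j is not None and i - j <= window:
                hits += 1
        if kind != "store" and reg is not None:
            # reg was overwritten: locations addressed through it are stale.
            known = {k: v for k, v in known.items() if k[0] != reg}
    return hits
```

Tracking immediate adjustments to the base register instead of discarding its entries would correspond to the generalisation described above.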
Vector runahead
Contact: Timothy Jones
To address the widening performance gap between CPU cores and main memories, designers have implemented prefetchers at various levels of the cache hierarchy, so as to bring data close to the processor before it is needed, meaning it is available in fast storage at the point of use. There is a wide variety of data prefetchers available, but few that can accurately identify data that is accessed through complex data structures.
An alternative scheme is Runahead execution. Here, when the processor is stalled, it continues speculatively fetching and executing instructions from the future so as to perform their memory accesses, then discards them and re-executes them correctly once the pipeline starts up again. This provides an accurate form of prefetching within the core, rather than as separate logic beside the cache. Until recently though, Runahead techniques couldn't deal with complex access logic either. However, a new scheme, called Vector Runahead, can effectively prefetch these access patterns, providing significant performance increases for certain workloads. The aim of this project is to implement Vector Runahead in the gem5 simulator to reproduce the impressive results obtained, with more advanced extensions possible too.
This project is fairly involved and should only be tackled by someone with strong C++ coding skills.