LLMs are arguably among the largest technology investments
since the moon landing, and they rely on custom hardware accelerators for both training and inference. The talk will cover accelerating LLM transformer architectures using the combination of a compiler and a systolic compute array. The key enablers for achieving meaningful performance on the systolic compute array are deep program analyses
of the model architecture in the Neuron Compiler. I will briefly report on our effort to build a compiler from XLA/HLO to the Trainium ISA that is verified using Lean.
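As background on the hardware side, the sketch below simulates the textbook weight-stationary systolic dataflow for a matrix multiplication: each processing element (PE) holds one weight, activations flow rightward and partial sums flow downward, one hop per cycle. This is a generic illustration only; the grid shape, the skewed injection, and all names are assumptions, not Trainium's actual microarchitecture.

<pre>
# Illustrative only: a k x n grid of PEs computing C = A @ B with the
# classic weight-stationary dataflow (not Trainium's microarchitecture).
def systolic_matmul(A, B):
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    act = [[0] * (n + 1) for _ in range(k)]   # act[i][j]: activation entering PE (i, j)
    psum = [[0] * n for _ in range(k + 1)]    # psum[i][j]: partial sum entering PE (i, j)
    for t in range(m + k + n):                # enough cycles to fill and drain
        # Finished dot products leave the bottom of column j, skewed in time:
        for j in range(n):
            r = t - k - j - 1                 # row of A whose result just arrived
            if 0 <= r < m:
                C[r][j] = psum[k][j]
        # Clock every PE: multiply-accumulate, then shift the registers.
        new_act = [[0] * (n + 1) for _ in range(k)]
        new_psum = [[0] * n for _ in range(k + 1)]
        for i in range(k):
            r = t - i                         # row i of the array is fed skewed by i cycles
            new_act[i][0] = A[r][i] if 0 <= r < m else 0
            for j in range(n):
                new_act[i][j + 1] = act[i][j]                          # activation moves right
                new_psum[i + 1][j] = psum[i][j] + B[i][j] * act[i][j]  # MAC, result moves down
        act, psum = new_act, new_psum
    return C

assert systolic_matmul([[1, 2]], [[3], [4]]) == [[11]]
</pre>

The fill and drain skew visible in the simulation hints at why scheduling work onto such an array benefits from compiler analysis.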
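Similarly, to hint at what "verified using Lean" means, here is a minimal Lean 4 sketch of a compiler-correctness theorem for a toy expression language compiled to a stack machine. Every definition below is a hypothetical stand-in; the actual effort concerns XLA/HLO and the Trainium ISA.

<pre>
-- Toy source language and its semantics.
inductive Expr where
  | const : Nat → Expr
  | add   : Expr → Expr → Expr

def eval : Expr → Nat
  | .const n => n
  | .add a b => eval a + eval b

-- Toy target: a stack machine.
inductive Instr where
  | push : Nat → Instr
  | add  : Instr

def exec : List Instr → List Nat → List Nat
  | [],            s           => s
  | .push n :: is, s           => exec is (n :: s)
  | .add    :: is, x :: y :: s => exec is ((y + x) :: s)
  | .add    :: is, s           => exec is s  -- stack underflow: skip

def compile : Expr → List Instr
  | .const n => [.push n]
  | .add a b => compile a ++ compile b ++ [.add]

-- Correctness: running compiled code pushes the expression's value.
theorem compile_correct (e : Expr) :
    ∀ (is : List Instr) (s : List Nat),
      exec (compile e ++ is) s = exec is (eval e :: s) := by
  induction e with
  | const n => intro is s; rfl
  | add a b iha ihb =>
    intro is s
    simp only [compile, eval, List.append_assoc]
    rw [iha, ihb]
    rfl
</pre>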
*Daniel Kroening* is a Senior Principal Applied Scientist at Amazon, where he works on the correctness of the Neuron Compiler for distributed training and inference. Before joining Amazon, he was a Professor of Computer Science at the University of Oxford and co-founded Diffblue Ltd., a university spinout that develops AI targeting code and code-like artefacts. He wrote the CBMC (for C), JBMC (for Java), and EBMC (for SystemVerilog) model checkers; CBMC is the engine of Kani (for verifying unsafe Rust). He has received the Semiconductor Research Corporation (SRC) Inventor Recognition Award, an IBM Faculty Award, a Microsoft Research SEIF Award, the Wolfson Research Merit Award, and the Rance Cleaveland Test-of-Time tool award. He serves on the CAV steering committee, was co-chair of FLoC 2018 and Editor-in-Chief of Springer's FMSD, and is a co-author of the textbooks on Decision Procedures and Model Checking.
!https://upload.wikimedia.org/wikipedia/commons/thumb/9/94/Pizza.svg/330px-Pizza.svg.png!
*We will have pizza! Please register so that we can order the right amount of food:*
https://docs.google.com/forms/d/1RlUWlAroGtv-cePwe2lL118KbQFzTX647tn4zcmCbuA/viewform?edit_requested=true