Transformers have recently sparked significant interest in AI, driving advances in accuracy and enabling a wide range of applications, from multi-modal intelligent assistants to autonomous systems. While their scaling laws promise even greater capabilities, the associated hardware and data demands present substantial challenges. In response, there is growing interest in compressing these models into smaller, more efficient forms that are feasible to deploy with lower resource requirements. As edge and mobile devices integrate increasingly powerful Systems-on-Chip (SoCs), deploying these models locally becomes viable, enabling new use cases while enhancing privacy, sustainability, and task-specific customization.
In this talk, I will touch upon two areas: first, measuring the execution efficiency and deployability of Large Language Models (LLMs) on mobile and edge devices; and second, optimising DNN workloads for efficiency through low-rank decompositions. I will introduce MELT (MobiCom'24), a benchmarking framework designed to assess the computational, memory, energy, and thermal characteristics of LLMs running on-device and to identify the associated bottlenecks. Following this, I will present Maestro (ICML'24), a novel approach that leverages trainable low-rank decompositions for more efficient training and deployment of DNNs, achieved via data-informed progressive shrinking of networks.
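The abstract does not detail MELT's interface, but the kind of on-device measurement it automates can be sketched generically. The sketch below is a minimal, hypothetical illustration (the `run_generation` stand-in and the `benchmark` helper are assumptions, not MELT's API): it times repeated generation calls and records Python-level peak memory, whereas a real harness would additionally sample process memory, power rails, and SoC temperature.

```python
import time
import tracemalloc

def run_generation(prompt: str) -> str:
    """Stand-in for an on-device LLM call; in practice this would
    invoke a local inference backend (e.g. a llama.cpp binding)."""
    return prompt[::-1]  # placeholder work

def benchmark(prompt: str, runs: int = 10) -> dict:
    # Track Python-level peak memory; a real on-device harness would
    # also sample process RSS, energy counters, and thermal sensors.
    tracemalloc.start()
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_generation(prompt)
        latencies.append(time.perf_counter() - t0)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "mean_latency_s": sum(latencies) / runs,
        "peak_mem_bytes": peak,
    }

print(benchmark("Hello, on-device LLM!"))
```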
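Similarly, the core idea behind a low-rank decomposition of a layer can be illustrated independently of Maestro's actual algorithm. In the minimal PyTorch sketch below (the `LowRankLinear` class, the dimensions, and the rank are illustrative assumptions), a dense weight matrix is replaced by two trainable factors with a much smaller parameter count; Maestro's data-informed progressive shrinking would, in addition, adapt the retained rank during training.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A linear layer factorised as W ~ U @ V, so a d_in x d_out weight
    (d_in * d_out parameters) costs only rank * (d_in + d_out)."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.V = nn.Linear(d_in, rank, bias=False)  # project to rank-r subspace
        self.U = nn.Linear(rank, d_out)             # expand back to d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.U(self.V(x))

# Illustrative usage: swap a dense layer for its low-rank counterpart.
dense = nn.Linear(4096, 4096)             # ~16.8M weights
low_rank = LowRankLinear(4096, 4096, 64)  # ~0.5M weights at rank 64
x = torch.randn(8, 4096)
print(low_rank(x).shape)  # torch.Size([8, 4096])
```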
"You can also join us on Zoom":https://cam-ac-uk.zoom.us/j/83400335522?pwd=LkjYvMOvVpMbabOV1MVTm8QU6DrGN7.1