Abstract:
Large language models (LLMs) face significant challenges in achieving low-latency inference. Techniques such as speculative decoding and chunked prefill can help reduce latency, but their effectiveness depends heavily on algorithmic parameters that are sensitive to fluctuating system conditions. As a result, static parameter settings often lead to suboptimal performance under dynamic workloads. To address this issue, we propose dynamic parameter optimization methods that adapt to changing system conditions in order to maximize performance. In this talk, we present the technical details of these methods along with initial evaluation results.
Bios:
Masayuki Usui received his bachelor's and master's degrees in computer science from the University of Tokyo, Japan. He is currently pursuing a Ph.D. degree at the University of Tokyo. His research interests include LLM inference serving and computer architecture.
Shinya Takamaeda-Yamazaki received his B.E., M.E., and D.E. degrees from the Tokyo Institute of Technology, Japan, in 2009, 2011, and 2014, respectively. Since 2019, he has been an Associate Professor at the University of Tokyo, Japan. In 2025, he also became a Team Leader at RIKEN AIP, Japan. His research interests include computer architecture, hardware design technologies, and machine learning systems.