Robust Alignment of Large Language Models

Date:

Friday, 23 May, 2025 - 12:00 to 13:00

Speaker:

Dr. Sangwoong Yoon (UCL)

Venue:

ONLINE ONLY. Here is the Zoom link: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09

The alignment of large language models (LLMs) can often be brittle when faced with the complexities of real-world deployment. In this talk, I share our investigations on two scenarios where special care is required to ensure robust alignment.

The first scenario is multi-objective alignment, where balancing competing objectives is particularly challenging. Our recent work, **Robust Multi-Objective Decoding (RMOD),** an inference-time alignment algorithm, adaptively adjusts the weights of different objectives during response generation to ensure none are neglected. RMOD provides principled robustness with minimal overhead, consistently outperforming existing methods across several alignment benchmarks.

In the second part of the talk, I will address preference model misspecification in self-play alignment. While self-play is a promising alignment approach, naive implementations are vulnerable to inaccuracies in the preference model. To address this, our **Regularized Self-Play Policy Optimization (RSPO)** framework offers a versatile and modular method for regularizing the self-play alignment process. RSPO’s ability to combine various regularizers results in strong performance gains on multiple evaluation sets, such as AlpacaEval-2 and Arena-Hard.

As a bonus, I will briefly introduce our recent investigation into the robustness of **Mixture-of-Agent (MoA)** systems, a popular multi-agent paradigm. We show that even a single malicious agent introduced into the mixture can nullify the benefits of the entire system.

Seminar series:

NLIP Seminar Series

View on talks.cam

Calendar

Upcoming seminars

15Jul

"Please Verify": How Human Behavior Undermines Blockchain Security

Taro Tsuchiya, Carnegie Mellon University

Security Seminar
17Jul

SolarFit: A Successor Refinement Approach for Sizing of PV and Storage Systems in EV-Enabled Homes

Julia Gschwind ETH Zurich, University of Cambridge

Energy and Environment Group
23Jul

Google DeepMind’s Gemini models and the Rise of Long-Context LLMs

Dr Nikolay Savinov (Google DeepMind)

Foundation AI
30Jul

Title to be confirmed

Stephen Xia, Northwestern University

Centre for Mobile, Wearable Systems and Augmented Intelligence Seminar Series
04Aug

Learning Under Constraints: From Federated Collaboration to Black-Box LLMs

Salma Kharrat, Kaust

Cambridge ML Systems Seminar Series

View all seminars

Upcoming seminars

About the department

Social media

Study at Cambridge

About the University

Research at Cambridge