Department of Computer Science and Technology

Date: 
Friday, 8 May, 2026 - 12:00 to 13:00
Speaker: 
Zhijiang Guo (HKUST (GZ) | HKUST)
Venue: 
Online only. Here is the Google Meet link: https://meet.google.com/cru-hcuo-rhu

In this talk, I will present CodeScaler, a novel framework designed to overcome the scalability bottlenecks of Reinforcement Learning from Verifiable Rewards (RLVR) in code generation. While traditional RLVR relies heavily on the availability of high-quality unit tests—which are often scarce or unreliable—CodeScaler introduces an execution-free reward model that scales both training and test-time inference. By leveraging carefully curated preference data, syntax-aware code extraction, and validity-preserving reward shaping, CodeScaler achieves significant performance gains, improving the Qwen3-8B-Base model by an average of +11.72 points across five benchmarks. Furthermore, CodeScaler functions as a highly efficient test-time scaling method, delivering performance comparable to execution-based approaches while reducing latency by 10×. I will discuss how this approach enables robust optimization on synthetic datasets without the need for test cases and its broader implications for enhancing reasoning capabilities in general domains.
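To illustrate the general idea behind execution-free test-time scaling (not CodeScaler's actual method, whose details are presented in the talk): instead of running each candidate solution against unit tests, a learned reward model scores candidates directly and the highest-scoring one is selected. The sketch below uses a hypothetical `toy_reward` stand-in that merely checks syntactic validity — a real system would use a trained reward model.

```python
import ast

def toy_reward(code: str) -> float:
    """Hypothetical stand-in for a learned, execution-free reward model.
    Here we only reward syntactic validity (loosely echoing the idea of
    validity-preserving reward shaping); a real reward model would be a
    trained neural scorer."""
    try:
        ast.parse(code)
    except SyntaxError:
        return 0.0  # invalid code gets no reward
    return 1.0

def best_of_n(candidates: list[str]) -> str:
    """Pick the highest-reward candidate without executing any of them.
    This avoids the sandboxing and latency costs of running unit tests
    on every candidate."""
    return max(candidates, key=toy_reward)

candidates = [
    "def add(a, b): return a +",        # syntactically invalid
    "def add(a, b):\n    return a + b", # valid
]
print(best_of_n(candidates))  # selects the valid candidate
```

Because no candidate is ever executed, selection cost is one reward-model forward pass per candidate rather than a full sandboxed test run, which is where the latency advantage of execution-free scoring comes from.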

Seminar series: 
NLIP Seminar Series