My name is Ziyi Yang (杨子逸). You can call me Ziyi.
- 🌱 I'm currently a third-year MS student at Sun Yat-sen University (expected to graduate in 2026), advised by Prof. Xiaojun Quan. Before this, I received my Bachelor's degree in computer science and technology (2019-2023) from Sun Yat-sen University. I am currently an intern at Tongyi Lab, Alibaba Group (2025.05-now).
- My primary research interests lie in several key areas of LLM post-training. These include heterogeneous model fusion, with a focus on integrating diverse LLMs into a stronger one; advanced preference learning algorithms such as DPO and SimPO; the development of large reasoning models (LRMs) capable of adaptive thinking; and novel reinforcement learning (RL) methodologies, particularly in long-context reasoning and multi-agent self-play scenarios. My representative publications are listed below.
- I'm actively seeking algorithm roles focused on LLM mid-training and post-training, with an interest in discovering novel multi-task training paradigms, advanced RL algorithms (e.g., multi-agent self-play), and scalable reward systems for non-verifiable tasks (e.g., rubrics as rewards, generative verifiers).
- 📫 How to reach me: E-mail
View my homepage.

