About Me

I am a Master student at Peking University, advised by Professor Bin Cui and Assistant Professor Wentao Zhang. My research interests include large language models, AI agents, and data scaling.

Before this, I obtained my bachelor’s degree in Computer Science from Renmin University of China. Previously, I conducted my internship at Tencent AI Lab and Meituan LongCat. I am currently a research intern at Alibaba Qwen.

Please feel free to reach out if you are interested in collaboration.


Educations

  • Peking University (2024 - Present), Master Student in Computer Science
  • Renmin University of China (2020 - 2024), B.Sc. in Computer Science


Selected Projects

ICLR 2026
sym

Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions

  • Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Yanhao Li, Bin Cui, Wentao Zhang
  • ICLR 2026.
  • ReLIFT combines reinforcement learning and supervised fine-tuning in an interleaved training framework that expands LLM reasoning capabilities beyond the limits of RL alone.
  • [Paper] [Code]
ACL 2026
sym

Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model

  • Yanhao Li*, Lu Ma*, Jiaran Zhang, Lexiang Tang, Wentao Zhang, Guibo Luo
  • ACL 2026.
  • LEASH introduces an adaptive length-penalty mechanism for LLM reasoning using a Lagrangian primal-dual optimization framework.
  • [Paper]
Open Source
sym

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation

  • Hao Liang, …, Lu Ma, …, Bin Cui, Wentao Zhang
  • DataFlow is a data preparation system designed to parse, generate, process, and evaluate high-quality data from noisy sources, improving the performance of LLMs in specific domains.
  • [Paper] [Code]
Open Source
sym

DataFlex: A Dynamic Training Framework

  • DataFlex is an advanced dynamic training framework built on top of LLaMA-Factory.
  • [Code]


Publications

*: Equal Contribution   †: Equal Contribution

  • ICLR'26Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
    Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Yanhao Li, Bin Cui, Wentao Zhang
    International Conference on Learning Representations

    PDF Code

  • ACL'26Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model
    Yanhao Li†, Lu Ma†, Jiaran Zhang, Lexiang Tang, Wentao Zhang, Guibo Luo
    Association for Computational Linguistics

    PDF

  • CVPR'25Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models
    Junbo Niu, Yuanhong Zheng, Ziyang Miao, Hejun Dong, Chunjiang Ge, Hao Liang, Lu Ma, Bohan Zeng, Qiahao Zheng, Conghui He, Wentao Zhang
    IEEE/CVF Conference on Computer Vision and Pattern Recognition

    PDF

  • AAAI'25Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation
    Lexiang Tang, Xianwei Zhuang, Bang Yang, Zhiyuan Hu, Hongxiang Li, Lu Ma, Jinghan Ru, Yuexian Zou
    AAAI Conference on Artificial Intelligence

  • TKDE'25Acceleration Algorithms in GNNs: A Survey
    Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Xiaonan Nie, Jiawei Jiang, Wentao Zhang, Bin Cui
    IEEE Transactions on Knowledge and Data Engineering

  • arXiv'26Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning
    Zhengyang Zhao†, Lu Ma†, Wentao Zhang

    PDF

  • arXiv'26ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment
    Zhengyang Zhao, Shengjie Ye, Lu Ma, Hao Liang, Hengyi Feng, Wentao Zhang

    PDF

  • arXiv'26Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning
    Lexiang Tang, Weihao Gao, Bingchen Zhao, Lu Ma, Bang Yang, Yuexian Zou

    PDF

  • arXiv'26GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization
    Zhengyang Zhao†, Lu Ma†, Yizhen Jiang, Xiaochen Ma, Zimo Meng, Chengyu Shen, Lexiang Tang, Haoze Sun, Peng Pei, Wentao Zhang

    PDF

  • arXiv'25DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation
    Hao Liang, Xiaochen Ma, ..., Lu Ma, ..., Bin Cui, Wentao Zhang

    PDF Code

  • arXiv'25DARO: Difficulty-Aware Reweighting Policy Optimization
    Jingyu Zhou†, Lu Ma†, Hao Liang, Chengyu Shen, Bin Cui, Wentao Zhang

    PDF


Internships

  • Tencent AI Lab, Research Intern
  • Meituan LongCat, Research Intern
  • Alibaba Qwen, Research Intern