projects

  1. Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions
    Lu Ma, Hao Liang, Meiyi Qiang, and 9 more authors
    ICLR
  2. Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model
    Yanhao Li†, Lu Ma†, Jiaran Zhang, and 3 more authors
    ACL
  3. DataFlex
    2025
  4. DataFlow
    2025

† denotes equal contribution