Daixuan Cheng 成岱璇

Ph.D. Student @ Gaoling School of AI, Renmin University of China · Intern @ GenAI Group, Microsoft Research

I work on LLMs across pre-training, post-training, and agents.
Current focus: agentic LLM training (see LLM-in-Sandbox and ClawGym).

Current focus: agentic LLM training

Agent harness

Agentic RL infra

Other directions

Pre-training & Mid-training

Post-training & Adaptation


Education & Experience

Education

Ph.D. in Artificial Intelligence, Gaoling School of AI, Renmin University of China (2025 – Present)
M.S. in Computer Science, Beijing University of Posts and Telecommunications (2020 – 2023)
Advisor: Haifeng Sun
B.S. in Communication Engineering, Beijing University of Posts and Telecommunications (2016 – 2020)

Experience

Research Student, GenAI Group, Microsoft Research (2021 – Present)
Research Assistant, CoAI Group, Tsinghua University (2023 – 2024)
Research Engineer, Beijing Institute for General Artificial Intelligence (2023 – 2025)
Collaborator: Xuekai Zhu

Selected Papers

Computer Environments Elicit General Agentic Intelligence in LLMs
Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen, Li Dong, Wayne Xin Zhao, Ji-Rong Wen, Furu Wei
arXiv preprint, 2026 — General/Code Agent
Coding Agents are General Agents · 🤗 #1 Paper of the Day · YouTube 300K+ views
ClawGym: A Scalable Framework for Building Effective Claw Agents
Fei Bai*, Huatong Song*, Shuang Sun*, Daixuan Cheng, Yike Yang, Chuan Hao, Renyuan Li, Feng Chang, Yuan Wei, Ran Tao, Bryan Dai, Jian Yang, Wayne Xin Zhao, Ji-Rong Wen
arXiv preprint, 2026 — Agentic data collection, training infra and evaluation benchmark
My role: built the black-box agentic RL pipeline for OpenClaw
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng, Shaohan Huang, Xuekai Zhu, Bo Dai, Wayne Xin Zhao, Zhenliang Zhang, Furu Wei
AAAI 2026 — Exploration of RL in LLM Reasoning
Earliest Research on Entropy and Exploration · No. 1 Most Influential Paper of AAAI 2026
Adapting Large Language Models via Reading Comprehension
Daixuan Cheng, Shaohan Huang, Furu Wei
ICLR 2024 — Domain Adaptation (Continual Pre-Training) of LLMs
Earliest Research on Domain LLMs · 500K+ Downloads · #1 Trending of All Domain LLMs · 🤗 #2 Paper of the Day
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Daixuan Cheng, Yuxian Gu, Shaohan Huang, Junyu Bi, Minlie Huang, Furu Wei
EMNLP 2024 (Main, Long Paper) — LLM Pre-training and Mid-training
Earliest Research on Mid-Training · 300K+ Downloads · #2 Trending of All HF Datasets · 🤗 #2 Paper of the Day
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang
EMNLP 2023 (Main, Long Paper) — Retrieval Augmented Generation
Early Research on RAG · Top ML Papers of the Week (along with GPT-4)
FlowRL: Matching Reward Distributions for LLM Reasoning
Xuekai Zhu, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin
ICLR 2026 — Exploration of RL in LLM Reasoning
GFlowNet for LLM Reasoning · 🤗 #1 Paper of the Day
On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang
EMNLP 2025 (Findings, Long Paper) — Domain Adaptation of MLLMs
Earliest Research on Domain MLLMs
How to Synthesize Text Data without Model Collapse?
Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, Bowen Zhou
ICML 2025 — Synthetic Data
Semi-Synthetic Data Avoids Model Collapse
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching
Junyu Bi, Daixuan Cheng, Ping Yao, Bochen Pang, Yuefeng Zhan, Chuanguang Yang, Yujing Wang, Hao Sun, Weiwei Deng, Qi Zhang
ICCV 2023 — Pre-Training of Vision-Language Models
ELECTRA-VL
Snapshot-guided Domain Adaptation for ELECTRA
Daixuan Cheng, Shaohan Huang, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang
EMNLP 2022 (Findings, Short Paper) — Domain Adaptation of LM
Continual Pre-Training for Encoder-based LMs

Honors & Community Impact

Academic impact:
700+ citations on 1st-author papers (2 papers with 200+ each)
Most Influential Paper of AAAI 2026 (1st-author)
Open-source impact:
🤗 HuggingFace Top Contributor
Released models / datasets with 800K+ downloads
Service:
Outstanding Reviewer of EMNLP 2025 (Top 0.5%)
Reviewer for ICLR, NeurIPS, ACL, etc.
Scholarship:
1st Place in PhD Entrance Exam (Preliminary), GSAI, RUC
National Scholarship for Master Students (Top 1%)
1st Prize in National English Competition (Top 0.5%)