git clone https://github.com/cmu-l3/l1.git
cd l1
pip install -e verl
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install -e .
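After installation, a quick import check can confirm that the packages resolved. The sketch below is not part of the repository. It assumes the import names `verl`, `flash_attn`, and `packaging` (the usual module names for those packages) and only reports whether each one can be located, without importing it.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each module name to True if it can be located."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially installed packages
            status[name] = False
    return status


if __name__ == "__main__":
    # Assumed import names; adjust if your environment differs.
    for module, found in check_modules(["verl", "flash_attn", "packaging"]).items():
        print(f"{module}: {'ok' if found else 'MISSING'}")
```

A `MISSING` line usually means the corresponding `pip install` step failed or ran in a different environment.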
You can use the scripts in scripts/data to prepare your own dataset.
For example, to generate data for training L1-Exact:
python scripts/data/deepscaler_dataset.py
For L1-Max:
python scripts/data/deepscaler_dataset.py --use_both_both
To prepare evaluation data for AIME2025, GPQA, LSAT, and MMLU, run the corresponding generation scripts:
python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py
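The four generation commands above can also be run in one pass. The following is a minimal sketch, not a script shipped with the repository: the helper names `missing_scripts` and `generate_all` are hypothetical, and it assumes you run it from the repository root and that the scripts need no arguments, as in the commands above. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Script paths taken from the commands listed above.
EVAL_DATA_SCRIPTS = [
    "scripts/data/generate_aime.py",
    "scripts/data/generate_gpqa.py",
    "scripts/data/generate_lsat.py",
    "scripts/data/generate_mmlu.py",
]


def missing_scripts(scripts, repo_root="."):
    """Return the scripts that do not exist under repo_root."""
    root = Path(repo_root)
    return [s for s in scripts if not (root / s).is_file()]


def generate_all(scripts, repo_root=".", dry_run=True):
    """Print each command; run it with the current interpreter unless dry_run."""
    for script in scripts:
        cmd = [sys.executable, script]
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop on the first failing script
            subprocess.run(cmd, cwd=repo_root, check=True)


if __name__ == "__main__":
    absent = missing_scripts(EVAL_DATA_SCRIPTS)
    if absent:
        print(f"Run this from the repository root; missing: {absent}")
    else:
        generate_all(EVAL_DATA_SCRIPTS, dry_run=True)
```

Set `dry_run=False` once the printed commands look right.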
You can run the scripts in scripts/train to train your own models. Make sure to specify the correct data path. You can skip this step if you want to use our pre-trained models.
Use one of the scripts in scripts/eval to evaluate your models. Make sure to specify the correct model path.
For example, to evaluate L1-Exact on AIME2025:
./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens <num_tokens> --datasets aime2025
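Because `--num-tokens` takes a single budget per run, evaluating several target lengths means invoking the script once per value. The sketch below is a hypothetical wrapper, not part of the repository: it uses only the `--model`, `--num-tokens`, and `--datasets` flags shown above, and the budgets in it are illustrative placeholders rather than values from the paper. It defaults to a dry run that prints the commands.

```python
import shlex
import subprocess

EVAL_SCRIPT = "./scripts/eval/eval_model_token.sh"


def build_eval_commands(model_path, token_budgets, dataset="aime2025"):
    """Build one eval command (as an argument list) per token budget."""
    return [
        [EVAL_SCRIPT, "--model", model_path,
         "--num-tokens", str(n), "--datasets", dataset]
        for n in token_budgets
    ]


def run_sweep(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder budgets; substitute the token lengths you want to test.
    cmds = build_eval_commands("path/to/your/model", [512, 1024, 2048])
    run_sweep(cmds, dry_run=True)
```

Passing the command as an argument list avoids shell quoting problems if the model path contains spaces.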
To replicate the results for L1-Exact and L1-Max from the paper, you can use the scripts in scripts/replicate.
- Prepare data:
./scripts/replicate/prepare_data.sh
- Evaluate models:
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max
We would like to thank:
- DeepSeek for releasing DeepSeek-R1 and its distilled models,
- Qwen for releasing the super-awesome Qwen-2.5 Math models, and
- Agentica for their codebase and for open-sourcing their models and datasets. This codebase is built on top of their work.
If you use L1/LCPO in your research, please cite:
@misc{aggarwal2025l1controllinglongreasoning,
title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning},
author={Pranjal Aggarwal and Sean Welleck},
year={2025},
eprint={2503.04697},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.04697},
}