L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

How to Use

Installation

git clone https://github.com/cmu-l3/l1.git
cd l1
pip install -e verl
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install -e .

Prepare Dataset

You can use scripts in scripts/data to prepare your own dataset.

For example, to generate data for training L1-Exact:

python scripts/data/deepscaler_dataset.py 

For L1-Max:

python scripts/data/deepscaler_dataset.py --use_both_both
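The idea behind these dataset scripts is that LCPO training pairs each question with a length instruction appended to the prompt, so the model learns to condition its reasoning length on the stated budget. A minimal sketch of that idea (the function name, template wording, and `mode` values are illustrative assumptions, not the repo's exact implementation):

```python
def add_length_instruction(question: str, num_tokens: int, mode: str = "exact") -> str:
    """Append a target-length instruction to a prompt, in the spirit of LCPO.

    The wording and modes here are illustrative; the actual templates used by
    scripts/data/deepscaler_dataset.py may differ.
    """
    if mode == "exact":
        # L1-Exact: the model should think for roughly this many tokens.
        suffix = f"Think for {num_tokens} tokens."
    else:
        # L1-Max: the budget is an upper bound rather than a target.
        suffix = f"Think for maximum {num_tokens} tokens."
    return f"{question} {suffix}"
```

At training time, budgets are sampled per example, so the same question can appear with several different length instructions.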

For evaluation on AIME2025, GPQA, LSAT, and MMLU, generate the evaluation datasets with the scripts in scripts/data:

python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py

Train Models

You can skip this step if you want to use our pre-trained models.

You can run scripts in scripts/train to train your own models. Make sure to specify the correct data path.

Evaluate Models

Use one of the scripts in scripts/eval to evaluate your models. Make sure to specify the correct model path.

For example, evaluate L1-Exact on AIME2025:

./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens <num_tokens> --datasets aime2025
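The --num-tokens flag above corresponds to LCPO's length target: during RL training, the reward combines a correctness indicator with a penalty on how far the generated reasoning deviates from that target. A hedged sketch of an LCPO-Exact-style reward (the `alpha` value is illustrative, not necessarily the one used in the paper):

```python
def lcpo_exact_reward(correct: bool, target_tokens: int, used_tokens: int,
                      alpha: float = 0.0003) -> float:
    """LCPO-Exact-style scalar reward: correctness minus a length-deviation penalty.

    alpha sets how strongly length deviation is punished relative to correctness;
    the value here is an illustrative placeholder.
    """
    return float(correct) - alpha * abs(target_tokens - used_tokens)
```

Under this shape of reward, a correct answer that overshoots the budget by 1000 tokens scores lower than a correct answer that hits it exactly, which is what pushes the trained model to respect the requested token count.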

Replicate Results

To replicate results for L1-Exact and L1-Max from the paper, you can use scripts in scripts/replicate.

  1. Prepare data:
./scripts/replicate/prepare_data.sh
  2. Evaluate models:
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max

Acknowledgments

  • We would like to thank DeepSeek for releasing DeepSeek-R1 and its distilled models,
  • Qwen for releasing the excellent Qwen-2.5 Math models, and
  • Agentica for open-sourcing their codebase, models, and datasets. This codebase is built on top of their work.

Citation

If you use L1/LCPO in your research, please cite:

@misc{aggarwal2025l1controllinglongreasoning,
  title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning}, 
  author={Pranjal Aggarwal and Sean Welleck},
  year={2025},
  eprint={2503.04697},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.04697}, 
}
