L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

How to Use

Installation

git clone https://github.com/cmu-l3/l1.git
cd l1
pip install -e verl
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install -e .

Prepare Dataset

You can use scripts in scripts/data to prepare your own dataset.

For example, to generate data for training L1-Exact:

python scripts/data/deepscaler_dataset.py 

For L1-Max:

python scripts/data/deepscaler_dataset.py --use_both_both
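The idea behind these dataset scripts is that LCPO training pairs each question with a length instruction appended to the prompt, so the model learns to condition its reasoning length on the stated budget. A minimal sketch of that idea (the function name, template wording, and `mode` values are illustrative assumptions, not the repo's exact implementation):

```python
def add_length_instruction(question: str, num_tokens: int, mode: str = "exact") -> str:
    """Append a target-length instruction to a prompt, in the spirit of LCPO.

    The wording and modes here are illustrative; the actual templates used by
    scripts/data/deepscaler_dataset.py may differ.
    """
    if mode == "exact":
        # L1-Exact: the model should think for roughly this many tokens.
        suffix = f"Think for {num_tokens} tokens."
    else:
        # L1-Max: the budget is an upper bound rather than a target.
        suffix = f"Think for maximum {num_tokens} tokens."
    return f"{question} {suffix}"
```

At training time, budgets are sampled per example, so the same question can appear with several different length instructions.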

For evaluation on AIME2025, GPQA, LSAT, and MMLU, generate the evaluation datasets with the scripts in scripts/data:

python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py

Train Models

You can skip this step if you want to use our pre-trained models.

You can run scripts in scripts/train to train your own models. Make sure to specify the correct data path.

Evaluate Models

Use one of the scripts in scripts/eval to evaluate your models. Make sure to specify the correct model path.

For example, evaluate L1-Exact on AIME2025:

./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens <num_tokens> --datasets aime2025
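The --num-tokens flag above corresponds to LCPO's length target: during RL training, the reward combines a correctness indicator with a penalty on how far the generated reasoning deviates from that target. A hedged sketch of an LCPO-Exact-style reward (the `alpha` value is illustrative, not necessarily the one used in the paper):

```python
def lcpo_exact_reward(correct: bool, target_tokens: int, used_tokens: int,
                      alpha: float = 0.0003) -> float:
    """LCPO-Exact-style scalar reward: correctness minus a length-deviation penalty.

    alpha sets how strongly length deviation is punished relative to correctness;
    the value here is an illustrative placeholder.
    """
    return float(correct) - alpha * abs(target_tokens - used_tokens)
```

Under this shape of reward, a correct answer that overshoots the budget by 1000 tokens scores lower than a correct answer that hits it exactly, which is what pushes the trained model to respect the requested token count.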

Replicate Results

To replicate results for L1-Exact and L1-Max from the paper, you can use scripts in scripts/replicate.

  1. Prepare data:
./scripts/replicate/prepare_data.sh
  2. Evaluate models:
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max

Acknowledgments

  • We would like to thank DeepSeek for releasing DeepSeek-R1 and its distilled models,
  • Qwen for releasing the excellent Qwen-2.5 Math models, and
  • Agentica for open-sourcing their codebase, models, and datasets. This codebase is built on top of their work.

Citation

If you use L1/LCPO in your research, please cite:

@misc{aggarwal2025l1controllinglongreasoning,
  title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning}, 
  author={Pranjal Aggarwal and Sean Welleck},
  year={2025},
  eprint={2503.04697},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.04697}, 
}
