Trainer memory usage issue #20630
Unanswered
ys010 asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
Hi everyone,
I'm experiencing a significant memory usage difference between a manual training loop and using pl.Trainer in PyTorch Lightning, and I'm hoping someone might have insights.
Here's the situation:
When I run the training loop manually (iterating over the DataLoader, calling training_step, running the backward pass, and stepping the optimizer), memory usage is approximately 1 GB.
However, when I use pl.Trainer with the same LightningModule and DataLoader, memory usage jumps to about 5 GB.
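For reference, here is a simplified sketch of the manual-loop side of the comparison. The model and data below are small placeholders, not my real ones (my actual module is larger), but the pattern is the same:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


dataset = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
loader = DataLoader(dataset, batch_size=64)

# Manual loop (roughly 1 GB with my real model):
model = LitModel().to("mps")
optimizer = model.configure_optimizers()
for batch_idx, batch in enumerate(loader):
    batch = [t.to("mps") for t in batch]
    loss = model.training_step(batch, batch_idx)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```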
What I've tried:
I've disabled callbacks (callbacks=[]), logging (logger=False), and validation (val_check_interval=None) in pl.Trainer.
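Concretely, the Trainer run looks roughly like this (a sketch building on the code above; accelerator, devices, and max_epochs are just what I pass to make it run on MPS, the other flags are the ones mentioned above):

```python
# Trainer run (roughly 5 GB with the same model and data):
trainer = pl.Trainer(
    accelerator="mps",
    devices=1,
    max_epochs=1,
    callbacks=[],             # no callbacks
    logger=False,             # no logging
    val_check_interval=None,  # no validation (and I pass no val dataloader)
)
trainer.fit(model, train_dataloaders=loader)
```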
My questions:
Are there any other hidden memory-consuming features in pl.Trainer that I might be missing?
Could there be differences in how pl.Trainer handles the DataLoader or performs other optimizations that contribute to the higher memory usage?
Are there any recommended methods for profiling the memory usage of the trainer itself? (A sketch of what I'm currently measuring is below.)
Could this be a new issue with how MPS is handled on macOS Sequoia?
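For the profiling question, all I've done so far is watch process RSS and the MPS allocator counters around the fit call from the sketch above. A minimal version of that measurement (assuming PyTorch 2.x, where torch.mps exposes these counters, and psutil installed):

```python
import os

import psutil
import torch


def report_memory(tag: str) -> None:
    """Print process RSS and MPS allocator stats at a given point."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    msg = f"[{tag}] RSS: {rss_gb:.2f} GB"
    if torch.backends.mps.is_available():
        allocated_gb = torch.mps.current_allocated_memory() / 1e9
        driver_gb = torch.mps.driver_allocated_memory() / 1e9
        msg += f" | MPS allocated: {allocated_gb:.2f} GB | MPS driver: {driver_gb:.2f} GB"
    print(msg)


report_memory("before fit")
trainer.fit(model, train_dataloaders=loader)
report_memory("after fit")
```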
[Edit]:
When I use the CPU (no MPS), memory usage with the Trainer is as expected.