
OOM error #34

Open
Davido111200 opened this issue Nov 19, 2024 · 3 comments
@Davido111200

Hi,

Thanks for sharing this awesome repo. I'm having a memory problem when running run.sh on a single H100 GPU. I tried reducing the batch size to 1, but I still can't fine-tune a 7B LLM. Any idea how to work around this?

@Hprairie

Hey, did you ever get around this? How much memory was required? I'll drop from fp32 to bf16 and reduce the batch size, but I wanted to see if I could get any insight. Thanks for the help!
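
For what it's worth, here's a minimal sketch of the kind of flags I mean, assuming the training script goes through Hugging Face's Trainer. These are standard `transformers.TrainingArguments` options, not something taken from this repo's run.sh, so adapt as needed:

```python
# Hypothetical sketch: standard Transformers memory-saving flags.
# The repo's run.sh may wire things differently; names below are
# from transformers.TrainingArguments, not from this project.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # smallest possible micro-batch
    gradient_accumulation_steps=16,   # recover a usable effective batch size
    bf16=True,                        # bf16 instead of fp32 halves activation/compute memory
    gradient_checkpointing=True,      # trade extra compute for activation memory
    optim="adafactor",                # optional: lighter optimizer state than AdamW
)
```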

@Davido111200
Author

I didn't, unfortunately. It seems like you'll have to go with multi-GPU training for this one.
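
In case it helps, a rough sketch of what sharding the model across GPUs with PyTorch FSDP could look like. This is not this repo's launcher, and `build_model()` is just a placeholder for however the repo constructs the model:

```python
# Hypothetical sketch: sharding a 7B model across GPUs with PyTorch FSDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_model()                       # placeholder for the repo's model init
model = FSDP(model, device_id=local_rank)   # params/grads/optimizer state get sharded
# ... training loop as usual; create the optimizer *after* wrapping with FSDP
```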

@Hprairie

Ahh no worries, I appreciate the response!
