
OOM error #34

Open
Davido111200 opened this issue Nov 19, 2024 · 3 comments
@Davido111200

Hi,

Thanks for sharing this awesome repo. I'm having a memory problem when running run.sh on a single H100 GPU. I tried reducing the batch size to 1, but I still can't fine-tune a 7B LLM. Any idea how to work around this?

@Hprairie

Hey, did you ever get around this? How much memory was required? I'll drop from fp32 to bf16 and reduce the batch size, but I wanted to see if I could get any insight. Thanks for the help!
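
For what it's worth, here's a minimal sketch of the kind of flags I mean, assuming the training script goes through Hugging Face's Trainer. These are standard `transformers.TrainingArguments` options, not something taken from this repo's run.sh, so adapt as needed:

```python
# Hypothetical sketch: standard Transformers memory-saving flags.
# The repo's run.sh may wire things differently; names below are
# from transformers.TrainingArguments, not from this project.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # smallest possible micro-batch
    gradient_accumulation_steps=16,   # recover a usable effective batch size
    bf16=True,                        # bf16 instead of fp32 halves activation/compute memory
    gradient_checkpointing=True,      # trade extra compute for activation memory
    optim="adafactor",                # optional: lighter optimizer state than AdamW
)
```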

@Davido111200
Author

I didn't, unfortunately. It seems like you'll have to go with multi-GPU training for this one.
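
In case it helps, a rough sketch of what sharding the model across GPUs with PyTorch FSDP could look like. This is not this repo's launcher, and `build_model()` is just a placeholder for however the repo constructs the model:

```python
# Hypothetical sketch: sharding a 7B model across GPUs with PyTorch FSDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_model()                       # placeholder for the repo's model init
model = FSDP(model, device_id=local_rank)   # params/grads/optimizer state get sharded
# ... training loop as usual; create the optimizer *after* wrapping with FSDP
```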

@Hprairie

Ahh no worries, I appreciate the response!
