How long does training take with default training settings? #66

mesllo · 2022-05-05T17:10:31Z

If I train with 8 GPUs, a batch size 4, and 100 epochs (50 for normal lr, 50 for decaying lr), how long would it take? I'm asking because I only have 4 GPUs and I am using a shared environment where I won't be able to train for more than 8 hours per run.

hanchaoyuan · 2022-05-06T02:54:38Z

The complete training has four stages, the third stage has 200 epochs and takes the longest. The training code prints the estimated time, you can try it

mesllo · 2022-05-06T12:22:33Z

Thank you for your reply! I have tried it and even for the first stage, training takes about 24 hours on 4 GPUs which is not feasible for me. I can also try to take a multi-node approach and run on 8 GPUs across 2 nodes. Is the code already optimized for multiple nodes or do I have to do this myself? From what I can understand, it seems that the current code only considers a single node.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How long does training take with default training settings? #66

How long does training take with default training settings? #66

mesllo commented May 5, 2022

hanchaoyuan commented May 6, 2022

mesllo commented May 6, 2022 •

edited

Loading

How long does training take with default training settings? #66

How long does training take with default training settings? #66

Comments

mesllo commented May 5, 2022

hanchaoyuan commented May 6, 2022

mesllo commented May 6, 2022 • edited Loading

mesllo commented May 6, 2022 •

edited

Loading