[Dygraph]Integration sharding stage2 function #38151
Conversation
Thanks for your contribution!
self._rank_buffer_size = {}  # {dtype: {rank: numel+alignment}}
self._param2align = {}  # {param.name: align}

# Default information
self._optim_defaults = kw
self._optim = optim
self._ori_parameter_list = copy.deepcopy(self._optim._parameter_list)
self._ori_param_groups = copy.deepcopy(self._optim._param_groups)
deepcopy increases memory usage.
Fixed: changed to pass by reference.
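A minimal sketch of the fix described above, based on the constructor lines in the diff; the class skeleton and argument names outside the diff are hypothetical:

class ShardingOptimizerStage2:
    # Hypothetical skeleton around the constructor lines shown in the diff.
    def __init__(self, optim, **kw):
        self._optim_defaults = kw
        self._optim = optim
        # Before: copy.deepcopy duplicated the inner optimizer's parameter
        # bookkeeping and increased memory usage:
        #   self._ori_parameter_list = copy.deepcopy(self._optim._parameter_list)
        #   self._ori_param_groups = copy.deepcopy(self._optim._param_groups)
        # After: keep plain references, since these lists are only read back
        # later and are not mutated through this wrapper.
        self._ori_parameter_list = self._optim._parameter_list
        self._ori_param_groups = self._optim._param_groups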
@@ -94,7 +94,7 @@ def __init__(self,
     filter(lambda x: x.trainable and x.dtype == Type.fp16.value,
            self._local_params))) > 0

-    assert group is not None, "Distributed communication group is must be gived"
+    assert group is not None, "Distributed communication group is must be given"
Need to support the global group when group=None.
Now supported.
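A sketch of the fallback behaviour, assuming Paddle's collective API; the helper name _resolve_group is hypothetical and the exact code in the PR may differ:

import paddle.distributed as dist

def _resolve_group(group=None):
    # Hypothetical helper: when the caller passes no communication group,
    # fall back to a group spanning every rank in the job.
    if group is None:
        # Assumes the distributed environment has already been initialized
        # (e.g. via paddle.distributed.init_parallel_env) before this runs.
        group = dist.new_group(list(range(dist.get_world_size())))
    return group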
LGTM
LGTM
PR types
Performance optimization
PR changes
Others
Describe
Integration sharding stage2 function
1. Support group = None
2. Support param_groups for optimizer (usage sketched below)
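A hypothetical usage sketch of the two items above. The parameter groups use Paddle's standard list-of-dicts form; the wrapper call is commented out because its exact import path and signature come from this PR and are assumed here:

import paddle

model = paddle.nn.Linear(8, 8)
# Parameter groups: each dict carries its own hyperparameters.
param_groups = [
    {"params": [model.weight], "weight_decay": 0.01},
    {"params": [model.bias], "weight_decay": 0.0},
]
inner_opt = paddle.optimizer.AdamW(parameters=param_groups, learning_rate=1e-3)

# With group=None the sharded optimizer is expected to fall back to the
# global communication group (class name and signature assumed from the diff):
# sharded_opt = ShardingOptimizerStage2(
#     params=model.parameters(), optim=inner_opt, group=None)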