refactor(hamt): remove child interface from hamt package #30
Conversation
20e5cd5 to ae9cb5f (Compare)
Great!! This is certainly looking much nicer. Just seeing the diversity in how we were sometimes referring to child and other times to shardValue to actually mean the same thing (now unified as Shard) is a good indication, I think, that we're on the right path.
Left some notes; don't take them as strong directives but rather as suggestions to think about. I still don't have a clear picture of what the final state of this package should look like, and it's something that will need some more thought and discussion, so some of my comments may be way off. Feel free to point out whatever doesn't sound good to you.
hamt/hamt.go (Outdated)
	shardValueNode
)

func (ds *Shard) nodeType() nodeType {
I really like encapsulating this logic. Ideally we are unifying both types to make their manipulation as transparent as possible, so if we do indeed need this function it should be called as little as possible.
@schomatis I have a small thought about this: Shard only has two node types, a Shard node and a Value node, so maybe we can just use the following function:
func (ds *Shard) isValueNode() bool {
	if ds.key != "" && ds.val != nil {
		return true
	}
	return false
}
And then we can replace the switch logic with if ... else.
WDYT?
Yeeees!! Definitely, that is conceptually much clearer: since we only care about knowing where our directory stops and the rest of the DAG hierarchy continues, that makes much more sense.
My main concern with the "node" terminology here is that, although conceptually from the UnixFS layer perspective this is another (separate) node (a leaf node in the trie), it's actually being encoded inside its parent (internal) node in the DAG layer (i.e., that node is not stored separately in the DAG service). But I guess that is a correct abstraction; this collapse at the DAG level (for performance reasons) doesn't need to complicate the logic here.
hamt/hamt.go (Outdated)
	lnk2 := *lnk
-	return &shardValue{
+	return &Shard{
We would need to call makeShard first to properly initialize the Shard structure and later load its key/value pair.
hamt/hamt.go (Outdated)
@@ -219,6 +218,10 @@ func hash(val []byte) []byte {
// Label for Shards is the empty string, this is used to differentiate them from
// value entries
func (ds *Shard) Label() string {
	nodeType := ds.nodeType()
I think we could just always return the ds.key attribute here (without calling nodeType, in an effort to distance this logic from the shard/shard-value conflict); an internal node should have an empty string anyway.
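For reference, a minimal sketch of that suggestion (not part of the diff, just illustrating the idea):
// Always report ds.key as the label; for an internal shard node the key is
// expected to be the empty string anyway, which is what already
// differentiates it from value entries.
func (ds *Shard) Label() string {
	return ds.key
}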
hamt/hamt.go (Outdated)
-	ds.children = append(ds.children[:i], append([]child{sv}, ds.children[i:]...)...)
-	ds.nd.SetLinks(append(ds.nd.Links()[:i], append([]*ipld.Link{nil}, ds.nd.Links()[i:]...)...))
+	ds.children = append(ds.children[:i], append([]*Shard{sv}, ds.children[i:]...)...)
+	ds.nd.SetLinks(append(ds.nd.Links()[:i], append([]*ipld.Link{lnk}, ds.nd.Links()[i:]...)...))
This does make sense to me, but I'd prefer the conservative approach of avoiding changes in behavior as much as possible, since this does not directly relate to the child interface. How we handle DAG links and the proto-node cache will be reviewed and refactored in another (separate) issue/PR.
@@ -380,7 +388,7 @@ func (ds *Shard) rmChild(i int) error {
	return nil
}

-func (ds *Shard) getValue(ctx context.Context, hv *hashBits, key string, cb func(*shardValue) error) error {
+func (ds *Shard) getValue(ctx context.Context, hv *hashBits, key string, cb func(*Shard) error) error {
Note to self: this seems like another great example of where we could use the Walker, generalizing the Shard as just another type of node in a DAG, where we could call its Iterate method. But, as previously suggested, this may be an unnecessary over-complication. I just don't like seeing the same recursive/switch logic over and over everywhere in the code base.
I think this is actually the Search function from the Walker: it's making a decision about which child to traverse next based on the file's content hash.
@@ -471,20 +480,22 @@ func makeAsyncTrieGetLinks(dagService ipld.DAGService, onShardValue func(*shardV
	}
}

-func (ds *Shard) walkTrie(ctx context.Context, cb func(*shardValue) error) error {
+func (ds *Shard) walkTrie(ctx context.Context, cb func(*Shard) error) error {
I'll raise a separate issue about this, but we definitely need to unify this function with getValue.
My belief is that getValue is a search: it returns a single value (through a callback, which seems odd given you only want a single value), making a decision at every layer of the tree based on the file's content hash. walkTrie, on the other hand, is an iteration (it traverses every node), so they're not exactly the same. But of course these are the two functions implemented by your IPLD walker, so they could potentially both be replaced easily with it.
Also, I am not exactly sure in what context walkTrie will really get called instead of the async walker. Currently EnumLinks is an async walker (it calls makeAsyncTrieGetLinks) and ForEachLink is synchronous (it calls walkTrie), but I don't know if ForEachLink is called by anything at this point. The other difference is that the synchronous walk is an in-order traversal while the async one makes no such guarantees.
hamt/hamt.go (Outdated)
	ds.children[i] = c
}

// Link returns a merklelink to this shard node
func (ds *Shard) Link() (*ipld.Link, error) {
	nodeType := ds.nodeType()
Link is only called inside the Node method (hiding an implicit recursion), so we can just move this logic there and remove this function. However, if you think that it will generate too much confusion at this point in the PR, we can iterate on it later.
@schomatis If we move this Link() inside the Node(), the Node() function may be too long.
Agreed, we can leave this for another PR.
ae9cb5f to 9fdf744 (Compare)
9fdf744 to 2d832ee (Compare)
Looks good, certainly an improvement.
LGTM (modulo those two nits).
hamt/hamt.go (Outdated)
@@ -39,13 +39,20 @@ const (
	HashMurmur3 uint64 = 0x22
)

func (ds *Shard) isValueNode() bool {
	if ds.key != "" && ds.val != nil {
nit: you can just return this comparison.
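For reference, the simplified form that nit is pointing at would look like this:
func (ds *Shard) isValueNode() bool {
	// A value node carries both a key and a value; everything else is an
	// internal shard node.
	return ds.key != "" && ds.val != nil
}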
@schomatis fixed.
hamt/hamt.go (Outdated)
}

func (ds *Shard) makeShardValue(lnk *ipld.Link) *Shard {
	lnk2 := *lnk
	s, _ := makeShard(ds.dserv, ds.tableSize)
Let's check for errors just in case.
@schomatis Did you mean that we should add a panic here?
IMO, I think the err will always be nil, because the ds.tableSize is a correct size.
I would add an error check and have makeShardValue be able to return an error, then add error checks in the two places it's called, which are both in functions that already pass on errors.
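A rough sketch of that suggestion, based only on the code shown in this diff (how the key/value pair is loaded from the link is an assumption here, not taken from the PR):
func (ds *Shard) makeShardValue(lnk *ipld.Link) (*Shard, error) {
	lnk2 := *lnk
	s, err := makeShard(ds.dserv, ds.tableSize)
	if err != nil {
		return nil, err
	}
	// Assumption: the key/value pair is loaded from the copied link roughly
	// like this; the exact key derivation is not shown in this diff.
	s.key = lnk2.Name
	s.val = &lnk2
	return s, nil
}
Both callers can then simply propagate the returned error upward.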
@hannahhoward yeah, that's good advice
@hannahhoward @schomatis fixed
@schomatis Are your changes blocking? (i.e., can we merge and fix afterward, or do you want to make sure @overbool fixes them first?)
Really nice work!
fixes #28