Skip to content
This repository was archived by the owner on Jun 27, 2023. It is now read-only.

refactor(hamt): remove child interface from hamt package #30

Merged
merged 3 commits into from
Oct 16, 2018

Conversation

overbool
Copy link
Contributor

@overbool overbool commented Oct 9, 2018

fixes #28

@overbool overbool requested a review from schomatis as a code owner October 9, 2018 07:52
@overbool overbool force-pushed the refactor/remove-child-from-hamt branch from 20e5cd5 to ae9cb5f Compare October 9, 2018 12:29
Copy link
Contributor

@schomatis schomatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!! This is certainly looking much nicer. Just seeing the diversity in how we were sometimes referring to childs and others to shardValules to actually mean the same thing (now unified as Shard) is a good indication I think that we're in the right path.

Left some notes, don't take them as strong directives but rather as suggestions to think about. I still don't have a clear picture of what the final state of this package should look like and it's something that will need some more thought and discussion, some of my comments may be way off so feel free to point out whatever doesn't sound good to you.

hamt/hamt.go Outdated
shardValueNode
)

func (ds *Shard) nodeType() nodeType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like encapsulating this logic, ideally we are unifying both types to make their manipulation as transparent as possible so if we do indeed need this function it should be called as little as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schomatis I have a small thought about this: Shard only have 2 type nodes: Shard node and Value node, so maybe we can only use the follow function:

func (ds *Shard) isValueNode() bool {
	if ds.key != "" && ds.val != nil {
		return true
	}
	return false
}

And then we can replace switch logic with if ... else.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeeees!! Definitely that is conceptually much clearer, since we only care about knowing where does our directory stop and the rest of the DAG hierarchy continues, that makes much sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern with the "node" terminology here is that, although conceptually from the UnixFS layer perspective this is another (separate) node (a leaf node in the trie), actually it's being encoded inside its parent (internal) node in the DAG layer (i.e., that node is not stored separately in the DAG service). But I guess that is a correct abstraction, this collapse at the DAG level (for performance reasons) doesn't need to be complicating the logic here.

hamt/hamt.go Outdated
lnk2 := *lnk
return &shardValue{
return &Shard{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to call makeShard first to properly initialize the Shard structure and later load its key/value pair.

hamt/hamt.go Outdated
@@ -219,6 +218,10 @@ func hash(val []byte) []byte {
// Label for Shards is the empty string, this is used to differentiate them from
// value entries
func (ds *Shard) Label() string {
nodeType := ds.nodeType()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just always return the ds.key attribute here (without calling nodeType, in an effort to distantiate this logic from the shard/shard-value conflict), an internal node should have an empty string anyway.

hamt/hamt.go Outdated
ds.children = append(ds.children[:i], append([]child{sv}, ds.children[i:]...)...)
ds.nd.SetLinks(append(ds.nd.Links()[:i], append([]*ipld.Link{nil}, ds.nd.Links()[i:]...)...))
ds.children = append(ds.children[:i], append([]*Shard{sv}, ds.children[i:]...)...)
ds.nd.SetLinks(append(ds.nd.Links()[:i], append([]*ipld.Link{lnk}, ds.nd.Links()[i:]...)...))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does make sense to me but I'd prefer having the conservative approach of avoiding modifications in behavior as much as possible since this does not directly relate to the child interface. How we handle DAG links and the proto-node cache will be reviewed and refactored in another (separate) issue/PR.

@@ -380,7 +388,7 @@ func (ds *Shard) rmChild(i int) error {
return nil
}

func (ds *Shard) getValue(ctx context.Context, hv *hashBits, key string, cb func(*shardValue) error) error {
func (ds *Shard) getValue(ctx context.Context, hv *hashBits, key string, cb func(*Shard) error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this seems like another great example of where we could use the Walker, generalizing the Shard as just another type of node in a DAG, where we could call its Iterate method. But, as previously suggested, this may be an unnecessary over-complication. I just don't like seeing the same recursive/switch logic over and over everywhere in the code base.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually the Search function from Walker. It's making a decision about which child to traverse next based on the files content hash.

@@ -471,20 +480,22 @@ func makeAsyncTrieGetLinks(dagService ipld.DAGService, onShardValue func(*shardV
}
}

func (ds *Shard) walkTrie(ctx context.Context, cb func(*shardValue) error) error {
func (ds *Shard) walkTrie(ctx context.Context, cb func(*Shard) error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll raise a separate issue about this but we definitely need to unify this function with getValue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my belief is getValue is a search -- it returns a single value (through a callback which this seems odd given you only want a single value), making a decision at every layer of the tree based on the file's content hash-- while walkTrie is an iteration (it traverses every node) -- so they're not the same exactly. But of course these are both the two functions implemented by your IPLD walker. So they could potentially both be replaced easily with your IPLD walker.

Also though, I am not exactly sure what context walkTrie will really get called instead of the async walker. Currently EnumLinks is an async walker (calls makeAsyncWalkTrieGetLinks) and ForEachLink is synchronous (calls walkTrie) -- but I don't know if ForEachLink is called by anything at this point. The other difference is they the synchronous walk is in order traversal while async makes no such gaurantees.

hamt/hamt.go Outdated
ds.children[i] = c
}

// Link returns a merklelink to this shard node
func (ds *Shard) Link() (*ipld.Link, error) {
nodeType := ds.nodeType()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link is only called inside the Node method (hiding an implicit recursion) so we can just move this logic there and remove this function. However, if you think that it will generate too much confusion at this point in the PR we can iterate on it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schomatis If we move this Link() inside the Node(), the Node() function may be too long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can leave this for another PR.

@overbool overbool force-pushed the refactor/remove-child-from-hamt branch from ae9cb5f to 9fdf744 Compare October 10, 2018 07:01
@overbool overbool force-pushed the refactor/remove-child-from-hamt branch from 9fdf744 to 2d832ee Compare October 10, 2018 07:55
@overbool overbool changed the title [WIP] refactor(hamt): remove child interface from hamt package refactor(hamt): remove child interface from hamt package Oct 10, 2018
Copy link
Contributor

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, certainly an improvement.

@@ -471,20 +480,22 @@ func makeAsyncTrieGetLinks(dagService ipld.DAGService, onShardValue func(*shardV
}
}

func (ds *Shard) walkTrie(ctx context.Context, cb func(*shardValue) error) error {
func (ds *Shard) walkTrie(ctx context.Context, cb func(*Shard) error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my belief is getValue is a search -- it returns a single value (through a callback which this seems odd given you only want a single value), making a decision at every layer of the tree based on the file's content hash-- while walkTrie is an iteration (it traverses every node) -- so they're not the same exactly. But of course these are both the two functions implemented by your IPLD walker. So they could potentially both be replaced easily with your IPLD walker.

Also though, I am not exactly sure what context walkTrie will really get called instead of the async walker. Currently EnumLinks is an async walker (calls makeAsyncWalkTrieGetLinks) and ForEachLink is synchronous (calls walkTrie) -- but I don't know if ForEachLink is called by anything at this point. The other difference is they the synchronous walk is in order traversal while async makes no such gaurantees.

Copy link
Contributor

@schomatis schomatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo those two nits).

hamt/hamt.go Outdated
@@ -39,13 +39,20 @@ const (
HashMurmur3 uint64 = 0x22
)

func (ds *Shard) isValueNode() bool {
if ds.key != "" && ds.val != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can just return this comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schomatis fixed.

hamt/hamt.go Outdated
}
func (ds *Shard) makeShardValue(lnk *ipld.Link) *Shard {
lnk2 := *lnk
s, _ := makeShard(ds.dserv, ds.tableSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check for errors just in case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schomatis
Did you mean that we should add panic here?
IMO, I think the err always will be nil , because the ds.tableSize is a correct size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add an error check and then have makeShardValue be able to return an error, then add error checks on the two places it's called, which are both in functions which already pass on errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannahhoward yeah, a good advise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannahhoward
Copy link
Contributor

@schomatis are your changes blocking? (i.e. can we merge and fix afterward or you want to make sure @overbool fixes them first?)

@schomatis schomatis merged commit 3cc73ee into ipfs:master Oct 16, 2018
@schomatis
Copy link
Contributor

Really nice work!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove the child interface from the hamt package
3 participants