slow parallel processing #37
@tobowers Thanks for reporting! Getting usage on that PR definitely helps us move the ball forward.
So... this could totally be my own shitty parallel code causing context switching, so I really hope I'm not leading us down a rabbit hole :). Anyway: using this code https://github.com/QuorumControl/encodeexplore I do not see an overall speed improvement on the new refmt-pool branch (https://github.com/QuorumControl/encodeexplore/compare/test/refmt-pool). I'm still seeing spikes in decode/encode from these lines: https://github.com/QuorumControl/encodeexplore/blob/master/gossip/gossip.go#L234-L239, producing:

[timing output attached]
FYI: These encodes/decodes are getting called tens of thousands of times.
So, WrapObject is CPU-heavy because it computes a sha256 hash. We could do that lazily, but we were worried that we'd always want it anyway, so we might as well compute it up front. However, it shouldn't take a second. I'd try using pprof: https://blog.golang.org/profiling-go-programs. A CPU profile should give you a good idea of what's happening.
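For reference, capturing a CPU profile like the one suggested above only needs the standard library's runtime/pprof. Here is a minimal sketch (the `cpuHeavy` function and the `cpu.prof` file name are illustrative stand-ins, not code from this project):

```go
package main

import (
	"crypto/sha256"
	"log"
	"os"
	"runtime/pprof"
)

// cpuHeavy is a stand-in for hash-dominated work like WrapObject:
// it repeatedly sha256-hashes a buffer.
func cpuHeavy(n int) [32]byte {
	buf := make([]byte, 1024)
	var sum [32]byte
	for i := 0; i < n; i++ {
		sum = sha256.Sum256(buf)
	}
	return sum
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Start sampling CPU usage; stop before the program exits so
	// the profile is flushed to disk.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	cpuHeavy(100000)
	// Inspect afterwards with: go tool pprof cpu.prof
}
```

Running `go tool pprof cpu.prof` afterwards gives the interactive view (including the graph output discussed below).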
When I was looking before it wasn't the sha256, but rather the refmt decode. I will check again with the profile. Thanks for the link!
Thanks. The profile will be really helpful.
Interesting:

[pprof profile screenshots attached]
Could you post a graph? Try running the
It looks like we really should consider traversing lazily.
Without the noise from sprintf: https://github.com/QuorumControl/encodeexplore/blob/test/refmt-pool/profile002.svg
Yeah. Calculating the tree up-front is slowing us down. There's no reason we really need to do that.
@Stebalien the reason we do it up front is to make sure that the other methods that rely on it having been computed don't need to return an error.
I see. The performance issue is concatenating the strings, so we could always do that later. However, I think a more complete fix is to make two node interfaces: a lazy one for resolving, computing the CID, and deserialising into an object, and an eager one for introspecting.
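The two-interface idea above could be sketched roughly as follows. All names here (`LazyNode`, `EagerNode`, the toy `node` type) are hypothetical illustrations of the design, not the project's actual API; the point is that the tree is only computed when introspection is requested, then memoized:

```go
package main

import "fmt"

// LazyNode covers the cheap operations: path resolution (and, in
// the real library, CID computation and deserialisation).
type LazyNode interface {
	Resolve(path []string) (interface{}, error)
}

// EagerNode adds introspection, which pays the full-tree
// traversal cost.
type EagerNode interface {
	LazyNode
	Tree() []string // full list of paths, computed on demand
}

// node caches the tree after the first Tree() call instead of
// computing it in the constructor.
type node struct {
	obj  map[string]interface{}
	tree []string // nil until first use
}

func (n *node) Resolve(path []string) (interface{}, error) {
	var cur interface{} = n.obj
	for _, seg := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("no such path: %v", path)
		}
		cur, ok = m[seg]
		if !ok {
			return nil, fmt.Errorf("no such path: %v", path)
		}
	}
	return cur, nil
}

func (n *node) Tree() []string {
	if n.tree == nil { // computed lazily, then memoized
		n.tree = walk("", n.obj)
	}
	return n.tree
}

func walk(prefix string, v interface{}) []string {
	m, ok := v.(map[string]interface{})
	if !ok {
		return nil
	}
	var out []string
	for k, child := range m {
		p := prefix + "/" + k
		out = append(out, p)
		out = append(out, walk(p, child)...)
	}
	return out
}

func main() {
	n := &node{obj: map[string]interface{}{"a": map[string]interface{}{"b": 1}}}
	v, _ := n.Resolve([]string{"a", "b"})
	fmt.Println(v, len(n.Tree()))
}
```

With this split, callers that only resolve paths or compute CIDs never trigger the traversal at all, and `Tree()` can still be made infallible by memoizing on first use.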
In that second one, the concatenation doesn't seem to take up as much time (I just switched the fmt.Sprintf to a standard string concatenation and a strconv for the integer).
I guess calling that "the" performance issue is incorrect. However, it still takes up 10% of the time.
Yeah, that's what @warpfork was saying too. It sounds like a good idea to me.
This is not related to parallel processing, but since I've got the issue open, and it's about performance: I've been nudging around some benchmarks of go-ipld-cbor, and one thing that's come up as a big timesink on some of my pprof graphs is the handling of CIDs. CIDs use the refmt 'transform' function feature, and that's one of those features which is powerful and expressive but by nature does some things that are not very friendly to the GC. That's something I'm going to look at trying to optimize better.

In general GC can be a big factor in performance, and also distressingly hard to figure out the impact of, because it can quietly make other code seem slower. The graph view and the weights of the arrows heading into the GC nodes are probably the most informative; I don't really know of a great way to get a textual list of that from the other pprof commands (although if anyone has suggestions... ❤️)
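As a complement to reading GC arrows off the pprof graph, the standard library can put a direct number on how much a code path allocates. A small sketch using `testing.AllocsPerRun` (the `concat` function is a made-up example of an allocation-heavy path, not project code):

```go
package main

import (
	"fmt"
	"testing"
)

// concat builds a string with repeated +=, allocating a new
// backing array on most iterations.
func concat(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

func main() {
	parts := []string{"a", "b", "c", "d"}
	// AllocsPerRun reports the average number of heap allocations
	// per call, a quick proxy for the GC pressure a code path adds.
	allocs := testing.AllocsPerRun(1000, func() {
		_ = concat(parts)
	})
	fmt.Println("allocs per call:", allocs)
}
```

`go test -bench` with `b.ReportAllocs()` gives the same kind of signal inside a benchmark suite.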
Any IPLD folks have thoughts on this, @warpfork?
I think we could probably mark this closed. Some performance observations were made here, but I'm not sure what actions would be described next by this issue. (More efforts to reduce allocations and reduce GC pressure, maybe? But how we would actually do that probably needs other discussions than this.)

We've also done an enormous amount of work in the redesigned go-ipld-prime interfaces to be sensitive to GC pressure and to avoid unnecessary allocations at all levels. (In essence, that library also avoids all the precomputations of tree structure that seem to have been flagged as problematic here(?) -- and we needed to tackle that with a new library and new interfaces, because there was really no incremental way to change something as central as that.) I'd suggest looking to that library for improvements now and in the future.
Hi, I'm still researching this, so it's a work in progress, but I have narrowed it down enough that I thought it worthy of discussion. I've been using #30 in a pre-production project and I noticed some bottlenecks in parallel code. I started to research and narrowed it down to WrapObject creating blocking-like behavior.
In a real-world project with hundreds of parallel tasks, I was noticing 200-500ms times when wrapping simple objects.
I narrowed it down to https://github.com/ipfs/go-ipld-cbor/blob/refmt/node.go#L184 but haven't yet found the root cause.
The rest of that function does not cause long blocking behavior.
Here's a failing test:
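(The original failing test did not survive in this copy of the thread. As a hedged stand-in, a test of this shape — many goroutines each timing a hash-dominated call and recording the worst single-call latency — reproduces the kind of measurement described above. The `wrap` function below is an illustrative sha256 stand-in for WrapObject, not the library's code:)

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

// wrap stands in for a CPU-bound WrapObject call dominated by
// hashing the serialized object.
func wrap(data []byte) [32]byte {
	return sha256.Sum256(data)
}

func main() {
	data := make([]byte, 64*1024)
	var wg sync.WaitGroup
	var mu sync.Mutex
	var worst time.Duration

	// Hundreds of parallel tasks, as in the report; record the
	// worst single-call latency to spot blocking-like spikes.
	for i := 0; i < 200; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			start := time.Now()
			_ = wrap(data)
			d := time.Since(start)
			mu.Lock()
			if d > worst {
				worst = d
			}
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("worst call latency:", worst)
}
```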