Fix deadlock in ChannelManager's handle_error!() #568

Merged

@jkczyz (Contributor) commented Apr 1, 2020

Upon channel failure, ChannelManager fails backward any pending HTLCs. A deadlock occurs in such cases: handle_error!() is entered while the caller still holds the channel_state lock, and finish_force_close_channel(), which handle_error!() calls to fail the HTLCs backward, attempts to reacquire that lock. This PR adds a test to demonstrate the deadlock and fixes it by holding the lock for shorter scopes.

Fixes #549.
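The deadlock hinges on the lock not being reentrant. Assuming channel_state is a std::sync::Mutex, the standard library documents that locking a mutex on a thread that already holds it will not return (it may deadlock or panic). A minimal sketch of that mechanism follows, using a bare Mutex<u32> as a stand-in for channel_state; channel_state_is_held() is a hypothetical helper, not part of ChannelManager. It uses try_lock(), which never blocks, to observe the held lock instead of actually hanging.

```rust
use std::sync::{Mutex, TryLockError};

// Hypothetical helper; the Mutex<u32> stands in for ChannelManager's channel_state.
fn channel_state_is_held(channel_state: &Mutex<u32>) -> bool {
    // try_lock() never blocks: it returns WouldBlock if the lock is held,
    // including when the current thread is the one holding it.
    matches!(channel_state.try_lock(), Err(TryLockError::WouldBlock))
}

fn main() {
    let channel_state = Mutex::new(0u32);

    // The buggy shape: the caller takes the lock and keeps the guard alive...
    let guard = channel_state.lock().unwrap();
    // ...so a nested lock() (as in finish_force_close_channel()) would never return.
    assert!(channel_state_is_held(&channel_state));

    // Ending the guard's scope first (the "shorter scope" fix) makes relocking safe.
    drop(guard);
    assert!(!channel_state_is_held(&channel_state));
    let _relocked = channel_state.lock().unwrap();
}
```

Holding the lock "for shorter scopes" amounts to ending the guard's lifetime (via a block or drop()) before any code path that may lock again.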

@jkczyz requested a review from TheBlueMatt April 1, 2020 05:31

@TheBlueMatt (Collaborator) left a comment

Looks good, modulo the one comment and the commits being "out of order": we should first fix the issue, then add the test, as otherwise git history contains a state that fails tests.

Review comment on the test code:

```rust
}

// Alice -> Bob -> Chuck: Route another payment but now Bob waits for Chuck's earlier revoke_and_ack.
let (_, failed_payment_hash) = route_payment(&nodes[0], &[&nodes[1]], 50_000);
```
Collaborator

Oh lol, guess you don't need three nodes for this test. Note that this is a completely separate payment from the next one: we disambiguate by HTLCSource, not the payment_hash (as otherwise there are a number of privacy and practical funds issues). The payment_failed at the end is the indicator: it's saying that a payment nodes[1] tried to send failed, not that it should fail back the HTLC to nodes[0].

Contributor Author

Grokking: what does "the next one" refer to?

Contributor Author

That was a reference to the manual send_payment call two lines down.

Ah, I see. This was my attempt at simulating nodes[2] not sending revoke_and_ack.

If I only need two nodes, then are you saying I can get rid of the nodes[0] to nodes[1] part entirely (i.e., the places where I'm using route_payment)? In that case, what is being "failed backward"?

Collaborator

Yep. The part being "failed back" is purely the event telling us that the payment failed (as the lock in question here is taken before we decide whether the HTLCSource is an OutboundRoute, which we sent, or a PreviousHopData, which means someone else sent us an HTLC that we relayed).
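To make the HTLCSource distinction concrete, here is a minimal sketch assuming a simplified enum with the two variants named above. The variant fields, FailureAction and fail_htlc_backwards() are hypothetical; rust-lightning's real HTLCSource carries more data. Because the action depends only on the source, two HTLCs that share a payment_hash are still handled independently.

```rust
// Hypothetical, simplified model of the two HTLCSource cases discussed above.
#[derive(Debug, PartialEq)]
enum HTLCSource {
    // An HTLC we relayed: identified by the upstream channel and HTLC id.
    PreviousHopData { short_channel_id: u64, htlc_id: u64 },
    // A payment we originated ourselves.
    OutboundRoute { session_id: u64 },
}

// What failing the HTLC backward turns into (hypothetical).
#[derive(Debug, PartialEq)]
enum FailureAction {
    // Surface a PaymentFailed-style event to the local user.
    PaymentFailedEvent { payment_hash: [u8; 32] },
    // Send update_fail_htlc upstream on the channel we received the HTLC from.
    FailUpstream { short_channel_id: u64, htlc_id: u64 },
}

// Dispatch purely on the source; the payment_hash is only carried along.
fn fail_htlc_backwards(source: &HTLCSource, payment_hash: [u8; 32]) -> FailureAction {
    match source {
        HTLCSource::OutboundRoute { .. } => FailureAction::PaymentFailedEvent { payment_hash },
        HTLCSource::PreviousHopData { short_channel_id, htlc_id } => FailureAction::FailUpstream {
            short_channel_id: *short_channel_id,
            htlc_id: *htlc_id,
        },
    }
}

fn main() {
    let hash = [7u8; 32];
    // Same payment_hash, different sources: the source alone decides the action.
    let ours = HTLCSource::OutboundRoute { session_id: 1 };
    let relayed = HTLCSource::PreviousHopData { short_channel_id: 42, htlc_id: 0 };
    assert_eq!(
        fail_htlc_backwards(&ours, hash),
        FailureAction::PaymentFailedEvent { payment_hash: hash }
    );
    assert_eq!(
        fail_htlc_backwards(&relayed, hash),
        FailureAction::FailUpstream { short_channel_id: 42, htlc_id: 0 }
    );
}
```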

Contributor Author

Ok, simplified this by only using two nodes in the test.

@TheBlueMatt (Collaborator) commented Apr 1, 2020 via email

TheBlueMatt and others added 2 commits April 1, 2020 16:27
This partially reverts 933ae34,
though note that 933ae34 fixed a
similar deadlock while introducing this one.

If we have HTLCs to fail backwards, handle_error!() will call
finish_force_close_channel() which will attempt to lock channel_state
while it is locked at the original caller. Instead, hold the lock for
shorter scopes such that it is not held upon entering handle_error!().

Co-authored-by: Matt Corallo <git@bluematt.me>
Co-authored-by: Jeffrey Czyz <jkczyz@gmail.com>

Upon channel failure, any pending HTLCs in a channel's holding cell must
be failed backward. The added test exercises this behavior and
demonstrates a deadlock triggered within the handle_error!() macro. The
deadlock occurs when the channel_state lock is already held and then
reacquired when finish_force_close_channel() is called.
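The fix in the first commit (computing the result while the lock is held and releasing it before entering handle_error!()) can be sketched as follows. This assumes a toy handle_error! macro, a Manager type, a ChannelError struct and an update_channel() method, all hypothetical and much simpler than ChannelManager's real ones; the Mutex<Vec<u64>> merely records which HTLC amounts were failed backward. If the guard were still alive when the macro runs, the lock() inside finish_force_close_channel() would never return.

```rust
use std::sync::Mutex;

// Hypothetical error type: carries holding-cell HTLCs that must be failed backward.
struct ChannelError {
    holding_cell_htlcs: Vec<u64>,
}

// Hypothetical manager; the Vec records HTLC amounts failed backward so far.
struct Manager {
    channel_state: Mutex<Vec<u64>>,
}

impl Manager {
    // Like finish_force_close_channel(), this acquires channel_state itself.
    fn finish_force_close_channel(&self, htlcs: Vec<u64>) {
        let mut failed = self.channel_state.lock().unwrap();
        failed.extend(htlcs);
    }
}

// Toy macro: on error, force-close (which takes the lock) and return Err(()).
macro_rules! handle_error {
    ($self: expr, $res: expr) => {
        match $res {
            Ok(v) => Ok(v),
            Err(e) => {
                $self.finish_force_close_channel(e.holding_cell_htlcs);
                Err(())
            }
        }
    };
}

impl Manager {
    // Hypothetical channel operation that may fail the channel.
    fn update_channel(&self, fail: bool) -> Result<(), ()> {
        let res: Result<(), ChannelError> = {
            let _state = self.channel_state.lock().unwrap();
            // ... work that needs channel_state would go here ...
            if fail {
                Err(ChannelError { holding_cell_htlcs: vec![50_000] })
            } else {
                Ok(())
            }
        }; // _state is dropped here, so the lock is free before handle_error! runs
        handle_error!(self, res)
    }
}

fn main() {
    let mgr = Manager { channel_state: Mutex::new(Vec::new()) };
    assert!(mgr.update_channel(false).is_ok());
    assert!(mgr.update_channel(true).is_err());
    assert_eq!(*mgr.channel_state.lock().unwrap(), vec![50_000]);
}
```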
@jkczyz force-pushed the 2020-03-handle-error-deadlock branch from 7e25d66 to 3968647 April 1, 2020 23:37
@TheBlueMatt merged commit f0b037c into lightningdevkit:master Apr 2, 2020
Linked issue: Deadlock in handle_error!() on HTLC fail-back
2 participants