-
Notifications
You must be signed in to change notification settings - Fork 387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock in ChannelManager's handle_error!() #568
Fix deadlock in ChannelManager's handle_error!() #568
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, mod the one comment and the commits being "out of order" - we should first fix the issue, then add the test as otherwise we have a state in git history that fails to pass tests.
lightning/src/ln/functional_tests.rs
Outdated
} | ||
|
||
// Alice -> Bob -> Chuck: Route another payment but now Bob waits for Chuck's earlier revoke_and_ack. | ||
let (_, failed_payment_hash) = route_payment(&nodes[0], &[&nodes[1]], 50_000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh lol, guess you dont need three nodes for this test. Note that this is a completely separate payment from the next one - we disambiguate by HTLCSource, not the payment_hash (as otherwise there are a number of privacy and practical funds issues). The payment_failed at the end is the indicator - its saying that a payment nodes[1] tried to send failed, not that it should fail back the HTLC to nodes[0].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Groking: what is "the next one" referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a reference to the manual send_payment call two lines down.
Ah, I see. This was my attempt at simulating nodes[2]
not sending revoke_and_ack
.
If I only need two nodes, then are you saying I can get rid of the nodes[0]
to nodes[1]
part entirely (i.e., the places where I'm using route_payment
)? In that case, what is being "failed backward"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. The part being "failed back" is purely the event telling us that the payment failed (as the lock in question here is taken before we decide if the HTLCSource is an OutboundRoute which we sent or a PreviousHopData which means someone else sent us an HTLC that we relayed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, simplified this by only using two nodes in the test.
That was a reference to the manual send_payment call two lines down.
… On Apr 1, 2020, at 11:16, Jeffrey Czyz ***@***.***> wrote:
@jkczyz commented on this pull request.
In lightning/src/ln/functional_tests.rs:
> + let (_, payment_hash) = route_payment(&nodes[0], &[&nodes[1]], 50_000);
+ let remaining_route = nodes[1].router.get_route(&nodes[2].node.get_our_node_id(), None, &Vec::new(), 50_000, TEST_FINAL_CLTV).unwrap();
+ nodes[1].node.send_payment(remaining_route, payment_hash).unwrap();
+ check_added_monitors!(nodes[1], 1);
+
+ let payment_event = {
+ let mut events = nodes[1].node.get_and_clear_pending_msg_events();
+ assert_eq!(events.len(), 1);
+ SendEvent::from_event(events.remove(0))
+ };
+ assert_eq!(payment_event.node_id, nodes[2].node.get_our_node_id());
+ assert_eq!(payment_event.msgs.len(), 1);
+ }
+
+ // Alice -> Bob -> Chuck: Route another payment but now Bob waits for Chuck's earlier revoke_and_ack.
+ let (_, failed_payment_hash) = route_payment(&nodes[0], &[&nodes[1]], 50_000);
Groking: what is "the next one" referring to?
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
This partially reverts 933ae34, though note that 933ae34 fixed a similar deadlock while introducing this one. If we have HTLCs to fail backwards, handle_error!() will call finish_force_close_channel() which will attempt to lock channel_state while it is locked at the original caller. Instead, hold the lock for shorter scopes such that it is not held upon entering handle_error!(). Co-authored-by: Matt Corallo <git@bluematt.me> Co-authored-by: Jeffrey Czyz <jkczyz@gmail.com>
Upon channel failure, any pending HTLCs in a channel's holding cell must be failed backward. The added test exercises this behavior and demonstrates a deadlock triggered within the handle_error!() macro. The deadlock occurs when the channel_state lock is already held and then reacquired when finish_force_close_channel() is called.
7e25d66
to
3968647
Compare
ChannelManager
fails backward any pending HTLCs upon channel failure. A deadlock occurs in such cases sincehandle_error!()
takes a lockedchannel_state
andfinish_force_close_channel()
attempts to reacquire the lock. This PR adds a test to demonstrate the deadlock and fixes it by holding the lock for shorter scopes.Fixes #549.