
Time out HTLCs before they expire #587

Merged · 7 commits · Apr 24, 2020

Conversation

TheBlueMatt (Collaborator):

This was broken out from the rest of #441 as @valentinewallace pointed out that it wasn't strictly necessary and #441 was big. I am tagging it 0.0.11 because it is kinda necessary, just not strictly. Namely, if we get a partial payment, but not a full one, we really need to fail back the partial bits before the channel has to go to chain for it. Based on #441; this also includes another test that makes sense in #441 but requires some of the refactors that were already in here, so it might as well wait.

@TheBlueMatt added this to the 0.0.11 milestone Apr 14, 2020
@TheBlueMatt mentioned this pull request Apr 14, 2020
valentinewallace (Contributor):

Rebase plz?

@TheBlueMatt force-pushed the 2020-04-mpp-timeout branch from d3a418c to 3a3403a on April 15, 2020 17:25
TheBlueMatt (Collaborator, Author):

Done :).

valentinewallace (Contributor) left a comment:

Partial review

pending_msg_events.push(events::MessageSendEvent::SendAnnouncementSignatures {
let res = channel.block_connected(header, height, txn_matched, indexes_of_txn_matched);
if let Ok((chan_res, mut timed_out_pending_htlcs)) = res {
timed_out_htlcs.reserve(timed_out_pending_htlcs.len());
Contributor:

I think this commit won't compile bc timed_out_htlcs is declared in the next commit

TheBlueMatt (Collaborator, Author):

Hmm, every commit seems to compile fine here?

Contributor:

oops, think I was looking at the commits in the wrong order (I blame github)

@@ -3680,6 +3689,49 @@ fn test_drop_messages_peer_disconnect_dual_htlc() {
claim_payment(&nodes[0], &[&nodes[1]], payment_preimage_2, 1_000_000);
}

#[test]
fn test_htlc_timeout() {
Contributor:

Could you add a holding cell HTLC that gets failed back, to the test?

I found that case a bit weird: if an HTLC is in the holding cell when it times out and we are the sender, it seems like we could just remove it from the holding cell without failing it backwards.

TheBlueMatt (Collaborator, Author):

Hmm, I'm not sure which case you're asking about. This test/the previous commit is really only intended to be useful for inbound HTLCs which are to us (those can't be in the holding cell; the holding cell is just for outbound HTLCs or the fail/fulfill action itself). For forwarded HTLCs we're mainly focused on going to chain to broadcast HTLC-Success if it matters.

valentinewallace (Contributor), Apr 17, 2020:

Oops, I was looking at commits out of order.

Thanks for the explanation 👍. Though, I thought we still have to close a channel if a forwarded HTLC times out?

TheBlueMatt (Collaborator, Author):

If we have the preimage and it's outbound, yes; or if it's inbound, always, yes. If we don't have the preimage, it's probably not worth the fee to close the channel when we aren't going to make any money from it.

@TheBlueMatt force-pushed the 2020-04-mpp-timeout branch from 3a3403a to a342d32 on April 17, 2020 03:55
}, onion_packet), channel_state, chan)
} {
Some((update_add, commitment_signed, monitor_update)) => {
if let Err(e) = self.monitor.update_monitor(chan.get().get_funding_txo().unwrap(), monitor_update) {
Contributor:

Idea for another day, but could there be a way to push a "test the waters" monitor update before pushing the real one, to help dodge these sorts of problems/confusing partial timeouts due to monitor update failure?

TheBlueMatt (Collaborator, Author):

Yes, and no. There was a long discussion at #441 (comment) which is somewhat related, but the real issue with this API is that it's not really possible to implement a client for it. Imagine a hardware wallet that has to implement it: there's no way to guarantee that an update goes through in advance, even if you can send the update to the computer before you need to commit it. What if the user unplugged the device at an inopportune time?


let (payment_preimage, payment_hash) = get_payment_preimage_hash!(&nodes[0]);
let payment_secret = PaymentSecret([0xdb; 32]);
let mut route = nodes[0].router.get_route(&nodes[3].node.get_our_node_id(), None, &[], 100000, TEST_FINAL_CLTV).unwrap();
Contributor:

Worth asserting what route.paths[0] is, to make sure we're not just using 2 of the same path?

TheBlueMatt (Collaborator, Author):

pass_along_path does it for us, albeit indirectly.

Comment on lines 3146 to 3178
&HTLCUpdateAwaitingACK::AddHTLC { ref payment_hash, ref source, ref cltv_expiry, ref amount_msat, .. } => {
if cltv_expiry <= &height {
timed_out_htlcs.push((source.clone(), payment_hash.clone(), *amount_msat));
false
Contributor:

Might be good to add a test for this case, to ensure holding cell HTLCs are timed out properly?

TheBlueMatt (Collaborator, Author):

Done! Caught a few bugs along the way, of course :/.

@jkczyz self-requested a review April 17, 2020 18:12

channel_state.claimable_htlcs.retain(|&(ref payment_hash, _), htlcs| {
htlcs.retain(|htlc| {
if height >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS {
Contributor:

Could you help me understand LATENCY_GRACE_PERIOD_BLOCKS? I would have thought it should be added here if its purpose is to give our peer more time. Specifically, could you tell me if I'm reading this comment incorrectly?

https://github.com/rust-bitcoin/rust-lightning/blob/02c1925c16fca0a5c37582ec70b3fe48ba076f8f/lightning/src/ln/channelmonitor.rs#L375-L381

Contributor:

Or if I'm reading this wrong, maybe it would be clearer if LATENCY_GRACE_PERIOD_BLOCKS was moved to the other side of the inequality:

if height + LATENCY_GRACE_PERIOD_BLOCKS >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER {

TheBlueMatt (Collaborator, Author):

claimable_htlcs isn't about peers, it's about the user of RL finding the preimage and claiming. Another framing that may help: think of LATENCY_GRACE_PERIOD_BLOCKS as "the time it takes us to do a commitment update". I added a comment to this effect.
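For concreteness, the two equivalent forms of the check being discussed (the one in the diff and the one suggested with LATENCY_GRACE_PERIOD_BLOCKS moved across the inequality) can be sketched standalone. The constant values below are illustrative stand-ins, not necessarily the ones in rust-lightning:

```rust
// Illustrative values; the real constants live in rust-lightning's source.
const CLTV_CLAIM_BUFFER: u32 = 6;
const LATENCY_GRACE_PERIOD_BLOCKS: u32 = 3;

/// Should we fail this received HTLC backwards now, rather than risk
/// having to claim it on-chain? Form used in the diff.
fn should_fail_back(height: u32, cltv_expiry: u32) -> bool {
    height >= cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS
}

/// Same condition with LATENCY_GRACE_PERIOD_BLOCKS moved to the other
/// side, as suggested above: we need one grace period to get a commitment
/// update through before the on-chain claim buffer starts.
fn should_fail_back_alt(height: u32, cltv_expiry: u32) -> bool {
    height + LATENCY_GRACE_PERIOD_BLOCKS >= cltv_expiry - CLTV_CLAIM_BUFFER
}

fn main() {
    // The two forms agree at every height (expiry chosen large enough to
    // avoid u32 underflow in the subtraction).
    for h in 0..120 {
        assert_eq!(should_fail_back(h, 100), should_fail_back_alt(h, 100));
    }
    assert!(!should_fail_back(90, 100)); // 10 blocks left: still fine
    assert!(should_fail_back(91, 100)); // 9 blocks left: fail it back
}
```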

Contributor:

Would it make sense then to define an appropriately named constant for CLTV_CLAIM_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS?

I see these constants used together in a few places, so if there is a name that suitably captures this concept, it may be worth defining such a constant to make the code more understandable and document it there instead.

TheBlueMatt (Collaborator, Author):

Yep!
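The combined constant suggested here did land; a later comment in this thread praises the explanation of HTLC_FAIL_BACK_BUFFER. A minimal sketch of the shape, with illustrative constant values and a simplified stand-in for the claimable_htlcs retain loop:

```rust
use std::collections::HashMap;

const CLTV_CLAIM_BUFFER: u32 = 6;           // illustrative value
const LATENCY_GRACE_PERIOD_BLOCKS: u32 = 3; // illustrative value

/// The combined buffer: fail an HTLC backwards once we're within
/// commitment-update latency plus on-chain claim time of its expiry.
const HTLC_FAIL_BACK_BUFFER: u32 = CLTV_CLAIM_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS;

struct Htlc {
    cltv_expiry: u32,
}

/// Drop (to be failed back) all claimable HTLCs too close to expiry,
/// returning the ids of the ones that timed out. A simplified stand-in
/// for the retain() over channel_state.claimable_htlcs.
fn prune_claimable(htlcs: &mut HashMap<u64, Htlc>, height: u32) -> Vec<u64> {
    let mut timed_out = Vec::new();
    htlcs.retain(|&id, htlc| {
        if height >= htlc.cltv_expiry - HTLC_FAIL_BACK_BUFFER {
            timed_out.push(id);
            false // remove: this HTLC gets failed backwards
        } else {
            true // keep: still time for the user to claim it
        }
    });
    timed_out
}

fn main() {
    let mut htlcs = HashMap::new();
    htlcs.insert(1, Htlc { cltv_expiry: 100 }); // expires soon
    htlcs.insert(2, Htlc { cltv_expiry: 200 }); // plenty of time left
    let timed_out = prune_claimable(&mut htlcs, 95);
    assert_eq!(timed_out, vec![1]);
    assert!(htlcs.contains_key(&2));
}
```

With the named constant, the check reads as a single threshold instead of two loosely related buffers, and the reasoning lives in one doc comment.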

Comment on lines 2327 to 2333
// Expect pending failures, but we don't bother trying to update the channel state with them.
let events = nodes[0].node.get_and_clear_pending_events();
assert_eq!(events.len(), 1);
match events[0] {
Event::PendingHTLCsForwardable { .. } => { },
_ => panic!("Unexpected event"),
};
Contributor:

The comment essentially states how the following code block differs from expect_pending_htlcs_forwardable (i.e., "but we don't bother trying to update the channel state with them"). However, I think the comment would be more useful if it stated why we're not updating the channel state, assuming that this is pertinent to the test.

Also, given that this block of code and preceding comment is repeated a few times, I'd recommend making an appropriately named utility function for it, especially since it is almost identical to expect_pending_htlcs_forwardable. Then a comment might be unnecessary.

TheBlueMatt (Collaborator, Author):

Hmm, not sure what to say aside from "the test doesn't need to do the forwards". I converted it to a macro with "_ignore" so it's a bit more DRY, though.
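The "_ignore" split can be sketched as follows. The macro name mirrors the one described above; the Event type and its variants here are simplified stand-ins for illustration:

```rust
// Simplified stand-in for the real Event type.
enum Event {
    PendingHTLCsForwardable,
    PaymentSent,
}

/// Asserts the pending-forwardable event fired, without driving the
/// forwards themselves (the repeated block from the tests, made DRY).
macro_rules! expect_pending_htlcs_forwardable_ignore {
    ($events: expr) => {{
        assert_eq!($events.len(), 1);
        match $events[0] {
            Event::PendingHTLCsForwardable { .. } => {}
            _ => panic!("Unexpected event"),
        }
    }};
}

fn main() {
    let events = vec![Event::PendingHTLCsForwardable];
    expect_pending_htlcs_forwardable_ignore!(events);
    let _ = Event::PaymentSent; // silence the unused-variant warning
}
```

A full expect_pending_htlcs_forwardable! can then call the "_ignore" form and additionally invoke the node's process_pending_htlc_forwards step.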

Comment on lines -1302 to -1307
'path_loop: for path in route.paths.iter() {
macro_rules! check_res_push {
($res: expr) => { match $res {
Ok(r) => r,
Err(e) => {
results.push(Err(e));
continue 'path_loop;
},
}
}
}
Contributor:

Glad to see a helper method added rather than using this macro! 😀

@@ -1227,6 +1227,80 @@ impl<ChanSigner: ChannelKeys, M: Deref, T: Deref, K: Deref, F: Deref> ChannelMan
})
}

// Only public for testing, this should otherwise never be called directly
pub(crate) fn send_payment_path(&self, path: &Vec<RouteHop>, payment_hash: &PaymentHash, payment_secret: &Option<PaymentSecret>, total_value: u64, cur_height: u32) -> Result<(), APIError> {
Contributor:

To others, I found this command helpful for reviewing since the diff contains mostly moved code:

git diff --color-moved=dimmed-zebra --color-moved-ws=allow-indentation-change 9224f01^ 9224f01 lightning/src/ln/channelmanager.rs

Contributor:

^ this is what i was missing 😄

@TheBlueMatt force-pushed the 2020-04-mpp-timeout branch 3 times, most recently from e799bf6 to 664987b on April 20, 2020 23:10
jkczyz (Contributor) left a comment:

Thanks for adding 9501d64! Much easier to follow the test when it can be broken up into more digestible parts. 😀




channel_state.claimable_htlcs.retain(|&(ref payment_hash, _), htlcs| {
htlcs.retain(|htlc| {
if height >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS {
Contributor:

Worth adding a check/panic in case the height is much higher than we expect, and we should be closing the channel rather than failing the HTLC backwards?

TheBlueMatt (Collaborator, Author):

Hmm, we could maybe be more strict about things like "the user is connecting a block at height 20 but we thought we were at height 10", but, in general, there's no reason to fail a channel to the chain just because we Definitely Cannot Claim such an HTLC.

commitment_signed_dance!(nodes[0], nodes[1], htlc_timeout_updates.commitment_signed, false);
let events = nodes[0].node.get_and_clear_pending_events();
match events[0] {
Event::PaymentFailed { payment_hash, rejected_by_dest, error_code } => {
Contributor:

Getting this compilation error on this commit: https://i.imgur.com/dXJErz0.png

TheBlueMatt (Collaborator, Author):

Should be resolved by the later commit: "f rebase on ..."

@@ -3066,7 +3066,12 @@ impl<ChanSigner: ChannelKeys, M: Deref + Sync + Send, T: Deref + Sync + Send, K:
// number of blocks we generally consider it to take to do a commitment update,
// just give up on it and fail the HTLC.
if height >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS {
timed_out_htlcs.push((HTLCSource::PreviousHopData(htlc.prev_hop.clone()), payment_hash.clone(), htlc.value));
let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(height));
Contributor:

nice catch in this commit
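The fix being praised builds the HTLC failure data as the amount followed by the height, both big-endian. A hedged sketch of that 12-byte layout (the helper name here is illustrative; rust-lightning uses byte_utils helpers rather than to_be_bytes):

```rust
/// Failure data accompanying incorrect_or_unknown_payment_details
/// (failure code 0x4000 | 15): the HTLC amount as a big-endian u64
/// followed by the current block height as a big-endian u32.
fn htlc_fail_data(amount_msat: u64, height: u32) -> Vec<u8> {
    let mut data = amount_msat.to_be_bytes().to_vec();
    data.extend_from_slice(&height.to_be_bytes());
    data
}

fn main() {
    let data = htlc_fail_data(100_000, 650_000);
    assert_eq!(data.len(), 12); // 8 bytes amount + 4 bytes height
    assert_eq!(&data[..8], &100_000u64.to_be_bytes()[..]);
    assert_eq!(&data[8..], &650_000u32.to_be_bytes()[..]);
    assert_eq!(0x4000 | 15, 0x400f); // the failure code paired with this data
}
```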

@@ -769,13 +769,19 @@ macro_rules! expect_payment_sent {
}

macro_rules! expect_payment_failed {
($node: expr, $expected_payment_hash: expr, $rejected_by_dest: expr) => {
($node: expr, $expected_payment_hash: expr, $rejected_by_dest: expr $(, $expected_error_code: expr, $expected_error_data: expr)*) => {
Contributor:

Why not just always check the error code/data?

TheBlueMatt (Collaborator, Author):

Just because that requires going back and writing in a new error check at every callsite. It could be done in a followup, but it's quite a bit of work.
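The macro-arity trick in the diff above (a trailing `$(...)*` repetition group) is what lets the error-code checks be opt-in, so existing call sites compile unchanged. A simplified sketch, with a stand-in event type instead of the real one:

```rust
// Simplified stand-in for the real PaymentFailed event.
struct PaymentFailed {
    rejected_by_dest: bool,
    error_code: u16,
}

/// Old-style calls pass two args; new-style calls may append an error
/// code, and the repetition group expands an extra assertion for it.
macro_rules! expect_payment_failed {
    ($ev: expr, $rejected_by_dest: expr $(, $expected_error_code: expr)*) => {{
        assert_eq!($ev.rejected_by_dest, $rejected_by_dest);
        $(
            assert_eq!($ev.error_code, $expected_error_code);
        )*
    }};
}

fn main() {
    let ev = PaymentFailed { rejected_by_dest: true, error_code: 0x4000 | 15 };
    expect_payment_failed!(ev, true); // old callsite: no error-code check
    expect_payment_failed!(ev, true, 0x400f); // new callsite: code checked too
}
```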

valentinewallace (Contributor) left a comment:

finished going through the commits :)

// Pass the first HTLC of the payment along to nodes[3].
let mut events = nodes[0].node.get_and_clear_pending_msg_events();
assert_eq!(events.len(), 1);
pass_along_path(&nodes[0], &[&nodes[1], &nodes[3]], 0, payment_hash.clone(), Some(payment_secret), events.pop().unwrap(), false);
Contributor:

nit: could check that we aren't able to claim before the monitor gets successfully updated

TheBlueMatt (Collaborator, Author):

pass_along_path checks for us that we don't get the PaymentReceived event (the last false/true bool).

let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec();
htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(height));
timed_out_htlcs.push((HTLCSource::PreviousHopData(htlc.prev_hop.clone()), payment_hash.clone(), HTLCFailReason::Reason {
failure_code: 0x4000 | 15,
Contributor:

I don't see why this failure code is "missing payment details" when above it's "expiry too soon"? Expiry too soon makes more sense to me in both situations. Could be wrong, but this just doesn't feel like the intended purpose of the "missing payment details" error.

TheBlueMatt (Collaborator, Author):

The spec for incorrect_or_unknown_payment_details says "or the CLTV expiry of the htlc is too close to the current block height for safe handling." :)

Contributor:

Touché

}));
}
if let Some(funding_locked) = chan_res {
pending_msg_events.push(events::MessageSendEvent::SendFundingLocked {
Contributor:

nit: could add a log here as well similar to the one just below.

TheBlueMatt (Collaborator, Author):

Hmm? I'm not sure which one you're referring to (the log line three lines down is sent in either if case).

Contributor:

ah, missed that

@TheBlueMatt force-pushed the 2020-04-mpp-timeout branch from 664987b to edb93e2 on April 24, 2020 02:24
TheBlueMatt (Collaborator, Author):

Addressed all the outstanding feedback, I think.

valentinewallace (Contributor) left a comment:

LGTM 👍

@TheBlueMatt force-pushed the 2020-04-mpp-timeout branch from edb93e2 to 05904e3 on April 24, 2020 17:59
jkczyz (Contributor) commented Apr 24, 2020:

Great explanation of HTLC_FAIL_BACK_BUFFER in 23bcfaa!

TheBlueMatt and others added 7 commits April 24, 2020 14:28
We only do this for incoming HTLCs directly as we rely on channel closure and HTLC-Timeout broadcast to fail any HTLCs which we relayed onwards where our next-hop doesn't update_fail in time.

expect_payment_failed!() was introduced after many of the tests which could use it were written, so we take this opportunity to switch them over now, increasing test coverage slightly by always checking the payment hash expected.

This is a key test for our automatic HTLC time-out logic, as it ensures we don't allow an HTLC which indicates we should wait for additional HTLCs before responding to cause us to force-close a channel due to HTLC near-timeout.

Relatively simple test that, after a monitor update fails, we get the right return value and continue with the bits of the MPP that did not send after the monitor updating is restored.

In case of committing out-of-time outgoing HTLCs, we force ourselves to close the channel to avoid remote peer claims on a non-backed HTLC.
@TheBlueMatt force-pushed the 2020-04-mpp-timeout branch from 05904e3 to d316f30 on April 24, 2020 18:29
@TheBlueMatt merged commit 4441827 into lightningdevkit:master Apr 24, 2020