Multi-Path Payments #441
b2f320b to d5e374a
d5e374a to b1b15db
b1b15db to 032f537
032f537 to 72f647a
Rebase plz?
72f647a to c9a1e83
Rebased without changes.
commit review TODOs:
- Time out AwaitingRemoteRAA outgoing HTLCs when we reach cltv_expiry
- Split only-receive/forward data out of PendingHTLCInfo into an enum
- Support (de)serializing payment_data in onion TLVs and track them
- Refuse to deserialize OnionHopDatas with values > 21 million
- Impl Base AMP in the receive pipeline and expose payment_secret
- Expand the Route object to include multiple paths.
- Implement multipath sends using payment_secret.
- Refactor test utils and add a simple MPP send/claim test.
- Time out incoming HTLCs when we reach cltv_expiry (+ test)
- Test basic AMP payments in chanmon_consistency
- Add/announce features for payment_secret and basic_mpp
e99fc10 also looks good to me, just nitting some names :)
Initial pass 8e8c6e7
c9a1e83 to 48e8ef7
Rebased with all pending comments now that #551 went in.
48e8ef7 to 7f93773
ee0a406 LGTM
Initial pass e23ff0a
for htlc in htlcs.iter() {
    total_value += htlc.value;
    if htlc.payment_data.as_ref().unwrap().total_msat != data.total_msat {
        total_value = msgs::MAX_VALUE_MSAT;
    }
    if total_value >= msgs::MAX_VALUE_MSAT { break; }
Some comments here (and generally throughout this chunk)?
Are all the HTLCs in htlcs always part of the same multi-path payment here, so we're checking that they all agree on what the total paid amount should be?
Exactly. If we have a payment_data (implying we also have a payment_secret), then we disambiguate payments by the payment_secret instead of by individual HTLCs.
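For readers following along, here is a minimal sketch of the check discussed above: every HTLC in the set must agree on total_msat, and the payment only counts as received once the parts add up. The names ReceivedHtlc, MppStatus and check_mpp_set are hypothetical stand-ins for illustration, not the types used in this PR, and the value of MAX_VALUE_MSAT is assumed to be 21 million BTC in msat.

```rust
/// Assumed upper bound on any msat amount: 21 million BTC in millisatoshis.
const MAX_VALUE_MSAT: u64 = 21_000_000 * 100_000_000 * 1000;

/// Hypothetical, simplified stand-in for one received part of a payment.
struct ReceivedHtlc {
    /// Value carried by this individual HTLC.
    value_msat: u64,
    /// Total the sender says the whole payment is worth.
    total_msat: u64,
}

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
enum MppStatus {
    /// Parts are consistent but do not yet add up to the total.
    Incomplete,
    /// Parts are consistent and cover the total; carries the sum received.
    Complete(u64),
    /// Parts disagree on the total, or the sum is out of range.
    Invalid,
}

fn check_mpp_set(htlcs: &[ReceivedHtlc], expected_total_msat: u64) -> MppStatus {
    let mut total_value: u64 = 0;
    for htlc in htlcs {
        // Every part must claim the same overall total.
        if htlc.total_msat != expected_total_msat {
            return MppStatus::Invalid;
        }
        total_value = total_value.saturating_add(htlc.value_msat);
        if total_value >= MAX_VALUE_MSAT {
            return MppStatus::Invalid;
        }
    }
    if total_value >= expected_total_msat {
        MppStatus::Complete(total_value)
    } else {
        MppStatus::Incomplete
    }
}
```

The snippet in the diff folds the "invalid" case into the running total by setting it to MAX_VALUE_MSAT; the sketch uses an explicit enum variant only to make the three outcomes easier to read.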
amt: total_value,
});
}
} else {
Is this the case where it's not a multipath payment, just a regular receive?
Yep!
None => self.fail_htlc_backwards_internal(self.channel_state.lock().unwrap(), htlc_source, &payment_hash, HTLCFailReason::Reason { failure_code, data: Vec::new() }),
Some(chan_update) => self.fail_htlc_backwards_internal(self.channel_state.lock().unwrap(), htlc_source, &payment_hash, HTLCFailReason::Reason { failure_code, data: chan_update.encode_with_len() }),
};
for (htlc_source, payment_hash, failure_reason) in failed_forwards.drain(..) {
It's nice how it's cleaner with just the HTLCFailReason enum 👍
fd46193 LGTM
lightning/src/ln/channelmanager.rs
Outdated
assert!(!sources.is_empty());
let passes_value = if let &Some(ref data) = &sources[0].payment_data {
    assert!(payment_secret.is_some());
    if data.total_msat == expected_amount { true } else { false }
} else {
    assert!(payment_secret.is_none());
    false
};
just to check understanding -- we fail HTLCs without a payment_secret because it's a required feature?
Oops, that's a bit confusing - if this bool is set to false in this block it triggers the classic check later. I moved the old check into this block to make it clearer.
Heh, well it (obviously) doesn't build if I pull the later check up into this block lol.
33ac6c8 lgtm
Initial pass e0f3a03
Looks pretty good! Correct me if I'm wrong, but I didn't realize base AMP was just sending a bunch of individual HTLCs that add up to the desired total, plus or minus a few details to ensure everyone agrees on the full amount, and a payment secret...
Initial pass e1000b2
pub fn claim_payment_along_route_with_secret<'a, 'b, 'c>(origin_node: &Node<'a, 'b, 'c>, expected_route: &[&Node<'a, 'b, 'c>], skip_last: bool, our_payment_preimage: PaymentPreimage, our_payment_secret: Option<[u8; 32]>, expected_amount: u64) {
assert!(expected_route.last().unwrap().node.claim_funds(our_payment_preimage, &our_payment_secret, expected_amount));
check_added_monitors!(expected_route.last().unwrap(), 1);
pub fn claim_payment_along_route_with_secret<'a, 'b, 'c>(origin_node: &Node<'a, 'b, 'c>, expected_paths: &[&[&Node<'a, 'b, 'c>]], skip_last: bool, our_payment_preimage: PaymentPreimage, our_payment_secret: Option<[u8; 32]>, expected_amount: u64) {
Is skip_last for cases like the test for #549, where you want to have custom fail-back behavior/half-completed payments? Could use some docs.
Right, for tests that need to do something special to the last node. Generally, a lot of our tests could use more docs, but mostly it's understandable looking at the callsite (even if the sub-functions themselves are nonsense).
lightning/src/ln/functional_tests.rs
Outdated
assert_eq!(nodes[3].node.claim_funds(payment_preimage, &None, 200_000), false);
assert_eq!(nodes[3].node.claim_funds(payment_preimage, &Some([42; 32]), 200_000), false);
// ...but with the right secret we should be able to claim all the way back
claim_payment_along_route_with_secret(&nodes[0], &[&[&nodes[1], &nodes[3]], &[&nodes[2], &nodes[3]]], false, payment_preimage, Some(payment_secret.0), 200_000);
It'd be good to add tests for partial failure then follow-up retries, etc.
Hmm, we don't have any way to do follow-up retries (nor is there a way to do so). We just have to wait for the final node to time-out the payment, get a PaymentFailed, then retry the whole thing. I'm not sure exactly what you're looking for here?
I meant test a send_payment attempt that results in a PaymentSendFailure::PartialFailure, and then retry the paths that are returned as failed-but-safe-to-retry.
Right. There's currently no way to retry said paths (sadly), unless they were monitor failures. I'll open an issue for that.
d7c8df3 to dccc84e
Addressed all the pending comments, and added a number of fixup commits in between (no other changes to the original commits).
609eb31 to f06145c
Squashed the fixup commits with no changes to make things reviewable again. Can continue discussion on #441 (comment) but other than coming to a conclusion there I think all comments have been addressed.
@@ -1007,13 +1018,19 @@ impl<ChanSigner: ChannelKeys, M: Deref, T: Deref, K: Deref, F: Deref> ChannelMan
return_err!("Upstream node set CLTV to the wrong value", 18, &byte_utils::be32_to_array(msg.cltv_expiry));
}

let payment_data = match next_hop_data.format {
    msgs::OnionHopDataFormat::Legacy { .. } => None,
    msgs::OnionHopDataFormat::NonFinalNode { .. } => return_err!("Got non final data with an HMAC of 0", 0x4000 | 22, &[0;0]),
Hm. Would this error be more appropriate?
type: BADONION|PERM|5 (invalid_onion_hmac)
Hmm, no, we already checked the HMAC for us, this HMAC is the HMAC we're supposed to send the next party in the route (hence 0 for "no more sending").
I'm not sure. The onion decrypts as expected for a final node, but the content is wrong (even if only with regard to the HMAC context), so invalid_onion_payload seems to suit. I agree that this error definition is quite loose.
fuzz/src/full_stack.rs
Outdated
@@ -401,7 +401,7 @@ pub fn do_test(data: &[u8], logger: &Arc<dyn Logger>) {
sha.input(&payment_hash.0[..]);
payment_hash.0 = Sha256::from_engine(sha).into_inner();
payments_sent += 1;
match channelmanager.send_payment(route, payment_hash) {
match channelmanager.send_payment(route, payment_hash, &None) {
Would it make sense to, say, 50% of the time, have a non-None payment_secret?
Yea, likely, I'm not too worried about payment_secret coverage, there aren't that many ways it can go wrong, and we hit a number of them. As we write new tests, including payment_secrets in them is likely a good idea, but going to each test and setting payment_secrets requires some writing/reviewing effort to make sure it all lines up right.
htlc_id: htlc.prev_hop.htlc_id,
incoming_packet_shared_secret: htlc.prev_hop.incoming_packet_shared_secret,
}), payment_hash,
HTLCFailReason::Reason { failure_code: 0x4000 | 15, data: byte_utils::be64_to_array(htlc.value).to_vec() }
Is this error missing the height parameter?
type: PERM|15 (incorrect_or_unknown_payment_details)
data:
[u64:htlc_msat]
[u32:height]
The spec changed, so, yes, but also we can do that in a followup PR.
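As a rough sketch of what the follow-up could look like, the failure data quoted above ([u64:htlc_msat] followed by [u32:height]) can be built with big-endian encoding. The function name incorrect_payment_details_data and the constant names are hypothetical and exist only for this illustration; the snippet in this PR still sends just the 8-byte amount.

```rust
/// PERM flag for onion failure codes, as used in `0x4000 | 15` above.
const PERM: u16 = 0x4000;
/// incorrect_or_unknown_payment_details, i.e. PERM|15.
const INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: u16 = PERM | 15;

/// Hypothetical helper: encodes the failure data as the HTLC amount
/// (8 bytes, big-endian) followed by the block height (4 bytes, big-endian).
fn incorrect_payment_details_data(htlc_msat: u64, height: u32) -> Vec<u8> {
    let mut data = Vec::with_capacity(12);
    data.extend_from_slice(&htlc_msat.to_be_bytes());
    data.extend_from_slice(&height.to_be_bytes());
    data
}
```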
// HTLC when the monitor updating is restored (or on chain).
log_error!(self, "Temporary failure claiming HTLC, treating as success: {}", e.1.err.err);
claimed_any_htlcs = true;
} else { errs.push(e); }
For my understanding -- if we hit this else case and we've already claimed some funds, we will have received a partial payment?
Not quite, but kinda - we've updated our local states with the payment preimage, so if we need to hit the chain we'll enforce the claim on-chain (and if we run out of time to claim it we'll auto-force-close the channel). Not every copy of our channel monitor may have been updated, but there should always be a local copy which was.
}

#[test]
fn test_simple_mpp() {
Add a case for the partial failure/total failure cases? + a case for when 1 path fails with a monitor update failure, but then succeeds later?
I added one in #587.
@@ -622,6 +622,8 @@ mod fuzzy_internal_msgs {
#[derive(Clone)]
pub(crate) struct FinalOnionHopData {
    pub(crate) payment_secret: PaymentSecret,
    /// The total value, in msat, of the payment as received by the ultimate recipient.
    /// Message serialization may panic if this value is more than 21 million Bitcoin.
nit: s/may panic/will panic and below as well
I generally prefer to avoid committing to that. While we currently panic, I don't want a user relying on that specific behavior.
Overall good, but there are still a few points of concern
lightning/src/ln/channelmanager.rs
Outdated
/// If a payment_secret *is* provided, we assume that the invoice had the payment_secret feature
/// bit set (either as required or as available). If multiple paths are present in the Route,
/// we assume the invoice had the basic_mpp feature set.
pub fn send_payment(&self, route: Route, payment_hash: PaymentHash, payment_secret: &Option<PaymentSecret>) -> Result<(), APIError> {
if route.hops.len() < 1 || route.hops.len() > 20 {
Could we enforce that sha256(payment_secret) != payment_hash? It should be cheap compared to the onion computation, and it would give buggy applications a bit more protection.
I don't think we need to protect users against every possible stupid thing they can do. An extra sha256 to protect a user who didn't bother to read the docs is...hopefully not worth it IMO.
Okay fair point
}
if route.paths[0].len() < 1 || route.paths[0].len() > 20 {
return Err(APIError::RouteError{err: "Path didn't go anywhere/had bogus size"});
if route.paths.len() > 10 {
By the way, what's the rationale behind 10? Maybe add a comment pointing to the limiting factor to let people know what needs to get done.
Hmm, there really isn't a reason, I just figured we should have some reasonable-ish limit. I'll add a comment noting that it's completely arbitrary, but maybe we should change it after we support retrying individual paths.
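A small sketch of the shape checks under discussion, with the limits pulled out into named constants so the arbitrary one is documented in place. Hop, check_route_shape, the constant names, and the first error string are all hypothetical; the empty-route check is an assumption added for the sketch, not something shown in the diff.

```rust
/// Cap on the number of paths in one payment. Completely arbitrary, per the
/// comment above; may change once individual paths can be retried.
const MAX_PATHS: usize = 10;
/// Maximum number of hops in one path, matching the `> 20` check in the diff.
const MAX_HOPS_PER_PATH: usize = 20;

/// Hypothetical stand-in for a route hop; only the count matters here.
struct Hop;

fn check_route_shape(paths: &[Vec<Hop>]) -> Result<(), &'static str> {
    // Assumption for this sketch: a route with no paths is also rejected.
    if paths.is_empty() || paths.len() > MAX_PATHS {
        return Err("Route had no paths or too many paths");
    }
    for path in paths {
        if path.is_empty() || path.len() > MAX_HOPS_PER_PATH {
            return Err("Path didn't go anywhere/had bogus size");
        }
    }
    Ok(())
}
```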
if !chan.get().is_live() {
    check_res_push!(Err(APIError::ChannelUnavailable{err: "Peer for first hop currently disconnected/pending monitor update!"}));
}
break_chan_entry!(self, chan.get_mut().send_htlc_and_commit(htlc_msat, payment_hash.clone(), htlc_cltv, HTLCSource::OutboundRoute {
Hmmmm, I think that's something you're aware of, but if we have HTLC shards going through the same link we're going to throw all of them beyond the first one into holding_cell_htlc_updates.
In the future, we should rework Channel to send multiple HTLCs with one commitment_signed and avoid a latency hit, assuming we do some kind of common expression reduction on our set of RouteHop.
Right, we should be using forward_htlcs for this - it's already set up to send HTLCs in batches on a timer, which is what we really want (especially since it means we'll send our own HTLCs via the same codepath/at the same time as forwarded ones, which is great for privacy). See-also #583.
let payment_secret_opt =
    if let &Some(ref data) = &payment_data { Some(data.payment_secret.clone()) } else { None };
let htlcs = channel_state.claimable_htlcs.entry((payment_hash, payment_secret_opt))
    .or_insert(Vec::new());
I think I already made the point here #441 (comment) (but it was blurred with the non-practical-wacky-secret-brute-force) that when we receive an HTLC shard for which payment_secret is junk, that's a DoS concern.
As far as I understand, a forwarding node can send us another same-hash-shard-HTLC and we're not going to pass it upstream for processing and potentially cancellation. A payee can already do this by sending micro-HTLCs to bottleneck the channel but at least that's someone who receives an invoice. We should be aware of this unsafety until we implement a timeout.
A complementary channel policy could also be to limit HTLC fragmentation per-MPP to avoid malicious payees.
True, but sadly in general these types of issues are no more of a DoS concern than them just forwarding an HTLC through us and holding it open. In this specific case, though, I think we will send any such HTLC upstream as a PaymentReceived event. Certainly not doing so would be a privacy issue (as any handling of "same-hash-different-secret" HTLCs differently from "unique-hash" is inherently a privacy concern). Because we index on (hash, secret), not just hash, I don't see how, though.
Right, not handling them uniquely would be a privacy leak, at least cancelling without any timeout. As you pointed out, it falls under the bigger issue of free pending HTLCs on channel links, and should be revisited later if we get upfront_payment or any other DoS prevention measure.
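To make the "(hash, secret), not just hash" point concrete, here is a minimal sketch of such an index. ClaimableHtlcs and add_htlc are hypothetical names for illustration, and HTLCs are reduced to bare msat values; the actual map in the diff stores richer per-HTLC data.

```rust
use std::collections::HashMap;

type PaymentHash = [u8; 32];
type PaymentSecret = [u8; 32];

/// Hypothetical, simplified version of the claimable-HTLC index: each
/// (hash, optional secret) pair gets its own independent set of HTLCs.
#[derive(Default)]
struct ClaimableHtlcs {
    by_payment: HashMap<(PaymentHash, Option<PaymentSecret>), Vec<u64>>,
}

impl ClaimableHtlcs {
    /// Records one HTLC and returns the msat received so far for that exact
    /// (hash, secret) pair.
    fn add_htlc(&mut self, hash: PaymentHash, secret: Option<PaymentSecret>, value_msat: u64) -> u64 {
        let htlcs = self.by_payment.entry((hash, secret)).or_insert_with(Vec::new);
        htlcs.push(value_msat);
        htlcs.iter().sum()
    }
}
```

With this keying, an HTLC that reuses a known hash but carries a junk secret lands in its own entry and never counts toward the legitimate payment's total.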
/// Until then, however, values of None should be ignored, and only incorrect Some values
/// should result in an HTLC fail_backwards.
/// Note that, in any case, this value must be passed as-is to any fail or claim calls as
/// the HTLC index includes this value.
"Ultimately, hashes shouldn't identify payments; they're only a poor man's atomic multi-hop locks. Because they are public along the payment path, payment_secret should be preferred for tracking payments inside application logic". Or we should discourage people from using payment_hash internally somewhere.
Hmm, eventually, yea. But as the docs indicate here, until sending nodes have upgraded, you kinda have to ignore any unexpected-Nones, which implies indexing by PaymentHash, still :/
if let msgs::ErrorAction::IgnoreError = e.1.err.action {
    // We got a temporary failure updating monitor, but will claim the
    // HTLC when the monitor updating is restored (or on chain).
    log_error!(self, "Temporary failure claiming HTLC, treating as success: {}", e.1.err.err);
We may not succeed in solving the monitor issue on this local link before the CLTV expires for this HTLC, but we still reveal preimages. This means that if there is some forwarding node intersection, all downstream payments may be claimed without the final node being fully paid. It may lead to miscommunication between payer and payee, because an internal monitor error may lead to a payment being neither a Success nor a Failure but some state in between...? (and I think that's new)
Right. Likely the first place where we may give the preimage back to the sender before either committing to it or seeing it confirmed on-chain. That sucks, but also not a ton we can do about it - we'd have to wait until every channel is in a usable state (and not awaiting remote RAA) before passing preimages back.
That's really not great. Thinking more about it, claiming an MPP means it should be atomic across all the ChannelMonitors concerned. I think we can implement these semantics by splitting get_update_fulfill_htlc at ChannelMonitorUpdateStep::PaymentPreimage, returning from here, and committing the preimage on all ChannelMonitors. If we don't have any error, we can keep moving forward, updating off-chain for all channels.
If any channel fails to move forward and gets a new local_commitment tx without the fulfilled HTLC, its ChannelMonitor should be able to claim inbound.
I think that's a bit of plumbing so I don't propose to do it now, but we should keep track of it with a TODO or an issue, and I volunteer to implement that, because this is not going to be solved by Base AMP and that's a pretty nasty case.
Thinking more about this, you still have a risk, because a failing Monitor may be due to a) a host crash or something like that, or b) a network partition if it's a remote one. If it's a network partition, and an MPP is cancelled because of a failure elsewhere, the isolated, non-updated Monitor will act on its own at timeout expiration and claim the HTLC on-chain.
But in the meantime, you may have cancelled all the other shards off-chain, so even if the preimage is revealed, that's less of a concern: state is messed up for only one shard. I think that's slightly better, because we don't lose funds?
Right, I'm not suggesting it's ideal, but mostly that there isn't a lot we can do without overhauling our channel state machine so that we do a two-phase update + commit. We can't really update a channel's monitor without also sending back the update_fulfill since the channelmonitor will force-close the channel if the HTLC is getting close to expiry, and we can't find out if we can update all the monitors without doing so iteratively.
I started writing up a long issue about this to track it, but I'm not sure it really makes sense - for our own payments we already have essentially the same thing, just without the non-atomicity. We give up the preimage to a peer but they could go offline immediately after having received the update_fulfill_htlc message before responding, making us go on chain and try to broadcast an HTLC-Success transaction to get our funds. The only new issue is that we may be doing so for part of our payment.
Things are different for relayed payments, but they remain so.
I agree we may already go on-chain, same as for a normal payment. Thinking more about this, I see this as an issue at the payer-payee interaction level, where you may either a) not get full payment due to one ChannelMonitor failure but some on-path node pulling all HTLCs from the payer due to uniqueness of hashlocks, or b) have one HTLC shard claimed on-chain due to an unresponsive, un-updatable ChannelMonitor on the payee's side while other shards are cancelled, in which case only a partial payment would have been made.
IMO, I prefer b) because at least that leaves funds in the hands of the payee, and you can still assume some communication with the payer for a refund. But clearly documenting that a ChannelMonitor failure is an operational risk for a payee would be fine for me too.
I have a really loose concern that this may be exploited by some tower that is also a payer, intentionally triggering a ChannelMonitorUpdateErr to avoid paying full value for goods...
Let's open an issue referring to these comments and move forward.
a) not get full-payment due to one ChannelMonitor failure but some on-path node pulling all HTLCs from payer due to uniqueness of hashlocks
I really don't understand this?
b) one HTLC shard being claimed on chain due to an irresponsive, un-updatable ChannelMonitor by payee and other shards being cancelled, in that case only a partial payment would have been made.
I don't think this can happen? Indeed, we may have some HTLCs contested on-chain, and we may lose some or all of those contests. But this is also the case for non-MPP payments today - we may give up the preimage and lose the on-chain contest due to fee/block space issues.
Another similar issue is us as a forwarding/routing node - we may be provided the preimage by the next-hop, but fail to update the monitor for the previous hop. This is an (IMO) well-documented case, and exactly the reason why the monitors must always have a local copy that must never fail to be updated.
db1b22e to ad47d96
28cf14d to 1358976
LGTM 🥇 yay!
Code Review ACK 1358976
I think a few comments could be better laid out, but let's move forward and fix this in another PR.
// an HTLC to time out). This should, of course, only occur if the user is the
// one doing the claiming (as it being a part of a peer claim would imply we're
// about to lose funds) and only if the lock in claim_funds was dropped as a
// previous HTLC was failed (thus not for an MPP payment).
I don't get this new comment. I see the failure where we may try to claim an HTLC off-chain while at the same time this or another HTLC is claimed on-chain, which would trigger a ChannelMonitorUpdateErr, but that's not what this Err path is handling?
"as it being a part of a peer claim would imply we're about to lose funds": do you mean the remote timing out an offered HTLC on-chain? But in that case that's not a loss, because we won't reveal the preimage?
This should avoid blowing up the size of the struct when we add additional data that is only relevant for receive.
This is the first step in Base AMP support, just tracking the relevant data in internal datastructures.
We should probably do this for all values (and define a newtype for msat values), but this will do for now.
Base AMP is centered around the concept of a 'payment_secret' - an opaque 32-byte random string which is used to authenticate the sender to the recipient as well as tie the various HTLCs which make up one payment together. This new field gets exposed in a number of places, though sadly only as an Option for backwards compatibility when sending to a receiver/receiving from a sender which does not support Base AMP. Sadly a huge diff here, but almost all of it is changing the method signatures for sending/receiving/failing HTLCs and the PaymentReceived event, which all now need to expose an Option<[u8; 32]> for the payment_secret. It doesn't yet properly fail back pending HTLCs when the full AMP payment is never received (which should result in accidental channel force-closures). Further, as sending AMP payments is not yet supported, the only test here is a simple single-path payment with a payment_secret in it.
Rather big diff, but it's all mechanical and doesn't introduce any new features.
1358976 to abf2891
Previously, if we claimed an MPP where a previous-hop channel was closed while we were waiting for the user to provide us the preimage, we'd simply skip claiming that HTLC without letting the user know. This refactors the claim logic to first check that all the channels are still available (which is actually all we need - we really mostly care about updating the channel monitors, not the channels themselves) and then claim the HTLCs in the same lock, ensuring atomicity.
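The check-then-claim pattern described in that commit message can be sketched roughly as follows. This is a toy model under stated assumptions: channels are keyed by a plain u64 id, ChannelState and claim_all_or_nothing are hypothetical names, and "claiming" is reduced to bumping a counter rather than updating a channel monitor.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, simplified per-channel state for this sketch.
struct ChannelState {
    claimed_msat: u64,
}

/// Claims every (channel id, value) pair, or none of them. Both phases run
/// under one lock, so no channel can disappear between the check and the claim.
fn claim_all_or_nothing(channels: &Mutex<HashMap<u64, ChannelState>>, htlcs: &[(u64, u64)]) -> bool {
    let mut chans = channels.lock().unwrap();
    // Phase 1: verify that every channel we need is still available.
    if !htlcs.iter().all(|(chan_id, _)| chans.contains_key(chan_id)) {
        return false;
    }
    // Phase 2: claim each HTLC while still holding the same lock.
    for (chan_id, value_msat) in htlcs {
        chans.get_mut(chan_id).expect("checked above").claimed_msat += *value_msat;
    }
    true
}
```

The point of the single lock is that a failed availability check leaves every channel untouched, so the caller can report the failure instead of silently skipping one part.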
This rather dramatically changes the return type of send_payment making it much clearer when resending is safe and allowing us to return a list of Results since different paths may have different return values.
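As an illustration of why a list of per-path Results is useful, a caller could fold it into an overall outcome along these lines. SendOutcome and summarize_path_results are hypothetical names invented for this sketch, and errors are plain Strings; this is not the PaymentSendFailure type from the PR.

```rust
/// Hypothetical summary of a multi-path send, for illustration only.
#[derive(Debug, PartialEq)]
enum SendOutcome {
    /// Every path was sent.
    AllSucceeded,
    /// No path was sent; carries one error per path.
    AllFailed(Vec<String>),
    /// Some paths were sent and some were not; carries the per-path results
    /// in the same order as the paths in the route.
    PartialFailure(Vec<Result<(), String>>),
}

fn summarize_path_results(results: Vec<Result<(), String>>) -> SendOutcome {
    if results.iter().all(|r| r.is_ok()) {
        SendOutcome::AllSucceeded
    } else if results.iter().all(|r| r.is_err()) {
        SendOutcome::AllFailed(results.into_iter().filter_map(|r| r.err()).collect())
    } else {
        SendOutcome::PartialFailure(results)
    }
}
```

Keeping the results in route order is what lets the caller correlate each failure with a specific path.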
Add documentation to the struct fields noting this to avoid missing docs when various msg structs become public.
ChannelManager::send_payment stopped utilizing its ownership of the Route with MPP (which, for readability, now clone()s the individual paths when creating HTLCSource::OutboundRoute objects). While this isn't ideal, it likely also makes sense to ensure that the user has access to the Route after sending to correlate individual path failures with the paths in the route or, in the future, retry individual paths. Thus, the easiest solution is to just take the Route by reference, allowing the user to retain ownership.
abf2891 to 59b1bf6
This implements Basic AMP from a receive/send side, though not yet from the router end. The only reason it's a draft is because we need to support timing out MPP payments that are missing pieces for too long.