
Multi-Path Payments #441

Merged — 12 commits merged into master from 2020-01-mpp, Apr 15, 2020
Conversation

@TheBlueMatt (Collaborator)

This implements Basic AMP from the receive/send side, though not yet from the router end. The only reason it's a draft is that we need to support timing out MPP payments that are missing pieces for too long.

@valentinewallace (Contributor)

Rebase plz?

@TheBlueMatt (Collaborator, Author)

Rebased without changes.

@codecov (bot) commented Mar 11, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@e8a8fd0).
The diff coverage is n/a.

@valentinewallace (Contributor) left a comment

commit review TODOs:

  • Time out AwaitingRemoteRAA outgoing HTLCs when we reach cltv_expiry
  • Split only-receive/forward data out of PendingHTLCInfo into an enum
  • Support (de)serializing payment_data in onion TLVs and track them
  • Refuse to deserialize OnionHopDatas with values > 21 million
  • Impl Base AMP in the receive pipeline and expose payment_secret
  • Expand the Route object to include multiple paths.
  • Implement multipath sends using payment_secret.
  • Refactor test utils and add a simple MPP send/claim test.
  • Time out incoming HTLCs when we reach cltv_expiry (+ test)
  • Test basic AMP payments in chanmon_consistency
  • Add/announce features for payment_secret and basic_mpp


@valentinewallace (Contributor) left a comment

e99fc10 also looks good to me, just nitting some names :)

@valentinewallace (Contributor) left a comment

Initial pass 8e8c6e7

@TheBlueMatt (Collaborator, Author)

Rebased with all pending comments addressed, now that #551 went in.

@valentinewallace (Contributor) left a comment

ee0a406 LGTM

@valentinewallace (Contributor) left a comment

Initial pass e23ff0a

Comment on lines +1615 to +1732

    for htlc in htlcs.iter() {
        total_value += htlc.value;
        if htlc.payment_data.as_ref().unwrap().total_msat != data.total_msat {
            total_value = msgs::MAX_VALUE_MSAT;
        }
        if total_value >= msgs::MAX_VALUE_MSAT { break; }
valentinewallace (Contributor):

Some comments here (and generally throughout this chunk)?
Are all the htlcs in htlcs always part of the same multi-path payment here, so we're checking that they all agree on what the total paid amount should be?

TheBlueMatt (Collaborator, Author):

Exactly. If we have a payment_data (implying we also have a payment_secret), then we disambiguate payments by the payment_secret instead of by individual HTLCs.
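For intuition, here is a minimal sketch of the aggregation being discussed, using stand-in types rather than the actual ChannelManager internals (which, as in the snippet above, also use msgs::MAX_VALUE_MSAT as an overflow/mismatch sentinel):

    // Stand-in for the data tracked per HTLC under one (payment_hash, payment_secret) key.
    struct HtlcShard { value_msat: u64, total_msat: u64 }

    // True once every shard agrees on the sender's claimed total and the
    // accumulated value reaches it - only then is PaymentReceived surfaced.
    fn mpp_complete(shards: &[HtlcShard], claimed_total_msat: u64) -> bool {
        let mut sum: u64 = 0;
        for shard in shards {
            // All shards of one MPP must carry the same total_msat.
            if shard.total_msat != claimed_total_msat { return false; }
            sum += shard.value_msat;
        }
        sum >= claimed_total_msat
    }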

            amt: total_value,
        });
    }
} else {
valentinewallace (Contributor):

Is this the case where it's not a multipath payment, just a regular receive?

TheBlueMatt (Collaborator, Author):

Yep!

    None => self.fail_htlc_backwards_internal(self.channel_state.lock().unwrap(), htlc_source, &payment_hash, HTLCFailReason::Reason { failure_code, data: Vec::new() }),
    Some(chan_update) => self.fail_htlc_backwards_internal(self.channel_state.lock().unwrap(), htlc_source, &payment_hash, HTLCFailReason::Reason { failure_code, data: chan_update.encode_with_len() }),
};
for (htlc_source, payment_hash, failure_reason) in failed_forwards.drain(..) {
valentinewallace (Contributor):

It's nice how it's cleaner with just the HTLCFailReason enum 👍

@valentinewallace (Contributor) left a comment

fd46193 LGTM

Comment on lines 1842 to 1971

    assert!(!sources.is_empty());
    let passes_value = if let &Some(ref data) = &sources[0].payment_data {
        assert!(payment_secret.is_some());
        if data.total_msat == expected_amount { true } else { false }
    } else {
        assert!(payment_secret.is_none());
        false
    };
valentinewallace (Contributor):

just to check understanding -- we fail HTLCs without a payment_secret because it's a required feature?

TheBlueMatt (Collaborator, Author):

Oops, that's a bit confusing - if this bool is set to false in this block, it triggers the classic check later. I moved the old check into this block to make it clearer.

TheBlueMatt (Collaborator, Author):

Heh, well it (obviously) doesn't build if I pull the later check up into this block lol.

@valentinewallace (Contributor) left a comment

33ac6c8 lgtm

@valentinewallace (Contributor) left a comment

Initial pass e0f3a03

Looks pretty good! Correct me if I'm wrong, but I didn't realize base AMP was just sending a bunch of individual HTLCs that add up to the desired total, plus or minus a few details to ensure everyone agrees on the full amount, and a payment secret...
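That's the gist. As a rough send-side sketch (illustrative types only, not the actual Route/onion structures): each path carries a shard of the value, and every final hop repeats the same payment_secret and total_msat so the recipient can tie the shards together:

    // Illustrative final-hop payload, one per path.
    struct FinalHopPayload { amount_msat: u64, payment_secret: [u8; 32], total_msat: u64 }

    fn split_across_paths(per_path_msat: &[u64], payment_secret: [u8; 32]) -> Vec<FinalHopPayload> {
        // Every payload claims the same total so the recipient knows when it has all shards.
        let total_msat: u64 = per_path_msat.iter().sum();
        per_path_msat.iter().map(|&amount_msat| FinalHopPayload {
            amount_msat, payment_secret, total_msat,
        }).collect()
    }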

@valentinewallace (Contributor) left a comment

Initial pass e1000b2

pub fn claim_payment_along_route_with_secret<'a, 'b, 'c>(origin_node: &Node<'a, 'b, 'c>, expected_route: &[&Node<'a, 'b, 'c>], skip_last: bool, our_payment_preimage: PaymentPreimage, our_payment_secret: Option<[u8; 32]>, expected_amount: u64) {
assert!(expected_route.last().unwrap().node.claim_funds(our_payment_preimage, &our_payment_secret, expected_amount));
check_added_monitors!(expected_route.last().unwrap(), 1);
pub fn claim_payment_along_route_with_secret<'a, 'b, 'c>(origin_node: &Node<'a, 'b, 'c>, expected_paths: &[&[&Node<'a, 'b, 'c>]], skip_last: bool, our_payment_preimage: PaymentPreimage, our_payment_secret: Option<[u8; 32]>, expected_amount: u64) {
valentinewallace (Contributor):

Is skip_last for cases like the test for #549, where you want to have custom fail-back behavior/half-completed payments? Could use some docs.

TheBlueMatt (Collaborator, Author):

Right, for tests that need to do something special to the last node. Generally, a lot of our tests could use more docs, but mostly it's understandable looking at the callsite (even if the sub-functions themselves are nonsense).

assert_eq!(nodes[3].node.claim_funds(payment_preimage, &None, 200_000), false);
assert_eq!(nodes[3].node.claim_funds(payment_preimage, &Some([42; 32]), 200_000), false);
// ...but with the right secret we should be able to claim all the way back
claim_payment_along_route_with_secret(&nodes[0], &[&[&nodes[1], &nodes[3]], &[&nodes[2], &nodes[3]]], false, payment_preimage, Some(payment_secret.0), 200_000);
valentinewallace (Contributor):

It'd be good to add tests for partial failure then follow-up retries, etc.

TheBlueMatt (Collaborator, Author):

Hmm, we don't have any way to do follow-up retries. We just have to wait for the final node to time out the payment, get a PaymentFailed, then retry the whole thing. I'm not sure exactly what you're looking for here?

valentinewallace (Contributor):

I meant test a send_payment attempt that results in a PaymentSendFailure::PartialFailure, and then retry the paths that are returned as failed-but-safe-to-retry.

TheBlueMatt (Collaborator, Author):

Right. There's currently no way to retry said paths (sadly), unless they were monitor failures. I'll open an issue for that.
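For reference, a sketch of what a caller can do today with the per-path results, using stand-in types modeled on the variants this PR introduces (names and shapes may differ in later releases):

    // Stand-ins for the real APIError / PaymentSendFailure types.
    #[derive(Debug)]
    enum ApiErr { ChannelUnavailable }
    enum SendFailure {
        ParameterError(ApiErr),
        AllFailedRetrySafe(Vec<ApiErr>),          // nothing in flight; retrying is safe
        PartialFailure(Vec<Result<(), ApiErr>>),  // per-path results; no per-path retry API yet
    }

    fn handle(send_result: Result<(), SendFailure>) {
        match send_result {
            Ok(()) => { /* all paths in flight */ },
            Err(SendFailure::PartialFailure(per_path)) => {
                // Some shards are in flight; for failed paths the caller currently
                // waits for the payment to resolve (PaymentFailed) and retries the whole thing.
                for (idx, res) in per_path.iter().enumerate() {
                    println!("path {}: sent = {}", idx, res.is_ok());
                }
            },
            Err(SendFailure::AllFailedRetrySafe(errs)) => println!("retry safe: {:?}", errs),
            Err(SendFailure::ParameterError(e)) => println!("bad call: {:?}", e),
        }
    }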

@TheBlueMatt force-pushed the 2020-01-mpp branch 2 times, most recently from d7c8df3 to dccc84e (March 26, 2020 21:45)
@TheBlueMatt (Collaborator, Author)

Addressed all the pending comments, and added a number of fixup commits in between (no other changes to the original commits).

@TheBlueMatt force-pushed the 2020-01-mpp branch 2 times, most recently from 609eb31 to f06145c (March 27, 2020 00:26)
@TheBlueMatt (Collaborator, Author)

Squashed the fixup commits with no changes to make things reviewable again. Can continue discussion on #441 (comment) but other than coming to a conclusion there I think all comments have been addressed.

@@ -1007,13 +1018,19 @@ impl<ChanSigner: ChannelKeys, M: Deref, T: Deref, K: Deref, F: Deref> ChannelMan
        return_err!("Upstream node set CLTV to the wrong value", 18, &byte_utils::be32_to_array(msg.cltv_expiry));
    }

    let payment_data = match next_hop_data.format {
        msgs::OnionHopDataFormat::Legacy { .. } => None,
        msgs::OnionHopDataFormat::NonFinalNode { .. } => return_err!("Got non final data with an HMAC of 0", 0x4000 | 22, &[0;0]),
valentinewallace (Contributor):

Hm. Would this error be more appropriate?

type: BADONION|PERM|5 (invalid_onion_hmac)

TheBlueMatt (Collaborator, Author):

Hmm, no - we already checked the HMAC for us; this HMAC is the one we're supposed to send to the next party in the route (hence 0 for "no more sending").

ariard:

I'm not sure - the onion decrypts as expected for a final node, but the content is wrong (even if it's with regard to the HMAC context), so invalid_onion_payload seems to suit. I agree that this error definition is quite loose.
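For readers following along, the BOLT 4 convention at issue, sketched: the HMAC in the peeled onion authenticates the packet for the next hop, and an all-zero HMAC marks that there is no next hop:

    // An all-zero next-hop HMAC means "no more hops" - we are the final node -
    // so receiving NonFinalNode-format data alongside it is a protocol violation.
    fn is_final_hop(next_hop_hmac: &[u8; 32]) -> bool {
        next_hop_hmac.iter().all(|&b| b == 0)
    }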

@@ -401,7 +401,7 @@ pub fn do_test(data: &[u8], logger: &Arc<dyn Logger>) {
    sha.input(&payment_hash.0[..]);
    payment_hash.0 = Sha256::from_engine(sha).into_inner();
    payments_sent += 1;
-   match channelmanager.send_payment(route, payment_hash) {
+   match channelmanager.send_payment(route, payment_hash, &None) {
valentinewallace (Contributor):

Would it make sense to, say, 50% of the time, have a non-None payment_secret?

TheBlueMatt (Collaborator, Author):

Yea, likely. I'm not too worried about payment_secret coverage - there aren't that many ways it can go wrong, and we hit a number of them. As we write new tests, including payment_secrets in them is likely a good idea, but going through each existing test and setting payment_secrets requires some writing/reviewing effort to make sure it all lines up right.

htlc_id: htlc.prev_hop.htlc_id,
incoming_packet_shared_secret: htlc.prev_hop.incoming_packet_shared_secret,
}), payment_hash,
HTLCFailReason::Reason { failure_code: 0x4000 | 15, data: byte_utils::be64_to_array(htlc.value).to_vec() }
valentinewallace (Contributor):

Is this error missing the height parameter?

    type: PERM|15 (incorrect_or_unknown_payment_details)
    data:
        [u64:htlc_msat]
        [u32:height]

TheBlueMatt (Collaborator, Author):

The spec changed, so yes - but we can do that in a follow-up PR.
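For reference, the updated spec's failure data for PERM|15 (incorrect_or_unknown_payment_details) appends the block height; a sketch of what the follow-up encoding would look like (hypothetical helper, not code from this PR):

    // [u64: htlc_msat] followed by [u32: height], both big-endian.
    fn encode_failure_data(htlc_msat: u64, best_block_height: u32) -> Vec<u8> {
        let mut data = htlc_msat.to_be_bytes().to_vec();
        data.extend_from_slice(&best_block_height.to_be_bytes());
        data
    }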

// HTLC when the monitor updating is restored (or on chain).
log_error!(self, "Temporary failure claiming HTLC, treating as success: {}", e.1.err.err);
claimed_any_htlcs = true;
} else { errs.push(e); }
valentinewallace (Contributor):

For my understanding -- if we hit this else case and we've already claimed some funds, we will have received a partial payment?

TheBlueMatt (Collaborator, Author):

Not quite, but kinda - we've updated our local states with the payment preimage, so if we need to hit the chain we'll enforce the claim on-chain (and if we run out of time to claim it we'll auto-force-close the channel). Not every copy of our channel monitor may have been updated, but there should always be a local copy which was.

}

#[test]
fn test_simple_mpp() {
valentinewallace (Contributor):

Add a case for the partial failure/total failure cases? + a case for when 1 path fails with a monitor update failure, but then succeeds later?

TheBlueMatt (Collaborator, Author):

I added one in #587.

@@ -622,6 +622,8 @@ mod fuzzy_internal_msgs {
    #[derive(Clone)]
    pub(crate) struct FinalOnionHopData {
        pub(crate) payment_secret: PaymentSecret,
        /// The total value, in msat, of the payment as received by the ultimate recipient.
        /// Message serialization may panic if this value is more than 21 million Bitcoin.
valentinewallace (Contributor):

nit: s/may panic/will panic and below as well

TheBlueMatt (Collaborator, Author):

I generally prefer to avoid committing to that. While we currently panic, I don't want a user relying on that specific behavior.


@ariard left a comment

Overall good, but there are still a few points of concern.

/// If a payment_secret *is* provided, we assume that the invoice had the payment_secret feature
/// bit set (either as required or as available). If multiple paths are present in the Route,
/// we assume the invoice had the basic_mpp feature set.
pub fn send_payment(&self, route: Route, payment_hash: PaymentHash, payment_secret: &Option<PaymentSecret>) -> Result<(), APIError> {
    if route.hops.len() < 1 || route.hops.len() > 20 {
ariard:

93be6cd

May we enforce that sha256(payment_secret) != payment_hash? It should be cheap compared to the onion computation, and it would protect buggy applications a bit more.

TheBlueMatt (Collaborator, Author):

I don't think we need to protect users against every possible stupid thing they can do. An extra sha256 to protect a user who didn't bother to read the docs is... hopefully not worth it, IMO.

ariard:

Okay fair point
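For the record, the rejected guard would have been roughly this (a sketch using the sha2 crate and stand-in newtypes, not the actual rust-lightning types):

    use sha2::{Digest, Sha256};

    struct PaymentHash([u8; 32]);
    struct PaymentSecret([u8; 32]);

    // A buggy caller passing the payment preimage as the payment_secret would
    // leak the preimage inside the onion; this check would have caught that.
    fn secret_is_preimage(hash: &PaymentHash, secret: &PaymentSecret) -> bool {
        Sha256::digest(&secret.0)[..] == hash.0[..]
    }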

}
if route.paths[0].len() < 1 || route.paths[0].len() > 20 {
    return Err(APIError::RouteError{err: "Path didn't go anywhere/had bogus size"});
if route.paths.len() > 10 {
ariard:

f919d6a

By the way, what's the rationale behind 10? Maybe add a comment pointing to the limiting factor to let people know what needs to get done.

TheBlueMatt (Collaborator, Author):

Hmm, there really isn't a reason - I just figured we should have some reasonable-ish limit. I'll add a comment noting that it's completely arbitrary, but maybe we should change it after we support retrying individual paths.

if !chan.get().is_live() {
    check_res_push!(Err(APIError::ChannelUnavailable{err: "Peer for first hop currently disconnected/pending monitor update!"}));
}
break_chan_entry!(self, chan.get_mut().send_htlc_and_commit(htlc_msat, payment_hash.clone(), htlc_cltv, HTLCSource::OutboundRoute {
ariard:

f919d6a

Hmmmm, I think that's something you're aware of, but if we have HTLC shards going through the same link we're going to throw them in holding_cell_htlc_updates beyond the first one.

In the future, we should rework Channel to send multiple HTLCs with one commitment_signed and avoid a latency hit, assuming we do some kind of common-expression reduction on our set of RouteHops.

TheBlueMatt (Collaborator, Author):

Right, we should be using forward_htlcs for this - it's already set up to send HTLCs in batches on a timer, which is what we really want (especially since it means we'll send our own HTLCs via the same codepath/at the same time as forwarded ones, which is great for privacy). See also #583.
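A sketch of the grouping either approach implies, with stand-in types (a path is just a list of short channel IDs here): shards sharing a first hop get batched so each link needs only one commitment_signed exchange:

    use std::collections::HashMap;

    // Map first-hop short_channel_id -> indices of the paths that use it, so all
    // shards crossing the same link can go out in one batch/commitment update.
    fn group_by_first_hop(paths: &[Vec<u64>]) -> HashMap<u64, Vec<usize>> {
        let mut groups: HashMap<u64, Vec<usize>> = HashMap::new();
        for (idx, path) in paths.iter().enumerate() {
            if let Some(&first_scid) = path.first() {
                groups.entry(first_scid).or_insert_with(Vec::new).push(idx);
            }
        }
        groups
    }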

let payment_secret_opt =
    if let &Some(ref data) = &payment_data { Some(data.payment_secret.clone()) } else { None };
let htlcs = channel_state.claimable_htlcs.entry((payment_hash, payment_secret_opt))
    .or_insert(Vec::new());
ariard:

93be6cd

I think I already made the point in #441 (comment) (though it was blurred with the non-practical wacky-secret brute-force) that when we receive an HTLC shard for which the payment_secret is junk, that's a DoS concern.

As far as I understand, a forwarding node can send us another same-hash shard HTLC and we're not going to pass it upstream for processing and potential cancellation. A payee can already do this by sending micro-HTLCs to bottleneck the channel, but at least that's someone who received an invoice. We should be aware of this unsafety until we implement a timeout.

A complementary channel policy could also be to limit HTLC fragmentation per-MPP to avoid malicious payees.

TheBlueMatt (Collaborator, Author):

True, but sadly in general these types of issues are no more of a DoS concern than them just forwarding an HTLC through us and holding it open. In this specific case, though, I think we will send any such HTLC upstream as a PaymentReceived event. Certainly not doing so would be a privacy issue (as any handling of "same-hash-different-secret" HTLCs differently from "unique-hash" ones is inherently a privacy concern). Because we index on (hash, secret), not just hash, I don't see how, though.

ariard:

Right, not handling them uniquely would be a privacy leak, at least cancelling without any timeout. As you pointed out, it falls under the bigger issue of free pending HTLCs on channel links; it should be revisited later if we get upfront_payment or any other DoS-prevention measure.

/// Until then, however, values of None should be ignored, and only incorrect Some values
/// should result in an HTLC fail_backwards.
/// Note that, in any case, this value must be passed as-is to any fail or claim calls as
/// the HTLC index includes this value.
ariard:

93be6cd

"Ultimately, hashes shouldn't identify payments - they're only a poor man's atomic multi-hop lock. Due to their publicity across the payment path, payment_secret should be preferred for tracking a payment inside application logic." Or we should discourage people from using payment_hash internally somewhere.

TheBlueMatt (Collaborator, Author):

Hmm, eventually, yea. But as the docs indicate here, until sending nodes have upgraded, you kinda have to ignore any unexpected-Nones, which implies indexing by PaymentHash, still :/

if let msgs::ErrorAction::IgnoreError = e.1.err.action {
    // We got a temporary failure updating monitor, but will claim the
    // HTLC when the monitor updating is restored (or on chain).
    log_error!(self, "Temporary failure claiming HTLC, treating as success: {}", e.1.err.err);
ariard:

9cd702a

We may not succeed in solving a monitor issue on this local link before the CLTV expires for this HTLC, but we still reveal preimages. It means that if there is some forwarding-node intersection, all downstream payments may be claimed without the final node being fully paid. It may lead to miscommunication between payer and payee, because an internal monitor error may leave a payment neither a Success nor a Failure but in some state in-between...? (and I think that's new)

TheBlueMatt (Collaborator, Author):

Right. Likely the first place where we may give the preimage back to the sender before either committing to it or seeing it confirmed on-chain. That sucks, but also not a ton we can do about it - we'd have to wait until every channel is in a usable state (and not awaiting remote RAA) before passing preimages back.

ariard:

That's really not great. Thinking more about it, claiming an MPP means it should be atomic across all the ChannelMonitors concerned. I think we can implement this semantic by splitting get_update_fulfill_htlc at ChannelMonitorUpdateStep::PaymentPreimage, returning from there, and committing the preimage on all ChannelMonitors. If we don't hit any error, we can keep moving forward updating offchain state for all channels.

If any channel fails to move forward and gets a new local_commitment tx without the fulfilled HTLC, its ChannelMonitor should be able to claim inbound.

I think that's a bit of plumbing, so I don't propose to do it now, but we should keep track of it with a TODO or an issue, and I volunteer to implement it. This is not going to be solved by Base AMP, and it's a pretty nasty case.

ariard:

Thinking more about this, you still have a risk, because a failing Monitor may be due to a) a host crash or something like that, or b) a network partition if it's a remote one. If it's a network partition, and in the case of cancelling an MPP for a failure elsewhere, the isolated, non-updated Monitor will act on its own at timeout expiration and claim the HTLC onchain.

But in the meantime you may have cancelled all the other shards offchain, so even if the preimage is revealed that's less of a concern - state is fucked up only for one shard. I think that's slightly better, because we don't lose funds?

TheBlueMatt (Collaborator, Author):

Right, I'm not suggesting it's ideal, but mostly that there isn't a lot we can do without overhauling our channel state machine so that we do a two-phase update + commit. We can't really update a channel's monitor without also sending back the update_fulfill since the channelmonitor will force-close the channel if the HTLC is getting close to expiry, and we can't find out if we can update all the monitors without doing so iteratively.

TheBlueMatt (Collaborator, Author):

I started writing up a long issue about this to track it, but I'm not sure it really makes sense - for our own payments we already have essentially the same thing, just without the non-atomicity. We give up the preimage to a peer but they could go offline immediately after having received the update_fulfill_htlc message before responding, making us go on chain and try to broadcast an HTLC-Success transaction to get our funds. The only new issue is that we may be doing so for part of our payment.

Things are different for relayed payments, but they remain so.

ariard:

I agree we may already go onchain, same as for a normal payment. Thinking more about this, I see this as an issue at the payer-payee interaction level, where you may either a) not get the full payment due to one ChannelMonitor failure, with some on-path node pulling all the HTLCs from the payer due to the uniqueness of the hashlocks, or b) have one HTLC shard claimed on chain due to an unresponsive, un-updatable ChannelMonitor by the payee and the other shards cancelled, in which case only a partial payment would have been made.

IMO, I prefer b) because at least that leaves funds in the hands of the payee, and you can still assume some communication with the payer for a refund. But clearly documenting that a ChannelMonitor failure is an operational risk for a payee would be fine for me too.

I have a really loose concern that this may be exploited by some tower, being also a payer, intentionally triggering a ChannelMonitorUpdateErr to avoid paying for full-value goods...

Let's open an issue referring to these comments and move forward.

TheBlueMatt (Collaborator, Author):

a) not get full-payment due to one ChannelMonitor failure but some on-path node pulling all HTLCs from payer due to uniqueness of hashlocks

I really don't understand this?

b) one HTLC shard being claimed on chain due to an irresponsive, un-updatable ChannelMonitor by payee and other shards being cancelled, in that case only a partial payment would have been made.

I don't think this can happen? Indeed, we may have some HTLCs contested on-chain, and we may lose some or all of those contests. But this is also the case for non-MPP payments today - we may give up the preimage and lose the on-chain contest due to fee/block space issues.

Another similar issue is us as a forwarding/routing node - we may be provided the preimage by the next-hop, but fail to update the monitor for the previous hop. This is an (IMO) well-documented case, and exactly the reason why the monitors must always have a local copy that must never fail to be updated.

@valentinewallace (Contributor) left a comment

LGTM 🥇 yay!

@TheBlueMatt added this to the 0.0.11 milestone, Apr 14, 2020
@ariard left a comment

Code Review ACK 1358976

I think a few comments could be better laid out, but let's move forward and fix this in another PR.

// an HTLC to time out). This should, of course, only occur if the user is the
// one doing the claiming (as it being a part of a peer claim would imply we're
// about to lose funds) and only if the lock in claim_funds was dropped as a
// previous HTLC was failed (thus not for an MPP payment).
ariard:

I don't get this new comment. I see the failure where we may try to claim an HTLC off-chain while, at the same time, this or another HTLC is claimed on chain and would trigger a ChannelMonitorUpdateErr, but that's not what this Err path is handling?

"as it being a part of a peer claim would imply we're about to lose funds" - you mean the remote timing out an offered HTLC onchain? But in that case that's not a loss, because we won't reveal the preimage?

TheBlueMatt (Collaborator, Author):

No, I mean locally timing out an HTLC à la #587. Let's continue the conversation there because it requires #587 to make sense.

This should avoid blowing up the size of the struct when we add
additional data that is only relevant for receive.

This is the first step in Base AMP support, just tracking the
relevant data in internal datastructures.

We should probably do this for all values (and define a newtype
for msat values), but this will do for now.

Base AMP is centered around the concept of a `payment_secret` - an
opaque 32-byte random string which is used to authenticate the
sender to the recipient as well as tie the various HTLCs which
make up one payment together. This new field gets exposed in a
number of places, though sadly only as an Option for backwards
compatibility when sending to a receiver/receiving from a sender
which does not support Base AMP.

Sadly a huge diff here, but almost all of it is changing the method
signatures for sending/receiving/failing HTLCs and the
PaymentReceived event, which all now need to expose an
Option<[u8; 32]> for the payment_secret.

It doesn't yet properly fail back pending HTLCs when the full AMP
payment is never received (which should result in accidental
channel force-closures). Further, as sending AMP payments is not
yet supported, the only test here is a simple single-path payment
with a payment_secret in it.

Rather big diff, but it's all mechanical and doesn't introduce any
new features.

Previously, if we claimed an MPP where a previous-hop channel was
closed while we were waiting for the user to provide us the preimage,
we'd simply skip claiming that HTLC without letting the user know.

This refactors the claim logic to first check that all the channels
are still available (which is actually all we need - we really
mostly care about updating the channel monitors, not the channels
themselves) and then claim the HTLCs in the same lock, ensuring
atomicity.

This rather dramatically changes the return type of send_payment,
making it much clearer when resending is safe and allowing us to
return a list of Results, since different paths may have different
return values.

Add documentation to the struct fields noting this to avoid missing
docs when various msg structs become public.

ChannelManager::send_payment stopped utilizing its ownership of the
Route with MPP (which, for readability, now clone()s the individual
paths when creating HTLCSource::OutboundRoute objects). While this
isn't ideal, it likely also makes sense to ensure that the user has
access to the Route after sending, to correlate individual path
failures with the paths in the route or, in the future, retry
individual paths.

Thus, the easiest solution is to just take the Route by reference,
allowing the user to retain ownership.