-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track full-path htlc-minimum-msat while routing #815
Track full-path htlc-minimum-msat while routing #815
Conversation
Codecov Report
@@ Coverage Diff @@
## main #815 +/- ##
==========================================
+ Coverage 90.59% 92.37% +1.77%
==========================================
Files 51 51
Lines 27109 32658 +5549
==========================================
+ Hits 24560 30168 +5608
+ Misses 2549 2490 -59
Continue to review full report at Codecov.
|
ef88cb0
to
cc6a6b3
Compare
Pushed a few more fix commits (including some regressions from MPP) including tests for each bug. There may be one more, but these all stand. |
cc6a6b3
to
c024ea6
Compare
lightning/src/routing/router.rs
Outdated
@@ -778,17 +790,17 @@ pub fn get_route<L: Deref>(our_node_id: &PublicKey, network: &NetworkGraph, paye | |||
if have_hop_src_in_graph { | |||
// BOLT 11 doesn't allow inclusion of features for the last hop hints, which | |||
// really sucks, cause we're gonna need that eventually. | |||
let last_hop_htlc_minimum_msat: u64 = match hop.htlc_minimum_msat { | |||
let last_path_htlc_minimum_msat: u64 = match hop.htlc_minimum_msat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the variable renaming or the checking along each hop path that the amount transferred is always above worst-hop's-htlc_minimum_msat-along-the-path ? Where such constraint only apply to hops after such value (i.e closer to payer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the renaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm not sure either. I reverted the change.
Minor comments, ACK 68b62c0 |
fuzz/src/router.rs
Outdated
let mut scid = 42; | ||
|
||
let mut channel_limits = HashMap::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this buffers channel direction, channel_direction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently this isn't in the diff anymore? Not sure what it was a reference to.
// by an htlc_minimum_msat value, find another path with a higher value, potentially | ||
// allowing us to pay fees to meet the htlc_minimum while still keeping a lower total fee. | ||
if already_collected_value_msat == final_value_msat && payment_paths.len() == 1 { | ||
if !hit_minimum_limit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following tweak doesn't break router unit tests. It might be covered by new test exact_fee_liquidity_limit()
but our overpaid value reduction logic is making the outcomes similar ?
diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index 7f00ccdf..e555575b 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -928,9 +928,9 @@ pub fn get_route<L: Deref>(our_node_id: &PublicKey, network: &NetworkGraph, paye
// by an htlc_minimum_msat value, find another path with a higher value, potentially
// allowing us to pay fees to meet the htlc_minimum while still keeping a lower total fee.
if already_collected_value_msat == final_value_msat && payment_paths.len() == 1 {
- if !hit_minimum_limit {
- break 'paths_collection;
- }
+ //if !hit_minimum_limit {
+ // break 'paths_collection;
+ //}
path_value_msat = recommended_value_msat;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that line is a strict performance improvement - if we get there the path we found definitely should be the best one, we could do more work, but it won't find a better path.
lightning/src/routing/router.rs
Outdated
@@ -778,17 +790,17 @@ pub fn get_route<L: Deref>(our_node_id: &PublicKey, network: &NetworkGraph, paye | |||
if have_hop_src_in_graph { | |||
// BOLT 11 doesn't allow inclusion of features for the last hop hints, which | |||
// really sucks, cause we're gonna need that eventually. | |||
let last_hop_htlc_minimum_msat: u64 = match hop.htlc_minimum_msat { | |||
let last_path_htlc_minimum_msat: u64 = match hop.htlc_minimum_msat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the variable renaming or the checking along each hop path that the amount transferred is always above worst-hop's-htlc_minimum_msat-along-the-path ? Where such constraint only apply to hops after such value (i.e closer to payer).
Will respond to comments soon, but there are further issues I need to debug. |
Punting to 0.0.14, which we can hopefully accelerate. |
68b62c0
to
4cf9695
Compare
Addressed all the bugs I came across, plus some freebie performance tweaks. Still need to address the above comments and add more comments in the changes + write one new test. |
e41e0aa
to
b39f1b9
Compare
Ok, only a month later, I believe I've addressed all outstanding feedback on this PR, addressed all the issues I've found, and have at least some marginal tests to detect any regressions in the specific issues I've fixed. |
struct PaymentPath { | ||
hops: Vec<PathBuildingHop>, | ||
struct PaymentPath<'a> { | ||
hops: Vec<(PathBuildingHop<'a>, NodeFeatures)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just out of curiosity, what's the benefit of creating a vec of tuples as opposed to making NodeFeatures a property of PathBuildingHop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PathBuildingHop is created for each node as we walk the graph, but PaymentPath only at the end. By avoiding keeping Features (which are Vecs under the hood) in the graph map during the main graph walk, we avoid a lot of Vec clones.
// We don't want multiple paths (as per MPP) share liquidity of the same channels. | ||
// This map allows paths to be aware of the channel use by other paths in the same call. | ||
// This would help to make a better path finding decisions and not "overbook" channels. | ||
// It is unaware of the directions (except for `outbound_capacity_msat` in `first_hops`). | ||
let mut bookkeeped_channels_liquidity_available_msat = HashMap::new(); | ||
let mut bookkeeped_channels_liquidity_available_msat = HashMap::with_capacity(network.get_nodes().len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "bookkeeped?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, not my variable name, in transliterated english it makes sense, though :p.
// path. | ||
let victim_liquidity = bookkeeped_channels_liquidity_available_msat.get_mut( | ||
&payment_path.hops[(payment_path.hops.len() - 1) / 2].0.short_channel_id).unwrap(); | ||
*victim_liquidity = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol at the name
|
||
// We construct a network that looks like this: | ||
// | ||
// node2 -1(3)2- node3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the bracketed number the value locked up in HTLCs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its the short_channel_id. I think this is documented or at least more clear in the graphs in some of the earlier tests.
bb5fc09
to
9f4f358
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ongoing review, just few nits
lightning/src/routing/router.rs
Outdated
// | ||
// One potentially problematic case for this algorithm would be if there are many | ||
// liquidity-limited paths which are liquidity-limited near the destination (ie early in our | ||
// graph walking), we may never find a liquidity-unlimited path which has lower proportional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's a liquidity-unlimited path ? I think it's comparative there and not binary ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Canged to "a path which is not liquidity-limited".
9f4f358
to
0a0aa6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ongoing review so far up to dc01b0e
// plus the minimum per-HTLC fee to get from it to another node (aka "shitty pseudo-A*"). | ||
// | ||
// We are not a faithful Dijkstra's implementation because we can change values which impact | ||
// earlier nodes while processing later nodes. Specifically, if we reach a channel with a lower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this say "upper liquidity limit" ? IMO, a lower liquidity limit would be a htlc_minimum_msat
leading us to increase the value sent along the path, not decrease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I read it as "the liquidity limit is lower" not "the lowest liquidity value". I could rephrase to "lower liquidity upper-bound"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I like "lower liquidity upper-bound".
lightning/src/routing/router.rs
Outdated
// fee (and only lower absolute fee when considering the ultimate value sent). Because we only | ||
// consider paths with at least 5% of the total value being sent, the damage from such a case | ||
// should be limited, however this could be further reduced in the future by calculating fees | ||
// on the amount we wish to route over a path, not the amount we are routing over a path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you precise further what would be "the amount we wish to route over a path" ? IIUC "the amount we are routing over a path" is either recommended_value_msat
or path_value_msat
in function of hitting hit_minimum_limit
with next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the sentence to end with "ie ignoring the liquidity limits for the purposes of fee calculation."
lightning/src/routing/router.rs
Outdated
if !prevented_redundant_path_selection { | ||
// If we weren't capped by hitting a liquidity limit on a channel in the path, | ||
// we'll probably end up picking the same path again on the next iteration. | ||
// Randomly decrease the available liquidity of a hop in the middle of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think neither the selection nor the penalty are random here ? It's always middle hop of the path and penalty is always 100% ?
Further, what's the exact purpose of this mitigation against duplicated path selection ? Increasing the fault-tolerance of a MPP route by spreading the paths traversed ?
I'm worried about the 100% penalty inflicted for holder node with poor connection graph ? Don't we have a risk of graph partitioning ourselves due to this measure ? I'll be more at ease with a 20% or 50% penalty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is you may be sending 1/1000th of the value of the node in the middle. Without this commit, we just select the same route over and over again and never find any other options. Eventually we should do something somewhat smarter, like checking if a route is exactly identical to one we already selected, but just avoiding the same middle hop solves the issue as well for now.
I dropped the "random", though it was a reference to the somewhat random decision to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good to me for now. I don't have a better mitigation against repeated path selection to propose rn.
0a0aa6e
to
1bde367
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think aa9c2c2 is worthy of better explanations. Otherwise, it sounds good to me modulo squashing.
// found (which we are processing now) has a lower value | ||
// contribution due to an HTLC minimum limit. | ||
debug_assert!(path_htlc_minimum_msat < old_entry.path_htlc_minimum_msat); | ||
debug_assert!(value_contribution_msat < old_entry.value_contribution_msat); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I don't grasp this comment and the corresponding commit. AFAICT, value_contribution_msat
isn't function of path_hltc_minimum_msat
, this latter value is only used to compute Dijkstra score.
"While walking the graph doing Dijkstra's, we may decrease the amount being sent along one path, and not others, based on the htlc_minimum_msat value. This may result in a lower relative fees on that path in comparison to others. In the extreme, this may result in finding a second path to a node with a lower fee than the first path found (normally impossible in a Dijkstra's implementation, as we walk next hops in fee-order)."
Could you provide a topological-example of the situation described ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note (see diff below) and squashed.
// We don't want multiple paths (as per MPP) share liquidity of the same channels. | ||
// This map allows paths to be aware of the channel use by other paths in the same call. | ||
// This would help to make a better path finding decisions and not "overbook" channels. | ||
// It is unaware of the directions (except for `outbound_capacity_msat` in `first_hops`). | ||
let mut bookkeeped_channels_liquidity_available_msat = HashMap::new(); | ||
let mut bookkeeped_channels_liquidity_available_msat = HashMap::with_capacity(network.get_nodes().len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-allocating to the size of the whole graph assumes we'll completely traverse it before reaching recommended_value_msat
or path_value_msat
liquidity threshold ? Or what's your thinking there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I mean I guess we could divide by some factor, but why bother, its short-lived and at worst we will, so we might as well just allocate for it.
1bde367
to
670fccd
Compare
The new MPP routing algorithm attempts to build paths with a higher value than our payment to find paths which may allow us to pay a fee to meet an htlc_minimum limit. This is great if we're min-bounded, however it results in us rejecting paths where we are bounded by a maximum near the end of the path, with fees, and then bounded by a minimum earlier in the path. Further, it means we will not find the cheapest path where paths have a lower relative fee but a higher absolute fee. Instead, we calculate routes using the actual amount we wish to send. To maintain the previous behavior of searching for cheaper paths where we can "pay the difference" to meet an htlc_minimum, we detect if we were minimum-bounded during graph walking and, if we are, we walk the graph again with a higher value.
Previously, we'd happily send funds through a path where, while generating the path, in some middle hope we reduce the value being sent to meet an htlc_maximum, making a later hop invalid due to it no longer meeting its htlc_minimum. Instead, we need to track the path's htlc-minimum while we're transiting the graph.
If we walk the network graph and find a route that meets are payment amount and has a liquidiy limit far in excess of our payment amount, we may select the same path several times in the main path-gathering loop. Instead, if the path we selected was not limited by the available liquidity, we set the middle hop in the path's liquidity available to zero, disabling that channel in future path finding steps.
Currently, the "best source" for a given node tracked during Dijkstra's is updated with a different critera from the heap sorting, resulting in loops in calculated paths. This adds a test for the specific failure currently seen, utilizing the new path-htlc-minimum tracking in the heap entries in place of the per-hop htlc-minimum values which the MPP changeset partially used.
When walking the network graph to calculate a route, we always calculate the minimum fee which is required to make one further hop. This avoids some extra hop processing at the end of each path selection loop (saving around 10% runtime in our benchmarks). However, if we use the real value which we expect to send over a channel in that calculation, we may find an alternate path to the same node which is more expensive but capacity-constrained, resulting in us considering it cheaper as the relative fee paid will be lower. Instead, we can use the `minimal_value_contribution_msat`, which is a constant through an entire path finding iteration, as the amount, preventing any basis change in the relative fee paid.
This is the same code as was recently failing in our benchmarks, adapted to use a random starting seed instead of a fixed one and a smaller iteration to reduce runtime.
While walking the graph doing Dijkstra's, we may decrease the amount being sent along one path, and not others, based on the htlc_minimum_msat value. This may result in a lower relative fees on that path in comparison to others. In the extreme, this may result in finding a second path to a node with a lower fee than the first path found (normally impossible in a Dijkstra's implementation, as we walk next hops in fee-order). In such a case, we end up with parts of our state invalid as further hops beyond that node have been filled in using the original total fee information. Instead, we simply track which nodes have been processed and ignore and further updates to them (as it implies we've reduced the amount we're sending to find a lower absolute fee, but a higher relative fee, which isn't a better route). We check that we are in exactly this case in test builds with new in-line assertions. Note that these assertions successfully detect the bug in the previous commit. Sadly this requires an extra HashMap lookup every time we pop an element off of our heap, though we can skip a number of heap pushes during the channel walking.
We currently copy the features objects in each channel as we walk the graph during route calculation. This implies a significant amount of malloc traffic as the features flags object are stored on the heap. Instead, because they features being referenced are in the network graph which we hold a reference to, we can simply store references to them. This nontrivially improves our get_route benchmark by around 5%.
See comment in the removed block, note that the subsequent subtraction will underflow if the block would otherwise have been reached.
670fccd
to
5f5bc41
Compare
Expanded comment and included some extra fuzzing tests. Diff is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK 5f5bc41 modulo nit
// Channel A is processed first, and the channels onwards from | ||
// node 1 are added to the to-process heap. Thereafter, we pop | ||
// Channel B off of the heap, note that it has a much more | ||
// restrictive htlc_maximum_msat, and recalculate the fees for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Straightforward comment thanks! Should this say "htlc_minimum" and under ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I dont think so, the issue appears when the value_contribution_msat
is lower (which is often just the path htlc_maximum_msat
). The htlc_minimum_msat
check here is about the criteria for when it should be possible to hit this case.
Gonna merge cause I'm tired of CI failing. I promise to open a new PR to fix the comment if I'm wrong :). |
To review new changes to #646, I wrote an extension of the router fuzzer to test each route against the graph available. This found one non-regression which is fixed here, as well as the fuzzer extension and a fix to make the router fuzz target deterministic.
It needs a) a test, b) a long discussion in
get_route
about how our algorithm ultimately differs from Dijkstra's/A*, why that's ok, and some edge cases where other algorithms may be more optimal.