Plumb Features through into Routes #433
Conversation
Force-pushed from 1d4f860 to fac80d6.
Force-pushed from f47d4aa to 196fd59.
Overall it's good once #428 is merged, though I have a few questions.
lightning/src/ln/features.rs
Outdated
pub trait FeatureContextInitNode : FeatureContext {}
impl FeatureContextInitNode for FeatureContextInit {}
impl FeatureContextInitNode for FeatureContextNode {}
mod sealed { // You should just use the type aliases instead.
I like it better; you may want to keep at least the comments ("This Context represents when the Feature appear...") to make clear which struct ties to which message.
Note that this got moved into #428, so probably best to keep the conversation there.
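For context before the thread moves on, here is a minimal sketch of the sealed-context pattern referenced in the snippet above, assuming the shape it moved to in #428; the item names beyond those shown in the diff are illustrative, not the actual code:

```rust
// Contexts can only be implemented inside this module, so downstream code
// must use the public type aliases rather than naming contexts directly.
mod sealed {
    /// Only types in this module may implement Context.
    pub trait Context {}

    /// Hypothetical context for features carried in an `init` message.
    pub struct InitContext {}
    impl Context for InitContext {}

    /// Hypothetical context for features carried in a `node_announcement`.
    pub struct NodeContext {}
    impl Context for NodeContext {}
}

/// Feature flags parameterized by the message context they appear in.
pub struct Features<T: sealed::Context> {
    flags: Vec<u8>,
    _context: std::marker::PhantomData<T>,
}

pub type InitFeatures = Features<sealed::InitContext>;
pub type NodeFeatures = Features<sealed::NodeContext>;
```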
Thoughts on marking a PR as "draft" if it is based on an open PR?
Hmm, I can do whatever. I generally just use "draft" to mean "not ready to hit merge if this got acks as-is".
latest_features: init_msg.features.clone(),
}));
},
hash_map::Entry::Occupied(e) => {
Is there any risk of peer collision? An attacker can usurp a pubkey available on the net, but without the privkey it won't get through the noise phase... and if a peer leaks its privkey we can't guarantee consistency of our structs (and we have bigger troubles).
Right. If you can impersonate a peer, that peer is screwed. Not a lot we can do then.
When would this occur? Won't PeerManager disconnect when it sees a duplicate id?
If you're using our PeerManager, yes, this should not be possible, though in theory these functions are public and the user may run their own PeerManager-like thing that may not properly check.
What sort of semantics do we want to provide to users in this case and where do we plan on documenting it? I can see three possibilities:
- Return an error / panic
- Overwrite the peer state with a new entry
- Update the peer state with new data
For (1), my assumption is that (if used properly) there should already have been a peer_disconnected event even if this is a reconnect. So the entry would not exist in that case. Is my understanding accurate?
Currently (2) and (3) are equivalent but may not be in the future. You're implementing this as (3) though the code would be simplified if (2) is the desired behavior (i.e., you could blindly insert into the map instead of matching on the entry).
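As a hypothetical illustration of options (2) and (3) above (the types here are self-contained stand-ins, not the actual rust-lightning definitions):

```rust
use std::collections::hash_map::{self, HashMap};

// Stand-in types so the sketch is self-contained.
type PublicKey = [u8; 33];
struct InitFeatures;
struct PerPeerState { latest_features: InitFeatures }

fn peer_connected(map: &mut HashMap<PublicKey, PerPeerState>,
                  their_node_id: PublicKey, features: InitFeatures) {
    // Option (2), overwrite: blindly insert, dropping any stale entry.
    //     map.insert(their_node_id, PerPeerState { latest_features: features });

    // Option (3), update: keep any existing entry, refresh only the new data.
    match map.entry(their_node_id) {
        hash_map::Entry::Vacant(e) => {
            e.insert(PerPeerState { latest_features: features });
        },
        hash_map::Entry::Occupied(mut e) => {
            e.get_mut().latest_features = features;
        },
    }
}
```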
Good question. I think in general if you don't follow the API, there are no guarantees period. It maybe could be further documented that you must somehow verify that a given message came from something that has proven knowledge of the relevant private key, but if a counterparty's private key leaks, we won't panic, it just may result in the counterparty losing their funds/channels.
Currently (2) and (3) are equivalent but may not be in the future.
Hmmm, not sure I would change it; it's state relative to the connection and shouldn't leak to the ChannelManager layer. If it concerns channel+routing state, then assuming LN messages are ordered, newer ones should take precedence.
It maybe could be further documented that you must somehow verify that a given message came from something that has proven knowledge of the relevant private key
Transport is assumed to be authenticated and encrypted, so verifying that a message came from something which owns the relevant private key is done, I think. That said, how you establish knowledge of the responder's public-key identity is beyond the specs. Key leakage means your funds are lost: just send an HTLC emptying your balance to some exit node; it doesn't fall within the scope of option_upfront_shutdown_script.
lightning/src/ln/channelmanager.rs
Outdated
/// new channel.
/// If we are connected to a peer we always at least have an entry here, to store their features
/// so that we have them available if we open a channel with them and need it for routing.
per_peer_state: RwLock<HashMap<PublicKey, Mutex<PerPeerState>>>,
What parts of ChannelHolder do you plan to move here, by_id?
Further, I'm wondering if we can do better on lock management. Like not putting the lock on the PerPeerState directly, but something like PerPeerState { peer_chans_by_id: RwLock<HashMap<[u8; 32], Mutex<Channel>>>, latest_features: Mutex<InitFeatures> }, though I'm not sure it's worth the complexity if we assume that most of the time you'll have one chan per peer.
I was figuring by_id and pending_msg_events (at least for those message events that are non-broadcast, but I don't think there's harm in having broadcast events in there too). That would mean our message queue and channel state per-peer would be under a lock, and it would all be per-peer, which I think is the right design. Indeed, we could lock per-channel too, but I don't think it's worth it because the caller isn't allowed to have two in-flight message calls for a single peer, so we'd almost never be doing something on both channels at once.
Hmmm, I'm not sure what we win on concurrency by putting pending_msg_events there, unless we change get_and_clear_pending_msg_events to be per-peer too, but isn't that just going to push the bottleneck to PeerManager and its Mutex<PeerHolder>?
but I don't think its worth it because the caller isn't allowed to have two in-flight message calls for a single peer
Could you elaborate? Because I always thought that the ChannelMessageHandler API was flexible enough that you could at least call two handle_* at the same time (even if it's likely you contend for locks inside them).
re: concurrency - the moving of pending_msg_events into per_peer data is more for logical consistency than anything - we have to push events onto that queue while holding the lock we took to edit the Channel data. The only way to keep parallelism is to split the pending_msg_events into per_peer_pending_msg_events and include it in the same Mutex that holds the Channel.
re: elaboration: The ChannelMessageHandler API explicitly states "Messages MAY be called in parallel when they originate from different their_node_ids, however they MUST NOT be called in parallel when the two calls have the same their_node_id". Fundamentally the lightning protocol requires in-order message delivery.
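To make the locking trade-off concrete, a rough sketch under the assumptions above, with simplified stand-in types:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

type PublicKey = [u8; 33];
struct PerPeerState { latest_features: Vec<u8> }

struct Manager {
    per_peer_state: RwLock<HashMap<PublicKey, Mutex<PerPeerState>>>,
}

impl Manager {
    /// Message handling only reads the map structure, so handlers for
    /// *different* peers run in parallel under the outer read lock. Per the
    /// ChannelMessageHandler contract quoted above, calls for the *same*
    /// peer are never concurrent, so the inner Mutex is effectively
    /// uncontended.
    fn handle_message(&self, their_node_id: &PublicKey) {
        let peers = self.per_peer_state.read().unwrap();
        if let Some(peer) = peers.get(their_node_id) {
            let _state = peer.lock().unwrap();
            // ... mutate per-peer state (eventually channels, msg queues) ...
        }
    }

    /// Adding or removing a peer is rare; only these paths take the write
    /// lock, briefly blocking all message handling.
    fn peer_connected(&self, their_node_id: PublicKey, features: Vec<u8>) {
        self.per_peer_state.write().unwrap().insert(
            their_node_id, Mutex::new(PerPeerState { latest_features: features }));
    }
}
```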
lightning/src/ln/channelmanager.rs
Outdated
/// This is particularly useful for routing as many node_announcement-context features are also
/// set in the init-context, and we should almost certainly consider the init-context version
/// to be the latest copy.
pub remote_node_features: InitFeatures,
I'm not sure about assuming the init-context version is the latest copy; a peer may be able to change its advertised services without renewing any connection with us, just by broadcasting a new node_announcement, so between init and node_announcement I think we should reconcile features where we have both states available, i.e. in get_route?
But do we really need to cache them in ChannelManager; can't we store everything in PeerManager? It's harder to see given a lot of features are obviously yet to be designed, so maybe we should at least have an access function returning NodeDetails to let clients select nodes.
Hmm, I was kinda assuming to get new features you'd at least have to restart your node. It may make sense to further clarify the BOLTs in this regard, but certainly today the restart requirement holds (and it may be some time after restart before you create a new announcement, so Init would be the latest-correct).
As for storing it in peer_manager, we could do that, but it just makes for an awkward API - it would mean the client has to call something on ChannelManager, get a list of channels, then go ask PeerHandler for the InitFeatures for each node that we are connected to. It's not exactly an obviously-correct place to store it, but I don't think it's completely absurd for ChannelManager to store the latest InitFeatures we have, especially given it already tracks connectedness state for peers.
I thought at first that features were considered the same as service bits on the base layer, and so stable until restart, but I can't find this explicit requirement in the bolts; maybe we should amend them?
Based on this we may have to adapt the API to reconcile both messages, but if your interpretation holds I'm okay with the API.
The NodeDetails is a different point though, and I think it's needed anyway (e.g. a client may not want to open a channel with nodes not using opt_anchor_output, among others); it can be added in a follow-up PR.
Right, it's definitely not "required" by the BOLTs, but I figured it was a safe assumption. As for init features overriding node_announcement features, I think that's also a reasonable assumption, but I agree that we should make that explicit in the BOLTs.
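A hypothetical helper showing the reconciliation being discussed; none of these names are claimed to be the actual API, and the stand-in types elide real flag handling:

```rust
#[derive(Clone)]
struct InitFeatures(Vec<u8>);
#[derive(Clone)]
struct NodeFeatures(Vec<u8>);

impl NodeFeatures {
    fn empty() -> Self { NodeFeatures(Vec::new()) }
    /// Copy over only init-context flags that are also meaningful in the
    /// node context (the real masking logic is elided here).
    fn from_init(init: &InitFeatures) -> Self { NodeFeatures(init.0.clone()) }
}

/// Prefer the init-context copy when we have a connection, on the assumption
/// (discussed above) that a peer's features are stable until it restarts.
fn effective_node_features(init: Option<&InitFeatures>,
                           announced: Option<&NodeFeatures>) -> NodeFeatures {
    match (init, announced) {
        (Some(i), _) => NodeFeatures::from_init(i),
        (None, Some(a)) => a.clone(),
        (None, None) => NodeFeatures::empty(),
    }
}
```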
}
}
add_entry!(hop.short_channel_id, target, hop, 0);
// BOLT 11 doesn't allow inclusion of features for the last hop hints, which |
Because we may want to use var_onion_optin for routing to the last hop, but we don't have any connection to it, nor have we yet received a node_announcement for it? I even wonder if not allowing this is bad for privacy, because it forces a node to advertise its existence on the net or to allow incoming connections instead of a few selected gateways.
Note that there's currently no reason to need a TLV-based onion for second-to-last hops (and the invoice does provide features for last hops), so it's no big deal. It should probably be fixed eventually, but it doesn't impact B-AMP or so currently. May be worth opening an issue on the BOLT repo.
You mean like amending the r field to provide features with routing hints? You can assume that the receiver will only use nodes feature-compliant with its payment scheme, but we may have more complex schemes though.
Right, ideally r fields could hold feature bits in some form for those nodes, but no reason to need it today.
Force-pushed from 3b748f9 to 7b2276c.
/// State we hold per-peer. In the future we should put channels in here, but for now we only hold
/// the latest Init features we heard from the peer.
struct PerPeerState {
latest_features: InitFeatures,
High-level question: Given this is duplicated in PeerManager's Peer structs, has there been any thought around designing the API in terms of Peer and Channel abstractions?
I'm not sure what you mean - you mean something like replacing the their_node_id argument with some struct that holds more information (e.g. the init message)? We could, I suppose, though note that latest_features here is intended to be available even while disconnected (though maybe it doesn't actually need to be, just nice to have).
I'm mostly thinking whether it would be suitable to design the overall API (not just this module) in terms of Peer and Channel abstractions and use them throughout to build higher-level abstractions like ChannelManager. Admittedly, this is outside the scope of this PR.
Hmm. Maybe. It would certainly be tricky because a Peer "owns" a channel, but also the Peer can disconnect and reconnect and get back the same Channel. We can explore more in the future.
/// Because adding or removing an entry is rare, we usually take an outer read lock and then
/// operate on the inner value freely. Sadly, this prevents parallel operation when opening a
/// new channel.
Not sure that I follow what is meant by "usually" here. Can it be removed?
As in, if you add or remove a connection you'll need the outer write lock, but usually we don't, so we take the read lock.
lightning/src/ln/channelmanager.rs
Outdated
/// If we are connected to a peer we always at least have an entry here, to store their features
/// so that we have them available if we open a channel with them and need it for routing.
This is a bit verbose and may get out of sync with PerPeerState by talking about specific data within it. I'd recommend rewording more succinctly as:
Per-peer state used for opening channels and routing.
This can also replace the first sentence of the documentation since currently that mostly just restates the field name.
Hmm, I mean it's where features get stored, sure, I suppose, though the note that "we'll always have something here if we're connected" is an important thing to document - it indicates when you can rely on an entry vs. having to check whether there is one. I'll just drop the "to store their features..." bit.
My larger point is that if it is important then it should be stated upfront in the documentation (rather than after the locking comments) and done in a succinct manner such that the reader can quickly understand its purpose. IMHO, "Per-peer state storage" doesn't accomplish that as it's essentially restating the field name.
Right, ok, missed the objection to the opening line. I reworded it to "The bulk of our storage will eventually be here (channels and message queues and the like)." I presume that's a bit more descriptive?
Eh, I'd expect the first sentence to use the word "peer" at the very least; "storage" doesn't seem like a primary characteristic of what the field contains. And why is "bulk" relevant to the reader? The fact that this data is persisted seems like an ancillary detail to me. More important is how it is used.
I mean the variable name is pretty clear about it being a peer-indexed object :). As for "storage", it's more of a future-looking thing - once we move channels and pending_msg_events, really almost all of the ChannelManager "stuff" lives there.
I'm not sure what you're referring to about "persisted" there - nothing mentions persistence, only that a precondition exists that there is an entry if the peer is connected (i.e. if you're in a message handler, it's guaranteed to be there, so you don't have to check).
Generally speaking, documentation should primarily state what something is and how it is used (i.e., its purpose). This is the first thing that should be stated. In other words, we are storing it only because it has some primary purpose. By "persisted" I simply meant saved to storage, which seems like an ancillary detail to the fact that this has some purpose.
I can see the argument that some of this documentation belongs on PeerState while the rest belongs on per_peer_state. I would like to see a clear divide between the two. :) In that sense, yes, I see your point: this should mostly talk about locking and how the mapping works. But I'd still argue that storage is not the primary purpose and therefore should not be in the first sentence.
Separately, anything future-looking should not be part of documentation. Just put it in once the code reflects that reality. Or at the very least it should be more of an aside rather than the primary sentence, in my opinion.
Force-pushed from 7b2276c to 865b0ea.
"Fix EnforcingChannelKeys panic when our counterparty burns their $." is #445. The message fuzz target strictness commit I can move into a later PR, but its useful for the TLV stuff and I figured it'd be easier to land it earlier in the stack since the TLV stuff is rather large. |
The message fuzz target strictness commit I can move into a later PR, but its useful for the TLV stuff and I figured it'd be easier to land it earlier in the stack since the TLV stuff is rather large.
If it's related to the TLV stuff, then it should go with it, no? The commit itself is rather small so shouldn't be problematic there.
Force-pushed from 865b0ea to 28abb40.
lightning/src/ln/channelmanager.rs
Outdated
let per_peer_state = self.per_peer_state.read().unwrap();
for chan in res.iter_mut() {
    if let Some(peer_state) = per_peer_state.get(&chan.remote_network_id) {
        chan.remote_node_features = peer_state.lock().unwrap().latest_features.clone();
    }
}
This code is repeated. Could you extract this behavior into a method? Better yet -- since the relationship between list_usable_channels and list_channels is that the former is a subset of the latter -- define list_usable_channels as a filter over list_channels. Then even more code wouldn't be repeated.
Can do one better, instead of filtering after creation we can do it while doing so to avoid pushing ChannelDetails onto the Vec and then removing them :)
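A sketch of that build-time-filter idea, with simplified stand-in types (the real ChannelDetails carries many more fields):

```rust
#[derive(Clone)]
struct ChannelDetails { short_channel_id: Option<u64>, is_live: bool }

struct Manager { chans: Vec<ChannelDetails> }

impl Manager {
    /// One shared helper: the filter runs before the push, so channels that
    /// fail it are never allocated into the result Vec at all.
    fn list_channels_with_filter<F: Fn(&ChannelDetails) -> bool>(&self, f: F)
        -> Vec<ChannelDetails>
    {
        let mut res = Vec::with_capacity(self.chans.len());
        for chan in self.chans.iter() {
            if f(chan) { res.push(chan.clone()); }
        }
        res
    }

    fn list_channels(&self) -> Vec<ChannelDetails> {
        self.list_channels_with_filter(|_| true)
    }

    /// The "usable" list is just the filtered subset.
    fn list_usable_channels(&self) -> Vec<ChannelDetails> {
        self.list_channels_with_filter(|c| c.is_live)
    }
}
```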
lightning/src/ln/channelmanager.rs
Outdated
/// The Features this node provided us upon our last connection with them.
/// This is particularly useful for routing as many node_announcement-context features are also
/// set in the init-context, and we should almost certainly consider the init-context version
/// to be the latest copy.
This documentation is a bit verbose. Some tips:
- Avoid adverbs, especially when they introduce ambiguity. Phrases like "should almost certainly" aren't meaningful to the reader without further explanation as to when what follows does not hold.
- Cut out extraneous words. Does the meaning change if you used "useful" instead of "this is particularly useful"? If not, get rid of the words so the reader has less to parse.
- Avoid first-person pronouns like "we" and "our".
Consider this more concise version:
Features the channel counterparty provided upon last connection. Useful for routing since it contains features also provided by node_announcement messages, only more recent.
I tweaked it some; let me know if you like it. I did want to highlight a bit more that not all routing-relevant features may be present in the init context, but many are.
lightning/src/ln/features.rs
Outdated
/// Takes the flags that we know how to interpret in an init-context features that are also
/// relevant in a channel-context features and creates a channel-context features from them.
pub(crate) fn relevant_init_flags_to_channel(_init_ctx: &InitFeatures) -> Self {
"channel" in the name is redundant with the struct name. Instead, could you use std::convert::From
for this? This is more idiomatic for Rust.
Hmm, at least personally I really hate From unless it's super, duper, absolutely clear, and documentation describing exactly what it does would be wholly redundant. I don't think that's true here because we're only copying over known flags. I changed it to with_known_relevant_init_flags, so callsites read ChannelFeatures::with_known_relevant_init_flags, which I think is pretty clear.
Should this be "supported" rather than "known" features? I'd consider dropping "relevant" as that should be implied by the fact that it returns ChannelFeatures
.
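For illustration, a sketch of the named-constructor shape being settled on here; the mask below is invented (and empty, since no channel-context features are defined yet), not the real flag logic:

```rust
struct InitFeatures { flags: Vec<u8> }
struct ChannelFeatures { flags: Vec<u8> }

impl ChannelFeatures {
    /// Unlike a blanket `From<InitFeatures>` impl, the explicit name makes it
    /// clear at the callsite that only known, channel-relevant flags survive.
    pub(crate) fn with_known_relevant_init_flags(init_ctx: &InitFeatures) -> Self {
        // Hypothetical per-byte mask of flags that are both understood by
        // this implementation and meaningful in the channel context.
        const KNOWN_CHANNEL_RELEVANT_MASK: u8 = 0b0000_0000;
        ChannelFeatures {
            flags: init_ctx.flags.iter()
                .map(|b| b & KNOWN_CHANNEL_RELEVANT_MASK)
                .collect(),
        }
    }
}
```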
lightning/src/ln/channelmanager.rs
Outdated
/// This is particularly useful for routing as many node_announcement-context features are also
/// set in the init-context, and we should almost certainly consider the init-context version
/// to be the latest copy.
pub remote_node_features: InitFeatures,
Can we reconcile the overloading of node_features? Here, it is referring to something of type InitFeatures, but in RouteHop the node_features field has type NodeFeatures. More confusingly, they are used together in Router::get_route.
Took your wording of "counterparty".
Force-pushed from f6328c4 to be74dbf.
Since we want to keep track of the Init-context features for every peer we have channels with, we have to keep them for as long as the peer is connected (since we may open a channel with them at any point). We go ahead and take this opportunity to create a new per-peer-state struct with two levels of mutexes, which is appropriate for moving channel storage to. Since we can't process messages from a given peer in parallel, the inner lock is a regular mutex, but the outer lock is RW so that we can process messages for different peers at the same time with an outer read lock.
if res.last().unwrap().pubkey == *target {
    break;
}
Is there a reason features are set on the last hop but other fields are not? It's unclear to me.
The RouteHop object is a little confusing - some fields are about the channel used to get to the given node and some fields are about the node itself. Every field should be set by the end of the loop, but fee_msat and cltv_expiry_delta are set in the previous iteration.
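A rough sketch of the RouteHop shape being described; the field set is approximated from this discussion and may not match the actual struct, and the types are stand-ins:

```rust
pub type PublicKey = [u8; 33];
pub struct NodeFeatures;
pub struct ChannelFeatures;

pub struct RouteHop {
    /// The node_id of the node this hop arrives at.
    pub pubkey: PublicKey,
    /// Node-context features of that node itself.
    pub node_features: NodeFeatures,
    /// The channel used to get *to* this node from the previous hop.
    pub short_channel_id: u64,
    /// Channel-context features of that channel.
    pub channel_features: ChannelFeatures,
    /// Fee paid at this hop; per the comment above, filled in while
    /// processing the *next* hop's iteration of the loop.
    pub fee_msat: u64,
    /// CLTV delta for this hop; likewise set on the following iteration.
    pub cltv_expiry_delta: u32,
}
```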
Force-pushed from be74dbf to 2804fd3.
This exposes the latest Init-context features in the ChannelDetails passed to the Router during route calculation, which combines those with the Node-context features tracked from node_announcements to provide the latest Node-context features in RouteHop structs. Fields are also added for Channel-context features, though those are only partially used since no such features are defined today anyway. These will be useful when determining whether to use new TLV-formatted onion hop datas when generating onions for peers.
Force-pushed from 2804fd3 to 617a680.
ACK 617a680
Based on #428, this plumbs features through into Routes so that we can use that information when we go to construct onions. This is gonna be required for #430. While we're tracking new info in ChannelManager, we create the map that should eventually hold all channel data and implement #422.