
TLV-based Variable Length Onion #434

Merged · 12 commits · Feb 11, 2020

Conversation

@TheBlueMatt (Collaborator) commented Dec 29, 2019:

Based on #433, this resolves #429 and #430!

@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch 4 times, most recently from 0c93d20 to 956a7a3 Compare January 2, 2020 18:57
@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch 5 times, most recently from d132a13 to d061182 Compare January 8, 2020 01:07
@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch 2 times, most recently from ebf74de to b341d68 Compare January 26, 2020 20:11
@jkczyz jkczyz self-requested a review January 27, 2020 18:28
@jkczyz (Contributor) left a comment:

Still need to get to the last three commits.

for payload in onion_payloads.drain(..) {
new_payloads.push(BogusOnionHopData::new(payload));
}
new_payloads[0].data[0] = 1;
Contributor:

The significance of this is unclear to me. Is this the version byte? Can we use a constant for it?

@TheBlueMatt (author):

It's just there to invalidate it; I added a comment to note what it does.

Comment on lines 973 to 988
let read_pos = chacha_stream.read(&mut new_packet_data).unwrap();
chacha_stream.chacha.process_inline(&mut new_packet_data[read_pos..]);
Contributor:

Is it possible to encapsulate this into a single call to ChaChaReader?

@TheBlueMatt (author):

Hmmm... they're two pretty separate things... one is emptying the buffer of bytes we got from our peer, the other is filling any remaining bytes with encrypted 0-bytes. We could theoretically have a stream that reads from a given buffer and then returns 0s forever, but that just feels like magic that would make the code overly confusing. I did add a comment here noting wtf is going on, because I found it confusing as-is.

Comment on lines 23 to 31
let mut len = LengthCalculatingWriter(0);
encode_tlv!(&mut len, {
$(($type, $field)),*
});
VarInt(len.0 as u64).consensus_encode(WriterWriteAdaptor($stream))
.map_err(|e| if let Error::Io(ioe) = e { ioe } else { unreachable!() })?;
encode_tlv!($stream, {
$(($type, $field)),*
});
Contributor:

IIUC, this serializes each field four times -- twice when calculating the overall length and twice again when calculating the individual field lengths and values.

Would it be possible to devise an abstraction that adapts Writer in such a manner where the encoded value can be buffered before it is written? Then the length could be taken from it before flushing. This sort of adaptor behavior is alluded to in std::io::Write.

Or is there a tradeoff in doing it this way?

@TheBlueMatt (author):

Right, I believe that is correct. Hopefully LLVM will be smart enough to optimize out the template writes here into something sensible, though I haven't gone and checked what actually gets generated.

I'm not sure how to do it much better, sadly. Given the length tags are variable-length (have I mentioned how much I think variable-length tags are a great way to make sure everything is maximally inefficient and insecure?), you're forced to do the whole thing at minimum twice, and without building a cache of lengths, three times.

I did go ahead and write out the length calculation in the length_prefixed macro so that it's only 3, which I guess is at least a bit better?

Contributor:

Totally feel your sentiment regarding variable-length tags. However, can't each field be encoded/serialized only once by keeping the serialized data in memory before writing it to the stream? That way the length could be computed and written first.

I guess that's why I was asking about tradeoffs.

@TheBlueMatt (author):

Right, yea, we could do that, but I'd really rather not add extra allocations in our serialization pipeline. If LLVM optimizes away most of the calls as-is, then extra allocations would be performance suicide, and if it doesn't, hitting the heap is probably still worse than the alternative.

Comment on lines 51 to 57
if typ.0 == std::u64::MAX || typ.0 + 1 <= max_type {
Err(DecodeError::InvalidValue)?
}
$(if max_type < $reqtype + 1 && typ.0 > $reqtype {
Err(DecodeError::InvalidValue)?
})*
max_type = typ.0 + 1;
Contributor:

The + 1 throughout here is a bit confusing. Is it needed because otherwise you couldn't support type 0 with this logic? Isn't std::u64::MAX a valid type, though?

The name max_type can be confusing, too. While it is the maximum type seen, the logic is such that it's really the minimum type that can be read next. Naming as min_next_type may read better with the logic. So I think this could be simplified and would read better as:

let mut last_seen_type: Option<u64> = None;
loop {
    let min_next_type = match last_seen_type {
        Some(x) => x + 1,
        None => 0,
    };

    // Types must be monotonically increasing.
    if typ.0 < min_next_type {
        Err(DecodeError::InvalidValue)?
    }

    // Required types must be found.
    $(if min_next_type <= $reqtype && typ.0 > $reqtype {
        Err(DecodeError::InvalidValue)?
    })*

    last_seen_type = Some(typ.0);
}

@TheBlueMatt (author):

Right, indeed, I suppose u64::MAX is allowed since it's odd :/. I think I was trying to be too clever... I took a variation on your proposal (because I think your proposal would panic with an addition overflow if last_seen_type == u64::MAX, due to the +1 in the Some branch).

while self.read_len != self.max_len {
debug_assert!(self.read_len < self.max_len);
let mut buf = [0; 1024];
let readsz = cmp::min(1024, self.max_len - self.read_len) as usize;
Contributor:

readsz and read_len are a bit too similar names. How about the following?

read_len -> bytes_read
max_len -> length
readsz -> read_size

In particular, I think using bytes_read and length is clearer throughout both here and the implementation of read.

@TheBlueMatt (author):

Hmm, not sure "length" is so clear to me. Went with the above, but with a fully written out total_fixed_len instead of length.

Comment on lines 82 to 83
/// Essentially std::io::Take but a bit simpler and exposing the amount read at the end, cause we
/// may need to skip ahead that much at the end.
Contributor:

I'm a little confused by this comment. In particular:

(1) how is it simpler?
(2) where is the amount read (read_len) used at the end? Do you mean internally when someone calls eat_remaining?

@TheBlueMatt (author):

I rewrote the text, let me know if it's clearer now.

Comment on lines 85 to 87
pub read: R,
pub read_len: u64,
pub max_len: u64,
Contributor:

Should these be public?

@TheBlueMatt (author):

Right, definitely not. The first version didn't have eat_remaining and clients implemented that, but it's better now :).

}, {
(6, short_id)
});
rd.eat_remaining().map_err(|_| DecodeError::ShortRead)?;
Contributor:

Should the caller really handle this? In other words, shouldn't there be something analogous to encode_varint_length_prefixed_tlv for decoding that reads the length, decodes, and discards any remaining bytes? (Or at the very least it could be given the length, seeing as you need to condition on it.) Then FixedLengthReader could be hidden away as an implementation detail.

@TheBlueMatt (author):

I'm really not sure what you're suggesting here? Do you mean calling eat_remaining() in FixedLengthReader::drop() (which we can't do, because we sometimes have to return an Err)?

Contributor:

I'm saying to hide FixedLengthReader behind a macro just as you are hiding LengthCalculatingWriter behind encode_varint_length_prefixed_tlv. Why should every user of decode_tlv need to do this error checking?

@TheBlueMatt (author):

For future types, yea, we probably should. Sadly, the "length" got overloaded so that 0 implies the legacy format, which is somewhat confusing, so I don't see an obvious+generic way to do this. Happy to change it if you see some better alternative.

Contributor:

Couldn't the macros accept the length as a parameter?

Contributor:

I think this can be refactored later

@TheBlueMatt (author):

I mean, I think in the future messages won't have a fixed length, they'll just contain a TLV and you will read to the end until you run out of TLV. This may be the only place we use it, and, if not, I'm not sure what it'll look like. So I kinda agree with Arik.

Comment on lines 90 to 107
pub fn eat_remaining(&mut self) -> Result<(), ::std::io::Error> {
while self.read_len != self.max_len {
debug_assert!(self.read_len < self.max_len);
let mut buf = [0; 1024];
let readsz = cmp::min(1024, self.max_len - self.read_len) as usize;
self.read_exact(&mut buf[0..readsz])?;
}
Ok(())
}
Contributor:

Instead of rolling our own implementation, we can use std::io to simplify this as:

pub fn eat_remaining(self) -> Result<(), ::std::io::Error> {
    let mut remaining = self.read.take(self.max_len - self.read_len);
    let bytes_discarded = std::io::copy(&mut remaining, &mut std::io::sink())?;
    // ...
}

@TheBlueMatt (author):

Good idea. Don't even have to create the take, we can just use self (though this did turn up this stream being super awkward, so I rewrote read() to be a bit more friendly to other std::io types).

Comment on lines 61 to 72
match last_seen_type {
Some(t) if typ.0 <= t => {
Err(DecodeError::InvalidValue)?
},
_ => {},
}
$(if (last_seen_type.is_none() || last_seen_type.unwrap() < $reqtype) && typ.0 > $reqtype {
Err(DecodeError::InvalidValue)?
})*
Contributor:

For someone coming across this code, I think having the comments from my example would go a long way toward understanding what these two blocks are for.

@TheBlueMatt (author):

Oops, totally, that's my bad.

@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch 2 times, most recently from 0a50756 to c91adf9 Compare January 30, 2020 19:13
@ariard left a comment:

Reviewed until 3c9538f

Ok(msg) => {
let mut hmac = [0; 32];
if let Err(_) = chacha_stream.read_exact(&mut hmac[..]) {
return_err!("Unable to decode hop data", 0x4000 | 1, &[0;0]);
@ariard:

Hmmm, at least PERM|1 is well-specified for an incorrect realm byte; here it's more an HMAC error, so BADONION|PERM|5? (But dunno if we can use it for the hop_payload hmac.)

@TheBlueMatt (author):

IIRC (and this may be untrue, so double check), BADONION generally is used to imply "the previous hop peer from me is broken" not "the thing the original sender sent is nonsense", which this is, so we don't want to set BADONION. We could leave it as invalid_realm since we can't reasonably call this a TLV onion payload either, but I don't know that it matters which.

@ariard:

"0x8000 (BADONION): unparsable onion encrypted by sending peer" (but it doesn't say whether the sending peer is the original sender or the previous hop...).

You replaced it with PERM|22 on the last tip? I do think that BADONION|PERM|5 is the one which fits best, a) because it's the internal hmac, so if it's wrong it's a failure of the original sender, and b) it's better for backward compatibility, as this case can be triggered by old nodes too (and they shouldn't understand invalid_onion_payload).

But if you disagree with the BADONION meaning interpretation, I'm fine like that.

@TheBlueMatt (author):

Hmm, maybe that's confusing, but certainly, at least for BADONION|5 and BADONION|6, it's possible that the previous hop corrupted the onion before we got it (and we return differently - we pass the previous node a message to tell them to encrypt the failure onion).

In this case, this is exactly the thing that invalid_onion_payload should capture, I think - it says that we probably read the actual hop data in some bogus way and ended up without enough bytes left for the hmac.

@ariard commented Feb 11, 2020:

Hmmm, but still, this HMAC is encrypted to the actual processing node, so the previous node may have tweaked the packet and its HMAC, but we won't decrypt in this case. I still think this error is only triggerable by the original sender.

So yes, we agree on the error case; it's just that BADONION is ambiguous.

@@ -906,20 +904,23 @@ impl<ChanSigner: ChannelKeys, M: Deref> ChannelManager<ChanSigner, M> where M::T
}
@ariard:

Side-note: to save some hash+ECDH operations, the version check should happen first. It's not going to protect against intentional DoSy behavior, but if a buggy node keeps sending us non-supported packets due to misinterpreting our feature signaling, that's better.

@TheBlueMatt (author):

No it absolutely should not. That would expose a timing oracle as to whether or not the first byte of the ciphertext was X. It probably wouldn't explicitly break anything, but that's how we end up with Bleichenbacher attacks.

@ariard:

But the version byte isn't part of the ciphertext? Bleichenbacher relies on the fact that the padding is part of the ciphertext, which isn't the case here.

I mention this at first because the order is described in the spec: "Upon receiving a packet, a processing node compares the version byte of the packet with its own supported versions and aborts the connection if the packet specifies a version number that it doesn't support" (packet forwarding).

@TheBlueMatt (author):

Ah, I misunderstood what you were suggesting - the comment was on a "}" so I have no context. Indeed, we already check the version before checking the HMAC (and return a differentiating error message anyway), so moving it forward should be fine, but it's also an existing issue that isn't changed here.

@ariard:

Yeah, sorry, GitHub sucks for commenting on something PR-related but not directly on the changes themselves. We can do this later; that's why I mentioned it as a side-note.

@@ -968,8 +970,10 @@ impl<ChanSigner: ChannelKeys, M: Deref> ChannelManager<ChanSigner, M> where M::T
})
} else {
let mut new_packet_data = [0; 20*65];
chacha.process(&msg.onion_routing_packet.hop_data[65..], &mut new_packet_data[0..19*65]);
chacha.process(&SIXTY_FIVE_ZEROS[..], &mut new_packet_data[19*65..]);
let read_pos = chacha_stream.read(&mut new_packet_data).unwrap();
@ariard:

Onion forward processing is a bit tricky and could be better documented.

Like: "Read the leftover of the incoming hops_data up to 1300 0x00 bytes. Then XOR the rho-key pseudo-random stream, (1300 - hop_data length) bytes long, into the new buffer."

@TheBlueMatt (author):

Hmm, I think I added comments in response to Jeff's comments while you were posting this. Are you happy with it as it is now?

Contributor:

We can refactor it later, I think

Contributor:

@arik-so What could be refactored here?

@@ -249,6 +251,29 @@ mod real_chacha {
self.offset += count;
}
}

pub fn process_inline(&mut self, input_output: &mut [u8]) {
@ariard:

Are you relying on LLVM's judgment to inline this function? Also, this function processes in place, not inline; that's a bit confusing (or what's the exact term meaning you don't touch the heap for the processing operation? Can't remember).

@TheBlueMatt (author):

This is less about bytecode optimization and more about not having to allocate an output buffer if we don't need the input any longer. In place is a much, much better name to describe that, so renamed this.

Comment on lines +151 to +167
fn tlv_reader(s: &[u8]) -> Result<(u64, u32, Option<u32>), DecodeError> {
let mut s = Cursor::new(s);
let mut a: u64 = 0;
let mut b: u32 = 0;
let mut c: Option<u32> = None;
decode_tlv!(&mut s, {(2, a), (3, b)}, {(4, c)});
Ok((a, b, c))
}
Contributor:

Could we use the test types and vectors from spec to cover the implemented behavior?

https://github.com/lightningnetwork/lightning-rfc/blob/master/01-messaging.md#appendix-b-type-length-value-test-vectors

@TheBlueMatt (author):

Hmm, good suggestion (they were added in a later PR, so I missed them). First of all, sadly, they can't replace our tests since they don't have any tests for fields which we really want to require (I added a comment noting this).

Second of all, in doing so I found that the original TLV PR isn't the current spec, because at some point they swapped the endianness of the VarInts (so, great, now we have variable-length integers, which suck, and also a particularly sucky version of variable-length integers, and one that you have to write your own code for and can't re-use the Bitcoin code... WTF?), see lightning/bolts#640, so I fixed that while I was at it, as well as tweaked the strictness of our decoder to match the test cases (e.g. rejecting TLVs that have extra data in a V that we otherwise think we know how to decode).

Finally, this rabbit hole pointed out that somehow we ended up with a second variable-length integer in the spec (?!), unique to TLVs, since we already know the length, which I'd also neglected to implement, so that is done as well. Overall it seems like a somewhat nontrivial rewrite, sadly :(.

Contributor:

We should submit a PR to the RFC spec to have its test vectors reflect the change in endianness.

@TheBlueMatt (author):

The RFC spec test vectors do match the current spec, I believe?

@@ -40,3 +142,73 @@ macro_rules! impl_writeable_len_match {
}
}
}

#[cfg(test)]
mod tests {
Contributor:

How about tests for encoding?

@TheBlueMatt (author):

Added a few basic ones, good point. Note that the fuzzers should have good coverage of "round-trip doesn't get the same result", but, indeed, not for "is the result even sane".


@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch 2 times, most recently from 85b046a to a4e3d20 Compare February 3, 2020 02:39
@jkczyz (Contributor) left a comment:

Will look at the new additions a bit more closely in another round (either later today or tomorrow).


@@ -69,7 +73,7 @@ impl InitFeatures {
/// Create a Features with the features we support
pub fn supported() -> InitFeatures {
InitFeatures {
flags: vec![2 | 1 << 5],
flags: vec![2 | 1 << 5, 1 << (9-8)],
Contributor:

we may wanna make these statements more reader-friendly at some point

@TheBlueMatt (author):

Maybe let's fix that in #440 or so. We're not making it worse here, and a followup would be good.


@@ -249,6 +251,29 @@ mod real_chacha {
self.offset += count;
}
}

pub fn process_in_place(&mut self, input_output: &mut [u8]) {
Contributor:

I feel like if it can be processed both in place and out, one of the methods should call the other. Like non-in-place could take the input, clone it, and call process in place on the output before returning it to avoid code repetition. Just as an example.

@TheBlueMatt (author):

Hmm, I don't see an obvious way to do so? The whole point of in_place is to be more efficient by skipping allocating a big buffer when we already have a perfectly good one, and trying to munge a single buffer into both input and output parameters in a generic version would trip up the borrow checker and not allow the compiler to perform as much optimization.

} }
}

macro_rules! decode_tlv {
Contributor:

I still think relying on macros for TLV codec is a bit too much magic, but like multiple other points above, these can be refactored later if we so choose.

@TheBlueMatt (author):

Right, I can't say I super disagree, I am certainly the first one to dislike magic. I still think it's maybe possible to implement using a TLV-stream-generating parser, but then you write out a ton of boilerplate matching on types and checking required types, which we'd probably end up macro'ing anyway :/.


@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch from a4e3d20 to da52fed Compare February 10, 2020 22:06
read_len = reader.read(&mut buf[($len + read_len)..])?;
total_read_len += read_len;
}
if total_read_len == 0 || buf[$len] != 0 {
@ariard:

0328440

If I understand correctly, "total_read_len == 0" means the record value is zero here, but you should still encode its type instead of skipping it? (But not its length, as it's null too.)

@TheBlueMatt (author):

This is reading only the V in the TLV - so if we read 0 bytes, it means the L was (probably) 0, so the value is simply 0.

@ariard:

But you still have to write down the type if it's a mandatory TLV like amt_to_forward and outgoing_cltv_value, right?

@TheBlueMatt (author):

Hmm? I'm confused what you're asking. This type is only read from a FixedLengthReader that is set to only return the TLV's value field. Further, this can only be used if it is either the last field in a V or the only field in a V, as otherwise there is ambiguity.

@ariard:

My question is: is an onion like tlv_payload_length | type 2 | type 4 a valid one if you have null L and V for amt_to_forward and outgoing_cltv_value? If yes, I don't understand whether such an HTLC makes sense.

@TheBlueMatt (author):

It's definitely a valid TLV - this reader (and the TLV tests) test reading such a "0-length" integer, which implies a value of 0.

@ariard commented Feb 11, 2020:

I don't think we sanitize such 0-CLTV/msat HTLCs in send_htlcs, which means we may send garbage to our outgoing peer, and this may trigger undefined behavior?

Edit: we do check HTLC fees at least for msat in decode_update_add_htlc_onion; checking the cltv_expiry is still a TODO in send_htlcs.

msgs::OnionHopDataFormat::Legacy { short_channel_id } => short_channel_id,
msgs::OnionHopDataFormat::NonFinalNode { short_channel_id } => short_channel_id,
msgs::OnionHopDataFormat::FinalNode => {
return_err!("Final Node OnionHopData provided for us as an intermediary node", 0x4000 | 22, &[0;0]);
@ariard:

a3f101d

Following BOLT 4, "the erring node may include that type and its byte offset in the decrypted byte stream", so here we may return type 6 and the offset of short_channel_id (but does this make sense, given you'd now have to save the offset for every type at deserialization in case validation fails?).

@TheBlueMatt (author):

Type 6? The spec indicates that we could store where in the TLV stream we failed to read, and tell our peers that (I presume as a way to help debug errors, since the data isn't machine-respondable-to anyway). I'm super not gonna bother doing that.

@ariard replied:

I agree, don't bother; maybe just add a comment that we don't strictly adhere to the spec recommendation (which seems really loose in this case).

TheBlueMatt (Collaborator Author) replied:

It's a recommendation, though; it's a MAY, so we're still strictly adhering to the spec. I don't think we call it out in other places.

@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch from da52fed to ab4010a Compare February 10, 2020 23:07
@@ -968,8 +970,10 @@ impl<ChanSigner: ChannelKeys, M: Deref> ChannelManager<ChanSigner, M> where M::T
})
} else {
let mut new_packet_data = [0; 20*65];
chacha.process(&msg.onion_routing_packet.hop_data[65..], &mut new_packet_data[0..19*65]);
chacha.process(&SIXTY_FIVE_ZEROS[..], &mut new_packet_data[19*65..]);
let read_pos = chacha_stream.read(&mut new_packet_data).unwrap();
jkczyz (Contributor) commented:

@arik-so What could be refactored here?

@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch 2 times, most recently from 3080ccf to bfc6855 Compare February 11, 2020 01:03
Previously, OnionHopData contained an OnionRealm0HopData field; however, instead of bumping the realm number, that field has been replaced with a length, used to indicate the length of a TLV-formatted object.

Because TLV-formatted hop data can contain the same information as realm-0 hop data, we flatten the field and simply keep track of which format it was in.
It's a bit awkward to have an hmac field covering the struct it's in, and there is little downside to removing it, so just pull it out and use a [u8; 32] where we care about the hmac.
This, as it should be, restricts OnionHopData to only being able to
represent valid states, while still allowing for tests to generate
bogus hop data fields to test deserialization.
This prepares for variable-length per-hop-data by wrapping the full
hop_data field in a decrypting stream, with a few minor
optimizations and redundant allocations to boot.
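The decrypting-stream idea from this commit can be sketched as a Read adapter that applies a stream cipher's keystream to bytes as they pass through, so hop data can be deserialized directly without a separate decrypted copy. A toy repeating-XOR keystream stands in for ChaCha20 below; this is an illustration of the pattern, not rust-lightning's ChaChaReader:

```rust
use std::io::{self, Read};

// A Read wrapper that XORs a keystream over the inner stream's bytes
// as they are read, decrypting on the fly.
struct DecryptingReader<R: Read> {
    inner: R,
    keystream: Vec<u8>, // toy stand-in for a real cipher's keystream
    offset: usize,
}

impl<R: Read> Read for DecryptingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        for b in buf[..n].iter_mut() {
            *b ^= self.keystream[self.offset % self.keystream.len()];
            self.offset += 1;
        }
        Ok(n)
    }
}

fn main() {
    let keystream = vec![0xaa, 0x55];
    let plaintext = b"hop data";
    // "Encrypt" by XORing with the same keystream.
    let ciphertext: Vec<u8> = plaintext
        .iter()
        .enumerate()
        .map(|(i, &b)| b ^ keystream[i % keystream.len()])
        .collect();
    let mut reader = DecryptingReader { inner: &ciphertext[..], keystream, offset: 0 };
    let mut out = Vec::new();
    reader.read_to_end(&mut out).unwrap();
    assert_eq!(out, plaintext);
}
```

Because the adapter implements Read, any deserializer that reads from a generic stream can consume encrypted hop data directly, which is what lets the per-hop TLV reader run without first materializing a decrypted buffer.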
This adds a number of new stream adapters to track and/or calculate
the number of bytes read/written to an underlying stream, as well
as wrappers for the two (?!) variable-length integer types that TLV
introduces.
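One of the two variable-length integer types referenced here is BOLT 1's BigSize, the big-endian varint used for TLV types and lengths (distinct from Bitcoin's little-endian CompactSize). A minimal encoder sketch, not rust-lightning's actual wrapper:

```rust
// BOLT-1 BigSize: values below 0xfd are one byte; larger values get a
// 0xfd/0xfe/0xff marker followed by a big-endian u16/u32/u64.
fn encode_big_size(v: u64) -> Vec<u8> {
    match v {
        0..=0xfc => vec![v as u8],
        0xfd..=0xffff => {
            let mut out = vec![0xfd];
            out.extend_from_slice(&(v as u16).to_be_bytes());
            out
        }
        0x1_0000..=0xffff_ffff => {
            let mut out = vec![0xfe];
            out.extend_from_slice(&(v as u32).to_be_bytes());
            out
        }
        _ => {
            let mut out = vec![0xff];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
    }
}

fn main() {
    assert_eq!(encode_big_size(252), vec![0xfc]);
    assert_eq!(encode_big_size(253), vec![0xfd, 0x00, 0xfd]);
    assert_eq!(encode_big_size(65536), vec![0xfe, 0x00, 0x01, 0x00, 0x00]);
}
```

The other variable-length type is the truncated integer (tu16/tu32/tu64) used for TLV values, which drops leading zero bytes entirely, hence "two (?!)" varint schemes in one format.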
There's quite a bit of machinery included here, but it neatly
avoids any dynamic allocation during TLV deserialization, and the
calling side looks nice and simple. The macro-generated code is
pretty nice, though has some redundant if statements (I haven't
checked if they get optimized out yet, but I can't imagine they
don't).
@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch from bfc6855 to e70ee3b Compare February 11, 2020 18:52
@ariard ariard left a comment:

Check the feature comment, but otherwise e70ee3b looks good to me.

Err(DecodeError::InvalidValue)?
}
},)*
x if x % 2 == 0 => {
@ariard reviewed:

c326061

More docs welcome: "if the type is even, MUST fail to parse the tlv_stream; if the type is odd, MUST discard the next length bytes"

TheBlueMatt (Collaborator Author) replied:

Isn't the return value of UnknownRequiredFeature pretty clear?

@ariard replied:

Ahah, right, you mean I can't only read the enum's comment, I also have to read the actual name of the enum variant.
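For illustration, a toy sketch of the even/odd ("it's OK to be odd") rule over a TLV stream with no known types, using single-byte types and lengths for brevity (the real encoding uses BigSize):

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnknownRequiredFeature,
    ShortRead,
}

// Walk a TLV stream we have no known types for: an unknown even type
// must fail the whole parse, while an unknown odd type's value is
// silently discarded.
fn skim_tlv_stream(mut stream: &[u8]) -> Result<(), DecodeError> {
    while !stream.is_empty() {
        if stream.len() < 2 {
            return Err(DecodeError::ShortRead);
        }
        let (typ, len) = (stream[0], stream[1] as usize);
        if stream.len() < 2 + len {
            return Err(DecodeError::ShortRead);
        }
        if typ % 2 == 0 {
            // Even type we don't understand: required, must fail.
            return Err(DecodeError::UnknownRequiredFeature);
        }
        // Odd type: optional, skip the next `len` bytes.
        stream = &stream[2 + len..];
    }
    Ok(())
}

fn main() {
    // Unknown odd type 9 with a 3-byte value: ignored.
    assert_eq!(skim_tlv_stream(&[9, 3, 1, 2, 3]), Ok(()));
    // Unknown even type 8: must fail to parse.
    assert_eq!(skim_tlv_stream(&[8, 1, 0]), Err(DecodeError::UnknownRequiredFeature));
}
```

The `x if x % 2 == 0` arm in the macro above plays exactly this role: it is the catch-all for unknown even types, returning the error the odd-type arm avoids.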

@@ -182,13 +182,21 @@ impl<T: sealed::Context> Features<T> {

pub(crate) fn requires_unknown_bits(&self) -> bool {
self.flags.iter().enumerate().any(|(idx, &byte)| {
( idx != 0 && (byte & 0x55) != 0 ) || ( idx == 0 && (byte & 0x14) != 0 )
(match idx {
0 => (byte & 0b00010100),
@ariard reviewed:

53a3c62

Shouldn't the bit pattern be 0b01000100? We support option_upfront_shutdown_script but not gossip_queries.

Anyway, add comments like "(00) data_loss_protect - (01) initial_routing_sync - (00) upfront_shutdown_script ..."

TheBlueMatt (Collaborator Author) replied:

I believe this (and other issues) are fixed in #440.
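The mask-based check under discussion can be sketched as follows; the masks here are illustrative placeholders, not rust-lightning's actual feature layout:

```rust
// Even-positioned feature bits are "required" in BOLT 9, so ANDing each
// flags byte against a mask of required bits we do NOT understand
// detects a peer demanding an unsupported feature.
fn requires_unknown_bits(flags: &[u8]) -> bool {
    flags.iter().enumerate().any(|(idx, &byte)| {
        let unknown_required_mask = match idx {
            // Byte 0: we understand some features here, so only a few
            // even bits remain unknown-required (hypothetical mask).
            0 => 0b0001_0100,
            // Later bytes: every even bit is unknown-required.
            _ => 0b0101_0101,
        };
        byte & unknown_required_mask != 0
    })
}

fn main() {
    // Only known/optional bits set: fine.
    assert!(!requires_unknown_bits(&[0b0000_0010]));
    // An unknown required bit in byte 0: reject.
    assert!(requires_unknown_bits(&[0b0000_0100]));
    // A required bit in a later byte: reject.
    assert!(requires_unknown_bits(&[0, 0b0000_0001]));
}
```

The review point above is precisely about getting these per-byte masks right, since a wrong mask either rejects peers we could talk to or silently accepts features we don't implement.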

This implements the new TLV variable-length encoding for onion hop
data, opting to send it if the RouteHop's node_features indicates
support. It also uses the new process_inline method in ChaCha20 to
optimize a few things (though it grows a new TODO for a
probably-important optimization).
@TheBlueMatt TheBlueMatt force-pushed the 2019-12-var-len-onion branch from e70ee3b to bec0a26 Compare February 11, 2020 21:28
ariard commented Feb 11, 2020

Code review ACK bec0a26

@TheBlueMatt TheBlueMatt merged commit 3c9e8c9 into lightningdevkit:master Feb 11, 2020

jkczyz commented Feb 18, 2020

Closed #430.
