Add API to regenerate index from CARv1 or CARv2 #309

masih · 2022-06-29T15:28:35Z

The index generation APIs either allowed reading an existing index from
a CARv2 or explicitly required a CARv1 to generate an index.

Introduce APIs that make it easier for users who want to regenerate the
index regardless of whether one already exists in the CAR file. The index
generation APIs are changed to accept either format and regenerate the
index from the data payload, unless `ReadOrGenerate` is called.

Adjust the tests to run all flavours of index generation against both
CARv1 and CARv2 payloads.


dirkmc

LGTM 👍

rvagg · 2022-06-30T03:00:35Z

v2/index_gen.go

+		if v2h.DataOffset < HeaderSize {
+			return fmt.Errorf("malformed CARv2; data offset too small: %d", v2h.DataOffset)
+		}
+		if v2h.DataSize < 1 {


I think a zero-length payload is still valid, isn't it? It would just produce an empty index. Not a big deal; a zero-length payload probably means an error, but strictly speaking I think 0 should be acceptable.

masih · 2022-06-30

I thought about this when writing this statement, and I actually think 1 might be too small. The rationale is that, according to the CARv2 spec, the data size refers to the size of the inner data payload, which should be a valid CARv1, i.e. the entire thing including the CARv1 header. At the risk of sounding pedantic (apologies if I do 🙂), accepting zero as DataSize would then mean an empty file is technically a valid CARv1?

That's why I think the value should be the minimum possible size of a CARv1: a CARv1 header with version 1 and no roots, which I think comes to 18 bytes.

WDYT?

masih · 2022-06-30

It looks like a CARv1 must have at least one root CID, which increases the minimum acceptable CARv1 size.

The smallest CID I can think of is one with the IDENTITY multihash code and empty data, which brings the minimum total CARv1 size to 26 bytes, if my math is right.
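To double-check the 18- and 26-byte figures, here is a hand-rolled Go sketch that builds the varint-prefixed DAG-CBOR header bytes directly instead of using go-car or go-cid. It counts only the header, i.e. a CARv1 with no blocks. The root CID is assumed to be a CIDv1 with the raw codec (0x55) and an empty IDENTITY multihash. Any codec whose code fits in one varint byte gives the same size. All helper names are illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// cborText encodes a short (< 24 bytes) CBOR text string (major type 3).
func cborText(s string) []byte {
	return append([]byte{0x60 | byte(len(s))}, s...)
}

// minimalHeader builds the DAG-CBOR map {"roots": roots, "version": 1}.
// DAG-CBOR sorts map keys by length first, so "roots" precedes "version".
func minimalHeader(roots [][]byte) []byte {
	b := []byte{0xa2}                    // map with 2 entries
	b = append(b, cborText("roots")...)
	b = append(b, 0x80|byte(len(roots))) // array header; assumes < 24 roots
	for _, r := range roots {
		b = append(b, r...)
	}
	b = append(b, cborText("version")...)
	return append(b, 0x01) // version 1
}

// identityCIDLink encodes an assumed CIDv1 (raw codec, empty IDENTITY
// multihash) as a DAG-CBOR link: tag 42 wrapping a byte string that holds
// a 0x00 multibase prefix followed by the binary CID.
func identityCIDLink() []byte {
	cid := []byte{0x01, 0x55, 0x00, 0x00} // version, codec, mh code, mh length
	payload := append([]byte{0x00}, cid...)
	link := []byte{0xd8, 0x2a}                   // CBOR tag 42
	link = append(link, 0x40|byte(len(payload))) // byte string header
	return append(link, payload...)
}

// withVarintPrefix prepends the unsigned varint length, as CARv1 does.
func withVarintPrefix(b []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, uint64(len(b)))
	return append(buf[:n], b...)
}

func main() {
	noRoots := withVarintPrefix(minimalHeader(nil))
	oneRoot := withVarintPrefix(minimalHeader([][]byte{identityCIDLink()}))
	fmt.Println(len(noRoots), len(oneRoot)) // 18 26
}
```

The extra 8 bytes for the single root break down as 2 bytes for tag 42, 1 byte for the byte-string header and 5 bytes for the multibase prefix plus the 4-byte CID; the array header stays 1 byte either way.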

masih · 2022-06-30

Moving the discussion out to a separate issue: #310

masih marked this pull request as ready for review June 29, 2022 15:29

masih requested review from dirkmc, rvagg and willscott June 29, 2022 15:29

dirkmc approved these changes Jun 29, 2022

willscott approved these changes Jun 29, 2022

rvagg reviewed Jun 30, 2022

rvagg approved these changes Jun 30, 2022

masih mentioned this pull request Jun 30, 2022

Minimum DataSize value when validating CARv2 header #310 (open)

masih merged commit 7ba9372 into master Jun 30, 2022

masih deleted the masih/idx-gen-ver-agnostic branch June 30, 2022 09:20
