Ensure the unpacked chunk is passed from the H2 CM peer to the H1 CM peer #628

Merged - JamesPiechota merged 11 commits into master from jp/cm-composite-20241009 on Oct 16, 2024

Conversation

@JamesPiechota (Collaborator) commented Oct 10, 2024

  1. Refactor the code for reading the PoA from disk (and then going to peers if it can't be found) - see the sketch below.
  2. After finding a solution, ensure that the H2 CM peer reads the PoA - including the unpacked_chunk - before passing it off to the H1 CM peer.
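
For reference, a minimal sketch of the consolidated read-then-fetch flow. prepare_poa and fetch_poa_from_peers are names used in the PR; read_poa_from_disk, recall_info, the argument lists, and the return shapes are illustrative assumptions, not the actual arweave code:

%% Sketch only - error handling and argument plumbing simplified.
prepare_poa(PoAType, Candidate) ->
    {RecallByte, PackingDifficulty} = recall_info(PoAType, Candidate), %% hypothetical helper
    case read_poa_from_disk(RecallByte, PackingDifficulty) of %% hypothetical helper
        {ok, PoA} ->
            {ok, PoA};
        {error, _Reason} ->
            %% Proof not found locally - fall back to the CM peers.
            case fetch_poa_from_peers(RecallByte, PackingDifficulty) of
                {ok, PoA} ->
                    {ok, PoA};
                _ ->
                    {error, poa_not_found}
            end
    end.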

@JamesPiechota changed the title from "WIP" to "Ensure the unpacked chunk is passed from the H2 CM peer to the H1 CM peer" on Oct 13, 2024
@JamesPiechota marked this pull request as ready for review on October 13, 2024 22:13
@JamesPiechota force-pushed the jp/cm-composite-20241009 branch from 197864c to f040e93 on October 14, 2024 00:24
wait_for_cross_node(_Miners, _ValidatorNode, _CurrentHeight, _ExpectedPartitions, 0) ->
?assert(false, "Timed out waiting for a cross-node solution");
wait_for_cross_node(_Miners, _ValidatorNode, _CurrentHeight, _ExpectedPartitions, 0) ->
?assert(false, "Cross-node solutions can only have 2 partitions.");
Member

Is a check for ExpectedPartitions' length missing?

Collaborator Author

Ah right. Thanks! Fixed.
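
For reference, one plausible shape for the fixed clause - this is a guess at the fix, not the actual commit; only the function head, the ?assert macro, and the messages come from the snippet above:

wait_for_cross_node(_Miners, _ValidatorNode, _CurrentHeight, _ExpectedPartitions, 0) ->
    ?assert(false, "Timed out waiting for a cross-node solution");
wait_for_cross_node(_Miners, _ValidatorNode, _CurrentHeight, ExpectedPartitions, _RetriesLeft)
        when length(ExpectedPartitions) /= 2 ->
    ?assert(false, "Cross-node solutions can only have 2 partitions.");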

{modules_covering_recall_byte, ModuleIDs}]),
ar:console("WARNING: we have mined a block but did not find the PoA1 proofs "
"locally - searching the peers...~n"),
case fetch_poa_from_peers(RecallByte1, PackingDifficulty) of
Member

Why do we remove fetch_poa_from_peers from here?

Collaborator Author

It all happens in prepare_poa now, up here: https://github.com/ArweaveTeam/arweave/pull/628/files#diff-ea8e957c63acb7497b0ed4f9201b40f18e5d5b4fa371e67b4cff05850e4df854R646

We had 3 spots where we potentially needed to read or fetch the PoA: when preparing a solution's poa1, when preparing its poa2, and before sending an H2 candidate to a CM peer. Previously the third case was missing the read and fetch. So the intent was to pull the logic together into a single function (prepare_poa) and then call that function from each of the 3 spots that need to read/fetch the PoA.

that was the intent at least...
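
For orientation, the three call sites look roughly like this. prepare_solution, prepare_poa, poa1/poa2, and #mining_solution appear in the PR; the clause bodies and the H2-forwarding function name are assumptions:

%% Sketch only - error handling elided.
prepare_solution(poa1, Candidate, Solution) ->
    {ok, PoA1} = prepare_poa(poa1, Candidate),
    Solution#mining_solution{ poa1 = PoA1 };
prepare_solution(poa2, Candidate, Solution) ->
    {ok, PoA2} = prepare_poa(poa2, Candidate),
    Solution#mining_solution{ poa2 = PoA2 }.

%% ...and on the H2 CM peer, before the candidate is passed back to the H1 CM
%% peer, so the forwarded PoA already includes the unpacked chunk:
prepare_h2_for_h1_peer(Candidate, Solution) -> %% hypothetical name
    {ok, PoA2} = prepare_poa(poa2, Candidate),
    Solution#mining_solution{ poa2 = PoA2 }.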

Member

Ah, yes, I see, thank you

prepare_solution(poa1, Candidate, Solution#mining_solution{ poa2 = PoA2 });
{error, Error} ->
Chunk2Binary = case Chunk2 of
not_set ->
Member

It is not a realistic case that we do not have the second chunk and leave it to the exit peer to fetch it, is it?

Collaborator Author

Yeah, you're right - that case statement preserves the existing logic while fixing a crash. The previous prepare_solution(poa2, ...) call didn't check whether Chunk2 was set and just sent it directly to the exit peer here: https://github.com/ArweaveTeam/arweave/pull/628/files#diff-ea8e957c63acb7497b0ed4f9201b40f18e5d5b4fa371e67b4cff05850e4df854L773

Unfortunately, when Chunk2 was not_set this caused a crash during ar_serialize. So the change above just fixes that ar_serialize crash; it doesn't change the overall logic of deferring to the exit peer (which, I agree, doesn't buy us much if it's the chunk data that's missing rather than the proof data).

That said, I didn't fully investigate why Chunk2 was not_set in the first place, so it's unclear to me how we get to this code path anyway. The case-statement "fix" was just in response to the stack traces shared by miners, which showed a crash in ar_serialize because the poa.chunk value was not_set rather than a binary.
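
For reference, a sketch of roughly what the case statement does, based on the description above - the <<>> default and the surrounding expression are assumptions, not the verified diff:

Chunk2Binary = case Chunk2 of
    not_set ->
        %% not_set used to reach ar_serialize directly and crash; substitute an
        %% empty binary while still deferring the proofs to the exit peer.
        <<>>;
    _ ->
        Chunk2
end,
prepare_solution(poa1, Candidate,
    Solution#mining_solution{ poa2 = #poa{ chunk = Chunk2Binary } });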

Member

Hm, interesting

{error, Error} ->
case Chunk1 of
not_set ->
Packing = ar_block:get_packing(PackingDifficulty, MiningAddress),
Member

The PR did not introduce it (I did, here), but it does not look quite right - we should wrap the Solution#mining_solution{ poa1 = #poa{ chunk = SubChunk } } below in may_be_leave_it_to_exit_peer to make sure it errors out on the exit peer, since there are no proofs, only the raw chunk. Should we fix it here?
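
Roughly the shape being suggested - a sketch only, since the actual signature of may_be_leave_it_to_exit_peer and the surrounding clause aren't shown in this thread:

may_be_leave_it_to_exit_peer(
    Solution#mining_solution{ poa1 = #poa{ chunk = SubChunk } });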

Collaborator Author

Like this?

eaeffd6

Member

Yes, thank you!

@JamesPiechota merged commit 7ef9f2b into master on Oct 16, 2024
67 checks passed