Feature/packing difficulty #590

Merged: 20 commits merged into master from feature/packing-difficulty on Sep 20, 2024

Conversation

ldmberman
Member

No description provided.

@ldmberman ldmberman requested review from vird and JamesPiechota July 16, 2024 21:37
@ldmberman ldmberman force-pushed the feature/packing-difficulty branch from d723102 to eda26c0 on August 1, 2024 09:46
for (int k = 0; k < offsetByteSize; k++) {
offsetBytes[k] = ((offset + subChunkSize) >> (8 * (offsetByteSize - 1 - k))) & 0xFF;
}
SHA256_CTX sha256;
Collaborator

Is this a correct understanding of this block?

  1. It computes a feistel hash which is used as the encryption key for the sub-chunk
  2. the feistel hash is generated from the packing address (i.e. inputData) and the offset of the sub-chunk
  3. this guarantees that the encryption key for each sub-chunk is unique but deterministic

If that's correct maybe we could add something like that as a comment? e.g.

// Sub-chunk encryption key is the feistel hash of the input data and the sub-chunk offset

Member Author

It is a sha2 hash (not a feistel)

Collaborator

haha, my ignorance is showing. I was going off of pattern matching with the feistel_hash() function: https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/c_src/feistel_msgsize_key_cipher.cpp#L7-L13

The code blocks look pretty similar - is the main difference that a feistel hash forces a 32-byte length? (or is it just a misnaming of the feistel_hash(...) function that I should ignore?)

Member Author

The block computes sha256(<< packing_key_without_sub_chunk_offset, sub_chunk_offset >>).
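For readers skimming the thread, here is a condensed, self-contained sketch of that key derivation, assembled from the snippets above. It is only an illustration: the function name and the packingKey/packingKeyLen parameters are placeholders, and the deprecated OpenSSL SHA256_* calls are used solely because the diff itself uses them.

#include <stddef.h>
#include <stdint.h>
#include <openssl/sha.h>

// Sketch: per-sub-chunk key = SHA-256(packing key || 3-byte big-endian sub-chunk offset).
// In the diff, the offset fed in is (offset + subChunkSize), i.e. the end offset of the
// current sub-chunk.
static void derive_sub_chunk_key(const unsigned char *packingKey, size_t packingKeyLen,
		uint32_t subChunkOffset, unsigned char key[SHA256_DIGEST_LENGTH]) {
	// 3 bytes is sufficient to represent offsets up to at most MAX_CHUNK_SIZE.
	const int offsetByteSize = 3;
	unsigned char offsetBytes[3];
	for (int k = 0; k < offsetByteSize; k++) {
		offsetBytes[k] = (subChunkOffset >> (8 * (offsetByteSize - 1 - k))) & 0xFF;
	}
	SHA256_CTX sha256;
	SHA256_Init(&sha256);
	SHA256_Update(&sha256, packingKey, packingKeyLen);
	SHA256_Update(&sha256, offsetBytes, offsetByteSize);
	SHA256_Final(key, &sha256);
}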

// 3 bytes is sufficient to represent offsets up to at most MAX_CHUNK_SIZE.
int offsetByteSize = 3;
unsigned char offsetBytes[offsetByteSize];
for (int k = 0; k < offsetByteSize; k++) {
Collaborator

Suggested change
for (int k = 0; k < offsetByteSize; k++) {
// Byte string representation of the sub-chunk offset: i * subChunkSize
for (int k = 0; k < offsetByteSize; k++) {

SHA256_Update(&sha256, offsetBytes, offsetByteSize);
SHA256_Final(key, &sha256);

for (int j = 0; j < iterations; j++) {
Collaborator

Suggested change
for (int j = 0; j < iterations; j++) {
// Sequentially encrypt each sub-chunk 'iterations' times.
// The encrypted output of each iteration is the input for the following iteration.
for (int j = 0; j < iterations; j++) {

uint32_t subChunkSize = MAX_CHUNK_SIZE / subChunkCount;
uint32_t offset = 0;
unsigned char key[PACKING_KEY_SIZE];
for (int i = 0; i < subChunkCount; i++) {
Collaborator

Suggested change
for (int i = 0; i < subChunkCount; i++) {
// Encrypt each sub-chunk independently and then concatenate the encrypted sub-chunks to yield the encrypted composite chunk
for (int i = 0; i < subChunkCount; i++) {

uint32_t subChunkSize = outChunkLen / subChunkCount;
uint32_t offset = 0;
unsigned char key[PACKING_KEY_SIZE];
for (int i = 0; i < subChunkCount; i++) {
Collaborator

Suggested change
for (int i = 0; i < subChunkCount; i++) {
// Decrypt each sub-chunk independently and then concatenate the decrypted sub-chunks to yield the decrypted composite chunk
for (int i = 0; i < subChunkCount; i++) {

unsigned char* decryptedChunk = enif_make_new_binary(envPtr, outChunkLen,
&decryptedChunkTerm);
unsigned char* decryptedSubChunk;
// Both MAX_CHUNK_SIZE and subChunkCount are multiples of 64 so all sub-chunks
Collaborator

Suggested change
// Both MAX_CHUNK_SIZE and subChunkCount are multiples of 64 so all sub-chunks
// Both outChunkLen and subChunkCount are multiples of 64 so all sub-chunks

iterations < 1) {
return enif_make_badarg(envPtr);
}
if (!enif_get_int(envPtr, argv[8], &subChunkCount) ||
Collaborator

Do we also need to test that subChunkCount is a multiple of 64? e.g. I believe a subChunkCount of 2 would pass these checks but would then run afoul of this comment // Both MAX_CHUNK_SIZE and subChunkCount are multiples of 64

Collaborator

Or, maybe that comment just needs to be updated - since we don't care if subChunkCount is a multiple of 64 so long as all sub-chunks are the same size and that size is a multiple of 64.
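For what it's worth, a minimal sketch of the kind of argument check being discussed here (the names outChunkLen and subChunkCount follow the snippets above; whether the PR ends up validating exactly this is an assumption):

#include <stdbool.h>
#include <stdint.h>

// Sketch: accept a sub-chunk count only if it splits the chunk into equal
// sub-chunks whose size is a multiple of 64, as the comment in the diff requires.
static bool is_valid_sub_chunk_count(uint32_t outChunkLen, int subChunkCount) {
	if (subChunkCount < 1) {
		return false;
	}
	if (outChunkLen % (uint32_t)subChunkCount != 0) {
		return false; // sub-chunks would not all be the same size
	}
	uint32_t subChunkSize = outChunkLen / (uint32_t)subChunkCount;
	return subChunkSize % 64 == 0;
}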

for (int k = 0; k < offsetByteSize; k++) {
offsetBytes[k] = ((offset + subChunkSize) >> (8 * (offsetByteSize - 1 - k))) & 0xFF;
}
SHA256_CTX sha256;
Collaborator

Suggested change
SHA256_CTX sha256;
// Sub-chunk encryption key is the feistel hash of the input data and the sub-chunk offset
SHA256_CTX sha256;

SHA256_Update(&sha256, offsetBytes, offsetByteSize);
SHA256_Final(key, &sha256);

for (int j = 0; j < iterations; j++) {
Collaborator

Suggested change
for (int j = 0; j < iterations; j++) {
// Sequentially encrypt each sub-chunk 'iterations' times.
// The encrypted output of each iteration is the input for the following iteration.
for (int j = 0; j < iterations; j++) {

// 3 bytes is sufficient to represent offsets up to at most MAX_CHUNK_SIZE.
int offsetByteSize = 3;
unsigned char offsetBytes[offsetByteSize];
for (int k = 0; k < offsetByteSize; k++) {
Collaborator

Suggested change
for (int k = 0; k < offsetByteSize; k++) {
// Byte string representation of the sub-chunk offset: i * subChunkSize
for (int k = 0; k < offsetByteSize; k++) {

%% The number of sub-chunks in a compositely packed chunk with the packing difficulty
%% between 1 and 32 incl. The composite packing with the packing difficulty 1 matches
%% approximately the non-composite 2.6 packing in terms of computational costs.
-define(PACKING_DIFFICULTY_ONE_SUB_CHUNK_COUNT, 32).
Collaborator

The number of sub-chunks is the same for every packing difficulty, right? i.e. this isn't specifically about "Packing Difficulty 1" right? Or have I misunderstood?

I.e. could this be called COMPOSITE_PACKING_SUB_CHUNK_COUNT or something and still be accurate?

Collaborator

Hm, okay I see elsewhere we use the term "atomic sub-chunk" or "basic atomic sub-chunk" - are those different from a "packing difficulty 1 sub-chunk"?

If they're all the same thing, what do you think about just labeling them all "sub-chunk" and dropping the qualifiers - I don't think we have the concept of a sub-chunk anywhere else in the code yet, right?
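As a side note on the numbers involved (assuming the standard 262144-byte chunk size, which this diff does not restate): 32 sub-chunks per chunk gives 8192-byte sub-chunks, matching the 8192-byte sub-chunk mentioned in the record-field suggestion further down.

// Illustration only; maxChunkSize is an assumed value, not quoted from this diff.
#include <assert.h>
#include <stdint.h>

int main(void) {
	const uint32_t maxChunkSize = 262144;         // assumed MAX_CHUNK_SIZE (256 KiB)
	const uint32_t subChunkCount = 32;            // ?PACKING_DIFFICULTY_ONE_SUB_CHUNK_COUNT
	assert(maxChunkSize / subChunkCount == 8192); // the sub-chunk size referenced elsewhere in this PR
	return 0;
}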

@@ -98,12 +98,13 @@ show_help() ->
"a particular data range. The data and metadata related to the module "
"are stored in a dedicated folder "
"([data_dir]/storage_modules/storage_module_[partition_number]_[packing]/"
") where packing is either a mining address or \"unpacked\"."
" Example: storage_module 0,En2eqsVJARnTVOSh723PBXAKGmKgrGSjQ2YIGwE_ZRI. "
") where packing is either a mining address + : + packing difficulty "
Collaborator

Suggested change
") where packing is either a mining address + : + packing difficulty "
") where packing is either <mining address>:<packing difficulty> "

@@ -359,7 +376,8 @@
%% of the corresponding transaction. Proofs the inclusion of the chunk
%% in the corresponding "data_root" under a particular offset.
data_path = <<>>,
chunk = <<>>
chunk = <<>>,
Collaborator

Suggested change
chunk = <<>>,
%% When packing difficulty is 0, chunk stores a full packed chunk
%% When packing difficulty >= 1, chunk stores an 8192-byte packed sub-chunk
chunk = <<>>,

@@ -359,7 +376,8 @@
%% of the corresponding transaction. Proofs the inclusion of the chunk
%% in the corresponding "data_root" under a particular offset.
data_path = <<>>,
chunk = <<>>
chunk = <<>>,
unpacked_chunk = <<>>
Collaborator

Suggested change
unpacked_chunk = <<>>
%% When packing difficulty is 0, unpacked_chunk remains <<>>
%% When packing difficulty >= 1, unpacked_chunk stores a full unpacked chunk
unpacked_chunk = <<>>

@@ -526,7 +605,9 @@ hash_wallet_list(WalletList) ->
(ar_serialize:encode_bin(LastTX, 8))/binary,
(ar_serialize:encode_int(Denomination, 8))/binary,
MiningPermissionBin/binary >>,
crypto:hash(sha384, Preimage)
Alg = case Height >= ar_fork:height_2_8() of true -> sha256;
Collaborator

I forget the context on this change. Is this required for the packing difficulty changes, or are we taking the opportunity to make this change since we have to do a hard fork anyway?

And the rationale is to reduce the size of the wallet list?

ChunkOffset, TXRoot, Chunk, ChunkSize, RandomXStateRef, External)
when RequestedAddr == StoredAddr,
StoredPackingDifficulty > RequestedPackingDifficulty ->
repack_no_nif({RequestedPacking, StoredPacking, ChunkOffset, TXRoot, Chunk,
Collaborator

These all use repack_no_nif because the nif doesn't handle repacking across formats (e.g. legacy to composite, composite to legacy, or changing composite difficulties)?

Member Author

Yes, except that the current NIF implementation can change the packing difficulty up (make it more difficult) but not down.
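To make the fallback rule concrete, here is a tiny sketch of the constraint as I read this thread (the real dispatch is the Erlang function clauses above; the helper name and boolean parameters are hypothetical):

#include <stdbool.h>

// Sketch: the composite re-encryption NIF can only keep the packing composite and
// raise (or keep) the packing difficulty; crossing legacy <-> composite, or lowering
// the difficulty, falls back to repack_no_nif (unpack, then pack again).
static bool can_use_composite_repack_nif(bool stored_is_composite, bool requested_is_composite,
		int stored_difficulty, int requested_difficulty) {
	return stored_is_composite && requested_is_composite
			&& requested_difficulty >= stored_difficulty;
}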

{<<"bucketsize">>, BucketSize},
{<<"addr">>, EncodedAddr},
{<<"pdiff">>, PackingDifficulty}
]} when is_integer(PackingDifficulty), PackingDifficulty >= 0,
Collaborator

Should this check >= 1? It looks like we only serialize to {pdiff, xxx} if the difficulty is >= 1, so if we get a 0 here it might indicate some weird error?

Indices = collect_missing_tx_indices(Prefixes),
IsSolutionHashKnown =
Collaborator

Why are we removing all this code?

Member Author

This one might be questionable, but I felt like this feature is more trouble to maintain than it is actually useful. The feature is about validators optionally informing the block sender that they already have the chunks, so that some bandwidth is saved while the validator reads and packs the chunk(s) themselves.

Thinking about it now, I believe the feature might in fact be harmful in the long run, especially given the composite packing specifics, because it may make miners rely excessively on validators extracting the chunks; if such validators disappear over time, the network might not notice and might not react sufficiently quickly. In other words, I would rather make the network spend some bandwidth but require all miners to set up the chunk building correctly from the start.

Collaborator

works for me! I'm a fan of removing code.

get_max_nonce(0) ->
get_max_nonce2((?RECALL_RANGE_SIZE) div ?DATA_CHUNK_SIZE);
get_max_nonce(PackingDifficulty) ->
AdjustedRecallRangeSize = ?RECALL_RANGE_SIZE div PackingDifficulty,
Collaborator

Suggested change
AdjustedRecallRangeSize = ?RECALL_RANGE_SIZE div PackingDifficulty,
AdjustedRecallRangeSize = get_recall_range(PackingDifficulty),

@@ -510,17 +566,29 @@ pre_validate_cumulative_difficulty(B, PrevB, SolutionResigned, Peer) ->
gen_server:cast(?MODULE, {enqueue, {B, PrevB, true, Peer}}),
enqueued;
false ->
pre_validate_quick_pow(B, PrevB, false, Peer)
pre_validate_packing_difficulty(B, PrevB, false, Peer)
Collaborator

Is this ever called with SolutionResigned == true?

Member Author

Oh, good catch! This is wrong, fixed

@@ -34,6 +34,9 @@

encode_packing({spora_2_6, Addr}) ->
"spora_2_6_" ++ binary_to_list(ar_util:encode(Addr));
encode_packing({composite, Addr, PackingDifficulty}) ->
"composite_" ++ binary_to_list(ar_util:encode(Addr)) ++ ":"
Collaborator

Flagging this discussion here (it was started in the other repo). @vird pointed out that : is used for the address checksum and it will cause a problem on Windows (eventually). So perhaps we can select a different delimiter?

Slack discussion here: https://digitalhistor-8tx7838.slack.com/archives/C04Q93X02AJ/p1723576491561449

Maybe we use . as the delimiter?

Member Author

I like .!

@ldmberman ldmberman force-pushed the feature/packing-difficulty branch 3 times, most recently from 8af401a to 77af830 on August 15, 2024 12:50
@ldmberman ldmberman force-pushed the feature/packing-difficulty branch from 7fb7006 to 2710c1f on August 28, 2024 09:37
@ldmberman ldmberman force-pushed the feature/packing-difficulty branch from b9f077c to 8a9ca76 on September 5, 2024 17:00
Lev Berman and others added 20 commits September 20, 2024 20:41
The commit introduces the new hard fork (2.8); the height is not set yet.

Composite packing makes it possible to extract more performance out of bigger-capacity
lower-bandwidth drives by choosing a costlier packing that reduces the
required mining bandwidth.
some bug fixes and some refactoring

Co-authored-by: James Piechota <piechota@gmail.com>
Remove old randomx nifs:

bulk_hash_fast_nif
hash_fast_verify_nif
hash_fast_long_with_entropy_nif
bulk_hash_fast_long_with_entropy_nif
vdf_parallel_sha_verify_nif
- Fix a bug where the H1 peer receives H2 and builds an incorrect solution after fetching proofs
  from the network instead of the local storage;
- Do not give up solutions when a CM miner cannot find proofs or VDF
  steps - delegate it to the exit peer;
- Fix incorrect sub-chunk picking during solution validation by the
  mining server.
Adjusting the hash rate by (TwoChunkCount + OneChunkCount) / TwoChunkCount counts
the number of all the hashing attempts. In other words, the adjusted value is useful
when we want to see the total amount of CPU work put into mining a block.
This is not what we use it for. We use it as a denominator when computing
a share contributed by a single partition - see ar_pricing:get_v2_price_per_gib_minute.
Therefore, the hash rate computed here needs to have the same "units" as
the hash rate we estimate for the partition - the "normalized" hash rate
where one recall range only produces 4 nonces (chunk-1) plus up to 400 nonces (chunk-2).

Note that we did not even adjust it by (TwoChunkCount + OneChunkCount) / TwoChunkCount but
by 100 div (100 + 1), which is wrong.
- Move Fast/Light flag into the randomx state

now we only need the _fast vs _light methods on init

- Add randomx_info_nif and update lib/RandomX submodule

randomx_info_nif provides some info about a RandomX state -
currently only used in tests.

Updated lib/RandomX submodule exposes randomx dataset and
cache size values via -D build flags

- Change up how we handle DEBUG vs. non-DEBUG for ar_mine_randomx.erl

No longer use -define; instead switch behavior based on the
randomx State. This will let us test the non-DEBUG functionality
for some more code paths

- Add more ar_mine_randomx tests:

1. Test for a memory corruption issue in the decrypt_composite_nif
2. Tests for the NIF wrappers

- Remove the reencrypt_legacy_composit nif

  in preparation for increasing the randomx dataset for composite packing

- Remove obsolete c_src tests binary
- Remove ar_randomx_state
- Applies to composite packing only;
- Composite packing with packing difficulty=2 has a recall range
  of 25 MiB, packing difficulty=3 has 12.5 MiB, etc.
Build two versions of the randomx NIFs: one for the 512 MiB dataset
(rx512) and one for the 4096 MiB dataset (rx4096)
On macOS the symbols in each of the statically linked librandomx.a libraries conflicted at runtime, causing errors or segfaults. The fix is to build the libraries with the -fvisibility=hidden flag. This prevents the symbols from being exported beyond the .so that the libraries are linked into.
NODE_NAME='blah@127.0.0.1' ./bin/start xxxx
@JamesPiechota JamesPiechota force-pushed the feature/packing-difficulty branch from d42664f to 50a0afe on September 20, 2024 20:41
@JamesPiechota JamesPiechota merged commit 5e77ca1 into master Sep 20, 2024
2 checks passed