"kernel activity hash": add some amount of swingset state into the cosmos AppHash #3442
Comments
I'm adding this to the Metering Phase, but since adding this feature will expose any existing non-determinism pretty quickly (and we're already finding some from the previous phase), it's really important that we do extensive multi-validator multi-restart testing before the start of that phase.
It occurred to me that we don't even need to do any hashing from the kernel side. Simply writing the block buffer into a single key of the cosmos DB at the end of each block should suffice, because it will be sampled immediately (it will change the AppHash for that same block). That might consume more space in the state vector, but it will be replaced on the next block, so it won't accumulate (except for archival nodes, which record the full history of the state vector). Recording a hash of the current block buffer, instead of the full contents, would reduce that.
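A minimal sketch of that variant, assuming a `blockBuffer` Map of the block's key/value writes and a `cosmosKVStore` handle (both names are illustrative stand-ins, not existing APIs):

```js
import { createHash } from 'crypto';

// At end-of-block, store a hash of the block buffer under a single cosmos DB key.
// The key is overwritten every block, so it does not accumulate state.
function publishBlockBufferHash(blockBuffer, cosmosKVStore) {
  const h = createHash('sha256');
  for (const [key, value] of blockBuffer.entries()) {
    h.update(`${key}\n${value}\n`); // fold in every write made during this block
  }
  cosmosKVStore.set('swingset-block-buffer-hash', h.digest('hex'));
}
```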
The mailbox device tracked the highest inbound message number and ack for each peer, to de-duplicate repeated messages and thereby reduce the amount of kernel activity. Each call to `deliverInbound` would return a boolean to indicate whether the messages/ack were new, and thus whether the kernel needed to be cycled. However, the device was holding this tracking data in non-durable state, so if/when the kernel was restarted, the state would be lost. A duplicate message/ack arriving in the restarted process would trigger kernel activity that would not have run in the original process. These extra cranks caused divergence between validators when one of them was restarted and a client sent a duplicate message (such as the pre-emptive `ack` all clients send at startup). The extra crank does not get very far, because vattp does its own deduplication, so the divergence was only visible in the slog. But when #3442 is implemented, even a single extra crank will flag the validator as out of consensus. The fix is to remove the mailbox device's dedup code and rely upon vattp for this function. The test was also updated to match, and a new test (comparing two parallel kernels, one restarted, one not) was added. closes #3471
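For context, a hedged sketch of how a host could use that boolean; the surrounding names are illustrative, and only "`deliverInbound` returns whether the messages/ack were new" comes from the description above:

```js
// Illustrative host-side handler, not the real cosmic-swingset code: cycle the
// kernel only when the mailbox device reports that the messages/ack were new.
async function handleInbound(mb, controller, hostStorage, peer, messages, ack) {
  const wasNew = mb.deliverInbound(peer, messages, ack);
  if (wasNew) {
    await controller.run(); // run the kernel only for fresh work
    hostStorage.commit();
  }
  return wasNew;
}
```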
My current idea for the design:
await controller.run(runPolicy);
swingstore.commit();
const crankHash = controller.getCrankHash();
appState.save('swingset-crankhash', crankHash);
appState.commit();

In addition, the …
The host application (which uses swingset as a library) is obligated to provide a "hostStorage" object, which is a trio of kvStore, streamStore, and snapStore. The host typically uses openLMDBSwingStore() to create this, and holds onto the "commit block" facet. Previously the kernelKeeper was built on top of the streamStore, the snapStore, and an "enhancedCrankBuffer", which was the host-provided key-value store plus an abortable/committable "crank buffer", plus some extra convenience methods. The kernel held the abort/commit facet. The kernel was responsible for extracting the raw kvStore from `hostStorage`, wrapping it, and passing the enhanced version into `makeKernelKeeper` along with the other two values. Instead, we now give the entire "hostStorage" object to `makeKernelKeeper`, and give the kernel keeper responsibility for doing the wrapping. When the kernel wants to commit or abort, it calls `kernelKeeper.abortCrank()` or `commitCrank()`. This simplifies the `makeKernelKeeper` construction and will provide a place for `commitCrank` to store a hash of kernel activity. refs #3442
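Schematically, the wiring described above might look like the following sketch; the exact return shape of `openLMDBSwingStore()` is an assumption, and `makeKernelKeeper` is kernel-internal (no public import path), so this is a data-flow illustration rather than the actual code:

```js
import { openLMDBSwingStore } from '@agoric/swing-store-lmdb';

// host side: open the swing-store; the host keeps the "commit block" facet
const swingStore = openLMDBSwingStore(kernelStateDBDir);
const hostStorage = {
  kvStore: swingStore.kvStore,
  streamStore: swingStore.streamStore,
  snapStore: swingStore.snapStore,
};

// kernel side (internal): the keeper receives the whole trio and does the
// crank-buffer wrapping itself
const kernelKeeper = makeKernelKeeper(hostStorage);
// ... at the end of each crank, the kernel chooses one of:
kernelKeeper.commitCrank(); // keep this crank's changes (and later, hash them)
// kernelKeeper.abortCrank(); // or discard them

// host side, at end-of-block:
swingStore.commit();
```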
The multiple members of a consensus machine are supposed to perform identical execution of every crank. To detect any possible divergence as quickly as possible, the kernel maintains the "crankhash": a constantly-updated string which incorporates (by SHA256 hash) a copy of every DB write and delete. `controller.getCrankHash()` can be run after one or more cranks have finished, and the resulting hex string can, for example, be stored in a host application's consensus state vector. If two members diverge in a way that causes their swingset state to differ, they will have different crankhashes, and the consensus state vectors will diverge. This should cause at least one of them to fall out of consensus. Some keys are excluded from consensus: currently just those involving vat snapshots and the truncation (non-initial starting point) of the transcript. refs #3442
This 'crankhash' covers all kernel state changes, so if any two validators diverge, their crankhashes should diverge too. By storing them into the cosmos-sdk state vector, the AppHashes will diverge as well, causing at least one of them to fall out of consensus. This should let us catch divergence as soon as possible. closes #3442
We're defining the `local.*` namespace as the portion of the DB that is excluded from consensus (the #3442 kernel activity hash). Vat snapshots are a local concern: each member of a consensus machine can decide for itself if/when to make a snapshot. So both the vat->snapshot and the snapshot->vats tables must be moved to `local.`.
The kernel stats are merely incidentally deterministic: unlike c-lists and run-queues and object/promise state, stats aren't the primary thing we want to compare across replicas in a consensus machine. And we might want to add more in the future without causing a compatibility break. So this changes the DB key from `kernelStats` to `local.kernelStats`, which will exclude them from the #3442 kernel activity hash.
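A tiny illustrative helper (not the actual kernel code) showing the kind of check this `local.` namespace enables when feeding DB changes into the activity hash:

```js
// Keys under `local.` are host-local (snapshots, stats, etc.) and stay out of
// the consensus-sensitive hash; everything else participates.
function isConsensusKey(key) {
  return !key.startsWith('local.');
}

// e.g. only consensus keys would be folded into the crankhash
function maybeRecordWrite(crankhash, key, value) {
  if (isConsensusKey(key)) {
    crankhash.update(`add ${key} ${value}\n`); // encoding here is illustrative
  }
}
```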
The multiple members of a consensus machine are supposed to perform identical execution of every crank. To detect any possible divergence as quickly as possible, the kernel maintains the "activityhash": a constantly-updated string which incorporates (by SHA256 hash) a copy of every DB write and delete. Each crank's DB writes/deletes are accumulated into the "crankhash". If the crank is committed, its crankhash is hashed into the old activityhash to form the new "activityhash". `controller.getActivityhash()` can be run after one or more cranks have finished, and the resulting hex string can, for example, be stored in a host application's consensus state vector. If two members diverge in a way that causes their swingset state to differ, they will have different activityhashes, and the consensus state vectors will diverge. This should cause at least one of them to fall out of consensus. Some keys are excluded from consensus: currently just those involving vat snapshots and the truncation (non-initial starting point) of the transcript. refs #3442
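The two-level accumulation described here could look roughly like the following sketch; the real crank buffer also records deletes and uses its own encoding, so treat this as a shape, not the implementation:

```js
import { createHash } from 'crypto';

function makeActivityTracker() {
  let crankhash = createHash('sha256'); // accumulates this crank's DB changes
  let activityhash = ''; // hex string carried across committed cranks

  return {
    // record one DB write or delete into the current crank's hash
    recordChange(changeString) {
      crankhash.update(changeString);
    },
    // on commit, fold the finished crankhash into the running activityhash
    commitCrank() {
      const ch = crankhash.digest('hex');
      activityhash = createHash('sha256')
        .update(activityhash)
        .update(ch)
        .digest('hex');
      crankhash = createHash('sha256'); // start fresh for the next crank
    },
    // on abort, throw the crank's accumulated hash away
    abortCrank() {
      crankhash = createHash('sha256');
    },
    getActivityhash: () => activityhash,
  };
}
```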
This 'activityhash' covers all kernel state changes, so if any two validators diverge, their activityhashes should diverge too. By storing them into the cosmos-sdk state vector, the AppHashes will diverge as well, causing at least one of them to fall out of consensus. This should let us catch divergence as soon as possible. closes #3442
What is the Problem Being Solved?
In trying to debug #3428, @mhofman and I compared slog traces of two different validators to confirm that they were running the same cranks. He found variation in the vbank messages (#3435), and we also observed differences in the GC Actions sent into vats between validators which had or had not reloaded a vat from snapshot (due to #3428 causing different GC syscalls to be made slightly earlier).
We've been kind of on the fence about how much variation to allow in the swingsets running on different validators. Currently, the only swingset variations that will cause a validator to break consensus are outbound comms/IBC messages (written into the cosmos-sdk "swingset" Module's state) and messages sent to the vbank Module (also saved in its state). Anything else will silently diverge until one of those two values observes a difference.
It's probably best to detect these kernel-level variations as quickly as possible. Ideally we'd store all consensus-sensitive kernel state in the cosmos DB where it can be rolled up into the AppHash along with everything else. We tried this early on, and found the performance was troublesome, but maybe we should consider trying it again.
Description of the Design
We may benefit from a partial approach, something like: around each `controller.step()`, read a "previous kernel state hash" value from the cosmos DB, fold in a record of the delivery just performed, and write the updated value back.

That would at least be sensitive to deviations in the deliveries that the kernel is performing. This would have caught the GC Action variations we observed.
The next level would be to hash all the DB changes made during a block (the contents of the "block buffer") and fold them in as well. It wouldn't provide any insight into what the variance was, but it would detect it immediately. We've talked about excluding certain kernelDB keys from consensus (vat snapshot ID, transcript prefix offset); we'd need to implement that before adding the block buffer into the hash.
Consensus Considerations
This would be a consensus-breaking change. If we choose to implement it, we should do so before our first release.
Security Considerations
Most chain environments are entirely transparent, but if we anticipate some sort of hybrid model in which the swingset state is somehow concealed from the host-side application, we must consider the information leakage of this activity hash. This leakage can be prevented by including a full-strength random value (256 bits or more) into each hash computation, and not revealing this value to the host application.
Test Plan
A unit test which somehow modifies the kernel between two replays of the same state (maybe by swapping two items on the run-queue), and verifies that the two instances finish with different kernel activity hash values.
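One possible shape for that test, in the repo's AVA style; the helpers here are placeholders, not existing test utilities:

```js
import test from 'ava';

test('activity hash diverges when kernel activity differs', async t => {
  // two kernels built from identical state; helpers are hypothetical
  const c1 = await makeTestController();
  const c2 = await makeTestController();

  await c1.run();
  swapFirstTwoRunQueueItems(c2); // hypothetical perturbation of kernel state
  await c2.run();

  t.not(c1.getActivityhash(), c2.getActivityhash());
});
```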