Current loopbox device is incompatible with replay #3260

FUDCo · 2021-06-05T07:31:33Z

Describe the bug

The loopbox device's makeSender method returns a device node, which in turn is serialized as a d+NN vref. However, (a) the allocation counter in deviceSlots.js that generates this ID gets reset in each separate execution of the kernel (as when restarting with replay), and (b) replay does not rebind the vref and the corresponding kdNN kernel device reference on restart. This causes terrible things to happen if the loopbox device is used in streams of execution involving multiple executions of the kernel over time. Fortunately, this only affects tests, which can generally be run to completion without difficulty (except for tests which, because of what is being tested, need to execute in stages, which is in fact the case that led to this bug's discovery), and no other devices have methods that return newly generated device nodes.
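
To make the failure mode concrete, here is a minimal sketch. It is not the actual deviceSlots.js code: `makeAllocator`, `persistedVrefs`, and the starting ID of 10 are hypothetical, chosen only to show how an in-memory counter and lookup table that start fresh in each execution disagree with a vref recorded by an earlier execution.

```javascript
// Hypothetical stand-in for a per-execution vref allocator (not the real deviceSlots.js).
function makeAllocator(firstID = 10) {
  let nextID = firstID; // held only in memory, so it is lost on restart
  const nodesByVref = new Map(); // in-memory table: vref -> device node
  return {
    exportNode(node) {
      const vref = `d+${nextID}`;
      nextID += 1;
      nodesByVref.set(vref, node);
      return vref;
    },
    lookup(vref) {
      return nodesByVref.get(vref);
    },
  };
}

// First execution: a sender node is exported and its vref is recorded durably.
const run1 = makeAllocator();
const persistedVrefs = [run1.exportNode({ add() {} })];

// Second execution: fresh allocator, empty table, counter back at its start.
const run2 = makeAllocator();
const staleLookup = run2.lookup(persistedVrefs[0]); // undefined: nothing was rebound
const reusedVref = run2.exportNode({ add() {} }); // the same ID is handed out again
```

In this sketch, a later `'add' in staleLookup` check would throw a TypeError of the same form as the one in the kernel panic reported below, since the `in` operator cannot be applied to `undefined`.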

The long-term fix is to refactor the deviceSlots portion of the kernel to avoid the possibility of exporting Remotables, making it more primitive rather than a poor imitation of liveSlots, but we will create a separate issue for this. This will also require overhauling (and possibly phasing out) the loopbox device and rewriting the various devices for whatever the new device API ends up being, but that is a matter for a separate issue of its own (or possibly several).

In the meantime, @warner and I have worked out a ~~horrible hack~~ scheme for a relatively minimal alteration to the loopbox device that can address the problem. (Short summary: generate all the sender device nodes that will be needed in the buildRootDeviceNode function, serialize them to the state storage to force generation of vrefs, save them in a table, and look them up by name when needed.)
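
The short summary above can be sketched roughly as follows. This is an illustration under stated assumptions, not the change that was actually made: `buildRootDeviceNodeSketch`, `makeFakeTools`, `serializeNode`, and the sender names `left` and `right` are all hypothetical, and the fake tools only imitate a counter that restarts in each execution.

```javascript
// Sketch only: the `tools` shape and all names here are assumptions,
// not the real SwingSet device API.
function buildRootDeviceNodeSketch(tools, senderNames) {
  const { serializeNode, setDeviceState } = tools;
  const senders = new Map(); // table: name -> pre-built sender node
  const queue = []; // messages recorded by the senders

  // Create every sender up front, in a fixed order, so that each
  // execution allocates the same vrefs in the same sequence.
  for (const name of senderNames) {
    senders.set(name, {
      add(...args) {
        queue.push([name, ...args]);
      },
    });
  }

  // Serialize all senders into device state to force vref allocation now.
  setDeviceState({
    senders: senderNames.map(name => serializeNode(senders.get(name))),
  });

  return {
    // Look up a pre-built node by name instead of creating a new one.
    makeSender(name) {
      const sender = senders.get(name);
      if (!sender) throw new Error(`unknown sender ${name}`);
      return sender;
    },
  };
}

// Fake tools standing in for a single kernel execution.
function makeFakeTools() {
  let nextID = 10; // restarts in every execution, as in the bug
  const vrefs = new Map();
  let state;
  return {
    serializeNode(node) {
      if (!vrefs.has(node)) {
        vrefs.set(node, `d+${nextID}`);
        nextID += 1;
      }
      return vrefs.get(node);
    },
    setDeviceState(newState) {
      state = newState;
    },
    getDeviceState() {
      return state;
    },
  };
}

const senderNames = ['left', 'right']; // hypothetical names
const tools1 = makeFakeTools();
const root1 = buildRootDeviceNodeSketch(tools1, senderNames);
const tools2 = makeFakeTools(); // simulates a restart
const root2 = buildRootDeviceNodeSketch(tools2, senderNames);
```

Because the senders are created in the same order every time the root node is built, the restarting counter assigns each name the same vref in both simulated executions. The trade-off in this sketch is that the full set of sender names has to be known when the root device node is built.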

To Reproduce

The problem can be demonstrated using the swingset-runner encouragementBotComms demo, run as a series of separate executions of 5 cranks each:

From the swingset-runner directory:

bin/runner --init --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch

This will yield a failure looking like:

##### KERNEL PANIC: error during syscall/device.invoke: TypeError: Cannot use 'in' operator to search for 'add' in undefined #####
removing static vat v7
vat terminated: {"body":"{\"@qclass\":\"error\",\"name\":\"Error\",\"message\":\"you killed my kernel. prepare to die\"}","slots":[]}
terminated vat v7
UnhandledPromiseRejectionWarning: (TypeError#1)
TypeError#1: Cannot use 'in' operator to search for 'add' in undefined
  at Object.invoke (kernel/.../packages/SwingSet/src/kernel/deviceSlots.js:192:18)
  at Object.invoke (kernel/.../packages/SwingSet/src/kernel/deviceManager.js:80:36)
  at invoke (kernel/.../packages/SwingSet/src/kernel/kernelSyscall.js:88:28)
  at Object.doKernelSyscall (kernel/.../packages/SwingSet/src/kernel/kernelSyscall.js:142:16)
  at vatSyscallHandler (kernel/.../packages/SwingSet/src/kernel/kernel.js:697:43)
  at syscallFromWorker (kernel/.../packages/SwingSet/src/kernel/vatManager/manager-helper.js:218:18)
  at doSyscall (kernel/.../packages/SwingSet/src/kernel/vatManager/supervisor-helper.js:126:11)
  at Object.callNow (kernel/.../packages/SwingSet/src/kernel/vatManager/supervisor-helper.js:171:5)
  at Proxy.eval (kernel/.../packages/SwingSet/src/kernel/liveSlots.js:669:31)
  at Alleged: transmitter.transmit (vat-v7/.../packages/SwingSet/src/vats/vat-tp.js:157:22)
  at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:412:23
  at Object.applyMethod (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:377:14)
  at doIt (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:419:67)
  at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/track-turns.js:65:22
  at win (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:432:19)
  at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:449:20

However, although the crash happens on the 24th crank, comparing the output logs to those from a run that executes the entire demo in a single execution shows that the problem actually first manifests in the 8th crank.
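
One way to automate that comparison is sketched below. The helper `firstDivergence` is hypothetical, and the arrays in the usage lines are placeholder strings rather than real runner output; the sketch assumes each log has already been split into comparable entries (for example, one per crank).

```javascript
// Hypothetical helper: find the first entry at which a single-run log and the
// concatenated logs of a staged run differ. Returns -1 if they are identical.
function firstDivergence(singleRun, stagedRuns) {
  const staged = stagedRuns.flat(); // join the per-execution logs end to end
  const limit = Math.max(singleRun.length, staged.length);
  for (let i = 0; i < limit; i += 1) {
    if (singleRun[i] !== staged[i]) {
      return i; // zero-based index of the first mismatch
    }
  }
  return -1;
}

// Placeholder entries only; these are not real runner log lines.
const sameIndex = firstDivergence(['a', 'b', 'c', 'd'], [['a', 'b'], ['c', 'd']]);
const diffIndex = firstDivergence(['a', 'b', 'c', 'd'], [['a', 'b'], ['c', 'x']]);
```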

Expected behavior

The entire demo should run to completion in 34 cranks, and there should be no meaningful differences in the logs between running the whole thing in one go and breaking it into 5-crank pieces.

Fixes bug #3260

FUDCo · 2021-06-08T21:34:19Z

Closed by #3261

FUDCo added the bug label Jun 5, 2021

FUDCo self-assigned this Jun 5, 2021

FUDCo mentioned this issue Jun 5, 2021

Support vats without transcripts #3257

Merged

FUDCo added a commit that referenced this issue Jun 6, 2021

fix: make loopbox device compatible with replay

d712c74

Fixes bug #3260

FUDCo mentioned this issue Jun 6, 2021

Make loopbox device compatible with replay #3261

Merged

FUDCo added a commit that referenced this issue Jun 8, 2021

fix: make loopbox device compatible with replay

ce11fff

Fixes bug #3260

FUDCo closed this as completed Jun 8, 2021
