
Current loopbox device is incompatible with replay #3260

Closed
FUDCo opened this issue Jun 5, 2021 · 1 comment
Labels: bug (Something isn't working)


FUDCo commented Jun 5, 2021

Describe the bug

The loopbox device's makeSender method returns a device node, which in turn is serialized as a d+NN vref. However, (a) the allocation counter in deviceSlots.js that generates this ID gets reset in each separate execution of the kernel (as when restarting with replay), and (b) replay does not rebind the vref and the corresponding kdNN kernel device reference on restart. This causes terrible things to happen if the loopbox device is used in streams of execution involving multiple executions of the kernel over time: a kdNN reference can end up resolving to no device node at all, or to a different sender that was later assigned the same d+NN vref. Fortunately, this only affects tests, which can generally be run to completion in a single execution without difficulty (the exception being tests that, because of what they are testing, need to execute in stages; that is in fact the case that led to this bug's discovery), and no other devices have methods that return newly generated device nodes.
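Both failure modes, the counter reset and the missing rebind, can be seen in a toy model. This is a minimal sketch under stated assumptions, not the real deviceSlots.js code: `kernelDeviceTable`, `startDeviceExecution`, `exportDeviceNode`, `invoke`, and the kernel ref `kd10` are all hypothetical names. It assumes the kernel's kdNN-to-d+NN table is persisted across restarts, while the device's export counter and slot table exist only in memory.

```javascript
// Hypothetical model: none of these names come from deviceSlots.js.

// Kernel-side table (kdNN -> d+NN). It is persisted, so it survives restarts.
const kernelDeviceTable = new Map();

// Device-side state is rebuilt from scratch on every kernel execution.
function startDeviceExecution() {
  let nextExportID = 1; // the allocation counter: back to 1 on every restart
  const slotToVal = new Map(); // d+NN -> device node, held in memory only

  function exportDeviceNode(node) {
    const vref = `d+${nextExportID}`;
    nextExportID += 1;
    slotToVal.set(vref, node);
    return vref;
  }

  function invoke(vref, method, args) {
    const target = slotToVal.get(vref);
    // If target is undefined, `method in target` throws the same kind of
    // TypeError seen in the kernel panic.
    if (!(method in target)) {
      throw Error(`no method ${method}`);
    }
    return target[method](...args);
  }

  return { exportDeviceNode, invoke };
}

// First execution: makeSender() exports a node; the kernel records kd10 -> d+1.
let device = startDeviceExecution();
kernelDeviceTable.set('kd10', device.exportDeviceNode({ add: (x) => x + 1 }));

// Restart: device state is rebuilt, and replay does not re-export that node.
device = startDeviceExecution();

let failure = null;
try {
  device.invoke(kernelDeviceTable.get('kd10'), 'add', [1]);
} catch (err) {
  // TypeError: Cannot use 'in' operator to search for 'add' in undefined
  failure = err;
}

// A new export after the restart reuses d+1, so kd10 now silently reaches it.
const freshVref = device.exportDeviceNode({ add: (x) => x + 2 });
const misrouted = device.invoke(kernelDeviceTable.get('kd10'), 'add', [1]);
// freshVref is 'd+1' and misrouted is 3, not the 2 the original sender gives.
```

The first call reproduces the shape of the panic in the log below; the second shows the quieter failure, where a stale reference lands on the wrong sender without any error.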

The long-term fix is to refactor the deviceSlots portion of the kernel to avoid the possibility of exporting Remotables, making it more primitive rather than a poor imitation of liveSlots, but we will create a separate issue for this. This will also require overhauling (and possibly phasing out) the loopbox device and rewriting the various devices for whatever the new device API ends up being, but those are matters for separate issues of their own (or possibly several).

In the meantime, @warner and I have worked out a horrible hack that addresses the problem with a relatively minimal alteration to the loopbox device. (Short summary: in the buildRootDeviceNode function, generate all the sender device nodes that will be needed, serialize them to the state storage to force generation of their vrefs, save them in a table, and look them up by name when needed.)
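The workaround summarized above can be sketched as follows. This is a simplified model, not the code that landed in #3261: `makeExporter`, the sender names `'a'` and `'b'`, the `deliver` method, and this signature for `buildRootDeviceNode` are hypothetical. It assumes buildRootDeviceNode runs again at the start of every kernel execution, so creating and serializing every sender there, in a fixed order, gives each name the same vref in every execution even though the counter restarts at 1.

```javascript
// Hypothetical stand-in for the export side of deviceSlots: the first time an
// object is serialized it gets the next d+NN vref. The counter starts at 1 in
// every kernel execution, as in the bug.
function makeExporter() {
  let nextExportID = 1;
  const valToSlot = new Map();
  const slotToVal = new Map();
  function serialize(values) {
    return values.map((val) => {
      if (!valToSlot.has(val)) {
        const vref = `d+${nextExportID}`;
        nextExportID += 1;
        valToSlot.set(val, vref);
        slotToVal.set(vref, val);
      }
      return valToSlot.get(val);
    });
  }
  return { serialize, slotToVal };
}

// Sketch of the workaround: create every sender up front, in a fixed order,
// and serialize them into device state right away so that their vrefs are
// allocated identically in every execution.
function buildRootDeviceNode(senderNames, exporter, state) {
  const senders = new Map();
  for (const name of senderNames) {
    senders.set(name, { deliver: (msg) => `${name} got ${msg}` });
  }
  state.senders = exporter.serialize([...senders.values()]);
  return {
    // makeSender no longer creates a device node; it only looks one up.
    makeSender(name) {
      const sender = senders.get(name);
      if (!sender) {
        throw Error(`unknown sender: ${name}`);
      }
      return sender;
    },
  };
}

// Two executions of the kernel, each with a fresh exporter (counter reset).
const senderNames = ['a', 'b'];
const exec1 = makeExporter();
const root1 = buildRootDeviceNode(senderNames, exec1, {});
const vrefBefore = exec1.serialize([root1.makeSender('b')])[0];

const exec2 = makeExporter();
const root2 = buildRootDeviceNode(senderNames, exec2, {});
const vrefAfter = exec2.serialize([root2.makeSender('b')])[0];
// Both are 'd+2', and exec2.slotToVal maps 'd+2' to the rebuilt 'b' sender.
```

One trade-off of this design is that the full set of sender names must be known when the root device node is built, since makeSender can no longer mint new nodes.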

To Reproduce

The problem can be demonstrated using the swingset-runner encouragementBotComms demo, run as a series of separate 5-crank block executions:

From the swingset-runner directory:

bin/runner --init --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch
bin/runner --loopbox --verbose --batchsize 5 --blocksize 5 --config demo/encouragementBotComms/swingset.json batch

This will yield a failure looking like:

##### KERNEL PANIC: error during syscall/device.invoke: TypeError: Cannot use 'in' operator to search for 'add' in undefined #####
removing static vat v7
vat terminated: {"body":"{\"@qclass\":\"error\",\"name\":\"Error\",\"message\":\"you killed my kernel. prepare to die\"}","slots":[]}
terminated vat v7
UnhandledPromiseRejectionWarning: (TypeError#1)
TypeError#1: Cannot use 'in' operator to search for 'add' in undefined
  at Object.invoke (kernel/.../packages/SwingSet/src/kernel/deviceSlots.js:192:18)
  at Object.invoke (kernel/.../packages/SwingSet/src/kernel/deviceManager.js:80:36)
  at invoke (kernel/.../packages/SwingSet/src/kernel/kernelSyscall.js:88:28)
  at Object.doKernelSyscall (kernel/.../packages/SwingSet/src/kernel/kernelSyscall.js:142:16)
  at vatSyscallHandler (kernel/.../packages/SwingSet/src/kernel/kernel.js:697:43)
  at syscallFromWorker (kernel/.../packages/SwingSet/src/kernel/vatManager/manager-helper.js:218:18)
  at doSyscall (kernel/.../packages/SwingSet/src/kernel/vatManager/supervisor-helper.js:126:11)
  at Object.callNow (kernel/.../packages/SwingSet/src/kernel/vatManager/supervisor-helper.js:171:5)
  at Proxy.eval (kernel/.../packages/SwingSet/src/kernel/liveSlots.js:669:31)
  at Alleged: transmitter.transmit (vat-v7/.../packages/SwingSet/src/vats/vat-tp.js:157:22)
  at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:412:23
  at Object.applyMethod (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:377:14)
  at doIt (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:419:67)
  at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/track-turns.js:65:22
  at win (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:432:19)
  at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:449:20

Although the crash happens on the 24th crank, comparing the output logs to those from a run that executes the entire demo in a single execution shows that the problem actually manifests in the 8th crank.
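The log comparison described above amounts to finding the first crank at which the two runs differ. A minimal sketch, assuming each log has already been split into one string per crank (crank 1 at index 0) and that incidental noise such as restart banners has been filtered out; `firstDivergentCrank` is a hypothetical helper, not part of swingset-runner.

```javascript
// Hypothetical helper: each argument is an array of per-crank log texts,
// index 0 holding crank 1. Splitting raw logs into cranks is assumed done.
function firstDivergentCrank(singleRunCranks, stagedRunCranks) {
  const n = Math.min(singleRunCranks.length, stagedRunCranks.length);
  for (let i = 0; i < n; i += 1) {
    if (singleRunCranks[i] !== stagedRunCranks[i]) {
      return i + 1; // crank numbers are 1-based
    }
  }
  // One run stopped early (e.g. after a kernel panic): the runs diverge at
  // the first crank that only the longer run has.
  if (singleRunCranks.length !== stagedRunCranks.length) {
    return n + 1;
  }
  return null; // no difference found
}
```

Comparing crank by crank, rather than stopping at the crash, is what separates the point where the bug first manifests from the later point where it becomes fatal.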

Expected behavior

The entire demo should run to completion in 34 cranks, and there should be no meaningful differences in the logs between running the whole thing in one go and breaking it into 5-crank pieces.

FUDCo added the bug label Jun 5, 2021
FUDCo self-assigned this Jun 5, 2021
FUDCo added a commit that referenced this issue Jun 6, 2021
FUDCo added a commit that referenced this issue Jun 8, 2021

FUDCo commented Jun 8, 2021

Closed by #3261

FUDCo closed this as completed Jun 8, 2021