Describe the bug
The loopbox device's makeSender method returns a device node, which in turn is serialized as a d+NN vref. However, (a) the allocation counter in deviceSlots.js that generates this ID gets reset in each separate execution of the kernel (as when restarting with replay), and (b) replay does not rebind the vref and the corresponding kdNN kernel device reference on restart. This causes terrible things to happen if the loopbox device is used in streams of execution involving multiple executions of the kernel over time. Fortunately, this only affects tests, which can generally be run to completion without difficulty. The exception is tests which, because of what they are testing, need to execute in stages; that is in fact the case that led to this bug's discovery. No other devices have methods that return newly generated device nodes.
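To make the failure mode concrete, here is a deliberately simplified model of it. This is a sketch under assumptions, not the real deviceSlots.js code: `makeDeviceSlotsModel`, `exportNode`, `lookup`, and the starting counter value are all hypothetical names and values chosen for illustration. It shows only the two ingredients described above: an in-memory ID counter that starts over on each kernel execution, and a vref-to-node table that is not rebuilt on restart.

```javascript
// Simplified model of the problem. makeDeviceSlotsModel is a hypothetical
// stand-in and does not reproduce the real deviceSlots.js API.
function makeDeviceSlotsModel() {
  let nextExportID = 10; // in-memory counter: starts over on every kernel execution
  const slotToVal = new Map(); // in-memory table: vref -> device node

  return {
    // Allocate a fresh d+NN vref for a newly created device node.
    exportNode(node) {
      const vref = `d+${nextExportID}`;
      nextExportID += 1;
      slotToVal.set(vref, node);
      return vref;
    },
    // Resolve a vref back to its device node (undefined if unknown).
    lookup(vref) {
      return slotToVal.get(vref);
    },
  };
}

// First kernel execution: a sender node is created and its vref is handed out.
const firstRun = makeDeviceSlotsModel();
const senderVref = firstRun.exportNode({ add() {} });

// Second kernel execution: a vat still holds senderVref, but the table is
// empty, so resolving the old vref yields nothing.
const secondRun = makeDeviceSlotsModel();
const staleLookup = secondRun.lookup(senderVref);

// Worse, the reset counter hands the same ID to an unrelated new node.
const collidingVref = secondRun.exportNode({ other() {} });
```

In this model, invoking a method on the result of the stale lookup would mean probing `undefined` for a method name, which is consistent with the kind of TypeError shown in the trace below.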
The long-term fix is to refactor the deviceSlots portion of the kernel to avoid the possibility of exporting Remotables, making it more primitive rather than a poor imitation of liveSlots, but we will create a separate issue for this. This will also require overhauling (and possibly phasing out) the loopbox device and rewriting the various devices for whatever the new device API ends up being, but that is a matter for a separate issue of its own (or possibly several).
In the meantime, @warner and I have worked out a horrible hack scheme for a relatively minimal alteration to the loopbox device that can address the problem. (Short summary: generate all the sender device nodes that will be needed in the buildRootDeviceNode function, serialize them to the state storage to force generation of vrefs, save them in a table, and look them up by name when needed.)
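The short summary above can be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: only buildRootDeviceNode and makeSender are names from this issue, while `buildRootDeviceNodeSketch`, `makeStubSerializer`, `saveState`, the sender names, and the shape of the sender objects are all hypothetical. The point is that every sender node is created and serialized up front, in a fixed order, so each execution assigns the same vref to the same named sender.

```javascript
// Hypothetical stand-in for the device's serializer: assigns d+NN vrefs in
// first-seen order, with a counter that restarts on every execution.
function makeStubSerializer() {
  let nextID = 10;
  const valToSlot = new Map();
  return {
    serialize(nodes) {
      return nodes.map(node => {
        if (!valToSlot.has(node)) {
          valToSlot.set(node, `d+${nextID}`);
          nextID += 1;
        }
        return valToSlot.get(node);
      });
    },
    vrefOf: node => valToSlot.get(node),
  };
}

// Sketch of the workaround: pre-generate all senders, force vref allocation
// by serializing them into saved state, and keep them in a table by name.
function buildRootDeviceNodeSketch(senderNames, serializer, saveState) {
  const senders = new Map();
  for (const name of senderNames) {
    senders.set(name, { name, queue: [], add(msg) { this.queue.push(msg); } });
  }
  // Serializing in a fixed order makes the vref assignment deterministic.
  saveState(serializer.serialize([...senders.values()]));

  return {
    // makeSender no longer creates a node; it only looks one up by name.
    makeSender(name) {
      if (!senders.has(name)) {
        throw new Error(`no pre-generated sender named ${name}`);
      }
      return senders.get(name);
    },
  };
}

// Simulate one kernel execution and report which vref a given sender got.
function runOnce() {
  const serializer = makeStubSerializer();
  let saved;
  const root = buildRootDeviceNodeSketch(['left', 'right'], serializer, s => {
    saved = s;
  });
  return { saved, rightVref: serializer.vrefOf(root.makeSender('right')) };
}

const firstExecution = runOnce();
const secondExecution = runOnce();
```

Because the senders are generated before any vat asks for them, the reset counter becomes harmless in this model: both executions hand out identical vrefs for identical names. The cost is that the set of sender names has to be known when the root device node is built.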
To Reproduce
The problem can be demonstrated using the swingset-runner encouragementBotComms demo, run from the swingset-runner directory in separate 5-crank block executions. This will yield a failure looking like:
##### KERNEL PANIC: error during syscall/device.invoke: TypeError: Cannot use 'in' operator to search for 'add' in undefined #####
removing static vat v7
vat terminated: {"body":"{\"@qclass\":\"error\",\"name\":\"Error\",\"message\":\"you killed my kernel. prepare to die\"}","slots":[]}
terminated vat v7
UnhandledPromiseRejectionWarning: (TypeError#1)
TypeError#1: Cannot use 'in' operator to search for 'add' in undefined
at Object.invoke (kernel/.../packages/SwingSet/src/kernel/deviceSlots.js:192:18)
at Object.invoke (kernel/.../packages/SwingSet/src/kernel/deviceManager.js:80:36)
at invoke (kernel/.../packages/SwingSet/src/kernel/kernelSyscall.js:88:28)
at Object.doKernelSyscall (kernel/.../packages/SwingSet/src/kernel/kernelSyscall.js:142:16)
at vatSyscallHandler (kernel/.../packages/SwingSet/src/kernel/kernel.js:697:43)
at syscallFromWorker (kernel/.../packages/SwingSet/src/kernel/vatManager/manager-helper.js:218:18)
at doSyscall (kernel/.../packages/SwingSet/src/kernel/vatManager/supervisor-helper.js:126:11)
at Object.callNow (kernel/.../packages/SwingSet/src/kernel/vatManager/supervisor-helper.js:171:5)
at Proxy.eval (kernel/.../packages/SwingSet/src/kernel/liveSlots.js:669:31)
at Alleged: transmitter.transmit (vat-v7/.../packages/SwingSet/src/vats/vat-tp.js:157:22)
at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:412:23
at Object.applyMethod (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:377:14)
at doIt (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:419:67)
at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/track-turns.js:65:22
at win (/Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:432:19)
at /Users/chip/Agoric/agoric-sdk/packages/eventual-send/src/index.js:449:20
However, although the crash happens on the 24th crank, comparing the output logs to those from a run that executes the entire demo in a single execution shows that the problem actually first manifests in the 8th crank.
Expected behavior
The entire demo should run to completion in 34 cranks, and there should be no meaningful differences in the logs between running the whole thing in one go and breaking it into 5-crank pieces.