Commit cb239cb

dckc and warner authored
feat: demand-paged vats are reloaded from heap snapshots (#2848)
This enhances SwingSet with a "Vat Warehouse", which limits the number of "paged-in" vats to some maximum (currently 50). The idea is to conserve system RAM by allowing idle vats to remain "paged-out", consuming only disk space, until someone sends a message to them. The vat is then paged in by creating a new xsnap process and reloading the necessary vat state. This reload is greatly accelerated by loading a heap snapshot, if one is available: we only need to replay the suffix of the transcript that was recorded after the snapshot was taken, rather than the full (huge) transcript.

Heap snapshots are stored in a new swingstore component named the "snap store" (`snapStore`), alongside the stream store that holds the transcripts.

For each vat, the warehouse saves a heap snapshot after a configurable number of deliveries (default 200). In addition, it saves an initial snapshot after just a few deliveries (default 2), because all contract vats start out with a large delivery that provides the contract bundle to evaluate. By taking a snapshot quickly, we can avoid the time needed to re-evaluate that large bundle on almost all process restarts.

This algorithm is a best guess: we'll refine it as we gather more data about the tradeoff between work now (the time it takes to create and write a snapshot), the storage space consumed by those snapshots, and work later (replaying more transcript). We're estimating that a typical contract snapshot consumes about 300kB (compressed).
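The snapshot schedule described above (one early snapshot, then one every so many deliveries) can be sketched as a small pure function. This is only an illustration: `shouldSnapshot` and the `policy` object are hypothetical names, the defaults 2 and 200 are taken from the description, and the `{ totalEntries, snapshottedEntries }` shape mirrors what `transcriptSnapshotStats()` returns in this commit. The warehouse's actual decision logic may differ.

```javascript
// Sketch only: shouldSnapshot and policy are hypothetical names, not SwingSet APIs.
function shouldSnapshot(stats, policy = {}) {
  const { totalEntries, snapshottedEntries } = stats;
  const { snapshotInitial = 2, snapshotInterval = 200 } = policy;

  // No snapshot covers any entries yet: take the early one once the first
  // few deliveries (including the large bundle delivery) have been recorded.
  if (snapshottedEntries === 0) {
    return totalEntries >= snapshotInitial;
  }
  // Otherwise wait until enough new entries have accumulated since the last one.
  return totalEntries - snapshottedEntries >= snapshotInterval;
}

console.log(shouldSnapshot({ totalEntries: 1, snapshottedEntries: 0 })); // false
console.log(shouldSnapshot({ totalEntries: 2, snapshottedEntries: 0 })); // true
console.log(shouldSnapshot({ totalEntries: 150, snapshottedEntries: 2 })); // false
console.log(shouldSnapshot({ totalEntries: 202, snapshottedEntries: 2 })); // true
```

Keeping the decision separate from the I/O makes the "work now vs. work later" tradeoff easy to tune by changing only the two policy numbers.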
closes #2273
closes #2277
refs #2422
refs #2138 (might close it)

* refactor(replay): hoist handle declaration
* chore(xsnap): clarify names of snapStore temp files for debugging
* feat(swingset): initializeSwingset snapshots XS supervisor
  - solo: add xsnap, tmp dependencies
  - cosmic-swingset: declare dependencies on xsnap, tmp
  - snapshotSupervisor()
  - vk.saveSnapshot(), vk.getLastSnapshot()
  - test: mock vatKeeper needs getLastSnapshot()
  - test(snapstore): update snapshot hash
  - makeSnapstore in solo, cosmic-swingset
  - chore(solo): create xs-snapshots directory
  - more getVatKeeper -> provideVatKeeper
  - startPos arg for replayTranscript()
  - typecheck shows vatAdminRootKref could be missing
  - test pre-SES snapshot size
  - hoist snapSize to test title
  - clarify SES vs. pre-SES XS workers
  - factor bootWorker out of bootSESWorker
  - hoist Kb, relativeSize for sharing between tests

  misc:
  - WIP: restore from snapshot
  - hard-code remote style

  fix(swingset): don't leak xs-worker in initializeSwingset

  When taking a snapshot of the supervisor in initializeSwingset, we neglected to `.close()` it. Lack of a name hindered diagnosis, so let's fix that while we're at it.
* feat(swingset): save snapshot periodically after deliveries
  - vk.saveSnapshot() handles snapshotInterval
  - annotate type of kvStore in makeVatKeeper
  - move getLastSnapshot up for earlier use
  - refactor: rename snapshotDetail to lastSnapshot
  - factor out getTranscriptEnd
  - vatWarehouse.maybeSaveSnapshot()
  - saveSnapshot:
  - don't require snapStore
  - fix startPos type
  - provide snapstore to vatKeeper via kernelKeeper
  - buildKernel: get snapstore out of hostStorage
  - chore: don't try to snapshot a terminated vat
* feat(swingset): load vats from snapshots
  - don't `setBundle` when loading from snapshot
  - provide startPos to replayTranscript()
  - test reloading a vat
* refactor(vatWarehouse): factor out, test LRU logic
* fix(vat-warehouse): remove vatID from LRU when evicting
* chore(vatKeeper): prune debug logging in saveSnapshot (FIXUP)
* feat(swingset): log bringing vats online (esp from snapshot)
  - manager.replayTranscript returns number of entries replayed
* chore: resolve "skip crank buffering?" issue
  after discussion with CM: maybeSaveSnapshot() happens before commitCrank() so nothing special needed here
* chore: prune makeSnapshot arg from evict()
  Not only is this option not implemented now, but CM's analysis shows that adding it would likely be harmful.
* test(swingset): teardown snap-store
* chore(swingset): initial sketch of snapshot reload test
* refactor: let itemCount be not-optional in StreamPosition
* feat: snapshot early then infrequently
  - refactor: move snapshot decision up from vk.saveSnapshot() to vw.maybeSaveSnapshot()
* test: provide getLastSnapshot to mock vatKeeper
* chore: vattp: turn off managerType local work-around
* chore: vat-warehouse: initial snapshot after 2 deliveries
  integration testing shows this is closer to ideal
* chore: prune deterministic snapshot assertion
  oops. rebase problem.
* chore: fix test-snapstore ld.asset
  rebase / merge problem?!
* chore: never mind supervisorHash optimization
  With snapshotInitial at 2, there is little reason to snapshot after loading the supervisor bundles. The code doesn't carry its own weight. Plus, it seems to introduce a strange bug with marshal or something...

```
test/test-home.js:37

36:   const { board } = E.get(home);
37:   await t.throwsAsync(
38:     () => E(board).getValue('148'),

getting a value for a fake id throws

Returned promise rejected with unexpected exception:

Error {
  message: 'Remotable (a string) is already frozen',
}
```

* docs(swingset): document lastSnapshot kernel DB key
* refactor: capitalize makeSnapStore consistently
* refactor: replayTranscript caller is responsible to getLastSnapshot
* test(swingset): consistent vat-warehouse test naming
* refactor(swingset): compute transcriptSnapshotStats in vatKeeper
  In an attempt to avoid reading the lastSnapshot DB key if the t.endPosition key was enough information to decide to take a snapshot, the vatWarehouse was peeking into the vatKeeper's business. Let's go with code clarity over (un-measured) performance.
* chore: use harden, not freeze; clarify lru
* chore: use distinct fixture directories to avoid collision
  The "temporary" snapstore directories used by two different tests began to overlap when the tests were moved into the same parent dir. One test was deleting the directory while the other was still using it (as well as mingling files at runtime), causing an xsnap process to die with an IO error if the tests were run in parallel. This changes the two tests to use distinct directories. In the long run, we should either have them use `mktmp` to build a randomly-named known-unique directory, or establish a convention where tempdir names match the name of the test file and case using them, to avoid collisions as we add more tests.

Co-authored-by: Brian Warner <warner@lothar.com>
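The warehouse's limit on paged-in vats amounts to least-recently-used bookkeeping over vat IDs. Below is a minimal sketch of that idea, not the code from this commit: `makeLRU`, `touch`, and `remove` are hypothetical names, and a `Map` (which iterates in insertion order) stands in for whatever structure the warehouse really uses.

```javascript
// Sketch only: makeLRU/touch/remove are hypothetical names, not the vat-warehouse API.
function makeLRU(maxOnline) {
  // Map iterates in insertion order, so the first key is the least recently used.
  const order = new Map();
  return {
    // Mark vatID as just used; returns the vatID to evict, or undefined.
    touch(vatID) {
      order.delete(vatID);
      order.set(vatID, true);
      if (order.size <= maxOnline) {
        return undefined;
      }
      const [oldest] = order.keys();
      // Drop the evicted vat from the bookkeeping too, so it is not evicted twice.
      order.delete(oldest);
      return oldest;
    },
    // Forget a vat entirely, e.g. when it is terminated.
    remove(vatID) {
      return order.delete(vatID);
    },
    get size() {
      return order.size;
    },
  };
}

const online = makeLRU(2);
online.touch('v1');
online.touch('v2');
online.touch('v1'); // v1 is now the most recently used
console.log(online.touch('v3')); // 'v2'
```

The caller would shut down the worker for whichever vat ID `touch` returns before bringing the new one online.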
1 parent f3e4f87 commit cb239cb

27 files changed

+636 -217 lines changed

packages/SwingSet/src/controller.js

+19 -35
@@ -1,21 +1,19 @@
 /* global require */
 // @ts-check
 import fs from 'fs';
-import path from 'path';
 import process from 'process';
 import re2 from 're2';
 import { performance } from 'perf_hooks';
 import { spawn as ambientSpawn } from 'child_process';
 import { type as osType } from 'os';
 import { Worker } from 'worker_threads';
 import anylogger from 'anylogger';
-import { tmpName } from 'tmp';
 
 import { assert, details as X } from '@agoric/assert';
 import { isTamed, tameMetering } from '@agoric/tame-metering';
 import { importBundle } from '@agoric/import-bundle';
 import { makeMeteringTransformer } from '@agoric/transform-metering';
-import { xsnap, makeSnapstore, recordXSnap } from '@agoric/xsnap';
+import { xsnap, recordXSnap } from '@agoric/xsnap';
 
 import engineGC from './engine-gc.js';
 import { WeakRef, FinalizationRegistry } from './weakref.js';
@@ -49,12 +47,12 @@ function unhandledRejectionHandler(e) {
 /**
  * @param {{ moduleFormat: string, source: string }[]} bundles
  * @param {{
- *   snapstorePath?: string,
+ *   snapStore?: SnapStore,
  *   spawn: typeof import('child_process').spawn
  *   env: Record<string, string | undefined>,
  * }} opts
  */
-export function makeStartXSnap(bundles, { snapstorePath, env, spawn }) {
+export function makeStartXSnap(bundles, { snapStore, env, spawn }) {
   /** @type { import('@agoric/xsnap/src/xsnap').XSnapOptions } */
   const xsnapOpts = {
     os: osType(),
@@ -79,37 +77,27 @@ export function makeStartXSnap(bundles, { snapstorePath, env, spawn }) {
     };
   }
 
-  /** @type { ReturnType<typeof makeSnapstore> } */
-  let snapStore;
-
-  if (snapstorePath) {
-    fs.mkdirSync(snapstorePath, { recursive: true });
-
-    snapStore = makeSnapstore(snapstorePath, {
-      tmpName,
-      existsSync: fs.existsSync,
-      createReadStream: fs.createReadStream,
-      createWriteStream: fs.createWriteStream,
-      rename: fs.promises.rename,
-      unlink: fs.promises.unlink,
-      resolve: path.resolve,
-    });
-  }
-
-  let supervisorHash = '';
   /**
    * @param {string} name
    * @param {(request: Uint8Array) => Promise<Uint8Array>} handleCommand
    * @param { boolean } [metered]
+   * @param { string } [snapshotHash]
    */
-  async function startXSnap(name, handleCommand, metered) {
-    if (supervisorHash) {
-      return snapStore.load(supervisorHash, async snapshot => {
+  async function startXSnap(
+    name,
+    handleCommand,
+    metered,
+    snapshotHash = undefined,
+  ) {
+    if (snapStore && snapshotHash) {
+      // console.log('startXSnap from', { snapshotHash });
+      return snapStore.load(snapshotHash, async snapshot => {
         const xs = doXSnap({ snapshot, name, handleCommand, ...xsnapOpts });
         await xs.evaluate('null'); // ensure that spawn is done
         return xs;
       });
     }
+    // console.log('fresh xsnap', { snapStore: snapStore });
     const meterOpts = metered ? {} : { meteringLimit: 0 };
     const worker = doXSnap({ handleCommand, name, ...meterOpts, ...xsnapOpts });
 
@@ -121,9 +109,6 @@ export function makeStartXSnap(bundles, { snapstorePath, env, spawn }) {
       // eslint-disable-next-line no-await-in-loop
       await worker.evaluate(`(${bundle.source}\n)()`.trim());
     }
-    if (snapStore) {
-      supervisorHash = await snapStore.save(async fn => worker.snapshot(fn));
-    }
     return worker;
   }
   return startXSnap;
@@ -140,7 +125,6 @@ export function makeStartXSnap(bundles, { snapstorePath, env, spawn }) {
  *   slogFile?: string,
  *   testTrackDecref?: unknown,
  *   warehousePolicy?: { maxVatsOnline?: number },
- *   snapstorePath?: string,
  *   spawn?: typeof import('child_process').spawn,
  *   env?: Record<string, string | undefined>
  * }} runtimeOptions
@@ -162,7 +146,6 @@ export async function makeSwingsetController(
     debugPrefix = '',
     slogCallbacks,
     slogFile,
-    snapstorePath,
     spawn = ambientSpawn,
     warehousePolicy = {},
   } = runtimeOptions;
@@ -300,7 +283,11 @@ export async function makeSwingsetController(
     // @ts-ignore assume supervisorBundle is set
     JSON.parse(kvStore.get('supervisorBundle')),
   ];
-  const startXSnap = makeStartXSnap(bundles, { snapstorePath, env, spawn });
+  const startXSnap = makeStartXSnap(bundles, {
+    snapStore: hostStorage.snapStore,
+    env,
+    spawn,
+  });
 
   const kernelEndowments = {
     waitUntilQuiescent,
@@ -430,7 +417,6 @@ export async function makeSwingsetController(
  *   debugPrefix?: string,
  *   slogCallbacks?: unknown,
  *   testTrackDecref?: unknown,
- *   snapstorePath?: string,
  *   warehousePolicy?: { maxVatsOnline?: number },
  *   slogFile?: string,
  * }} runtimeOptions
@@ -447,15 +433,13 @@ export async function buildVatController(
     kernelBundles,
     debugPrefix,
     slogCallbacks,
-    snapstorePath,
     warehousePolicy,
     slogFile,
   } = runtimeOptions;
   const actualRuntimeOptions = {
     verbose,
     debugPrefix,
     slogCallbacks,
-    snapstorePath,
     warehousePolicy,
     slogFile,
   };

packages/SwingSet/src/initializeSwingset.js

-5
@@ -310,11 +310,6 @@ export async function initializeSwingset(
   // it to comms
   config.vats.vattp = {
     bundle: kernelBundles.vattp,
-    creationOptions: {
-      // we saw evidence of vattp dropping messages, and out of caution,
-      // we're keeping it on an in-kernel worker for now. See #3039.
-      managerType: 'local',
-    },
   };
 
   // timer wrapper vat is added automatically, but TODO: bootstraps must

packages/SwingSet/src/kernel/kernel.js

+8 -1
@@ -125,7 +125,11 @@ export default function buildKernel(
   } = kernelOptions;
   const logStartup = verbose ? console.debug : () => 0;
 
-  const { kvStore, streamStore } = /** @type { HostStore } */ (hostStorage);
+  const {
+    kvStore,
+    streamStore,
+    snapStore,
+  } = /** @type { HostStore } */ (hostStorage);
   insistStorageAPI(kvStore);
   const { enhancedCrankBuffer, abortCrank, commitCrank } = wrapStorage(kvStore);
   const vatAdminRootKref = kvStore.get('vatAdminRootKref');
@@ -138,6 +142,7 @@ export default function buildKernel(
     enhancedCrankBuffer,
     streamStore,
     kernelSlog,
+    snapStore,
   );
 
   const meterManager = makeMeterManager(replaceGlobalMeter);
@@ -673,6 +678,8 @@ export default function buildKernel(
       if (!didAbort) {
         kernelKeeper.processRefcounts();
         kernelKeeper.saveStats();
+        // eslint-disable-next-line no-use-before-define
+        await vatWarehouse.maybeSaveSnapshot();
       }
       commitCrank();
       kernelKeeper.incrementCrankNumber();

packages/SwingSet/src/kernel/state/kernelKeeper.js

+9 -1
@@ -52,6 +52,7 @@ const enableKernelGC = true;
 // v$NN.nextDeliveryNum = $NN
 // v$NN.t.endPosition = $NN
 // v$NN.vs.$key = string
+// v$NN.lastSnapshot = JSON({ snapshotID, startPos })
 
 // d$NN.o.nextID = $NN
 // d$NN.c.$kernelSlot = $deviceSlot = o-$NN/d+$NN/d-$NN
@@ -109,8 +110,14 @@ const FIRST_CRANK_NUMBER = 0n;
  * @param {KVStorePlus} kvStore
  * @param {StreamStore} streamStore
  * @param {KernelSlog} kernelSlog
+ * @param {SnapStore=} snapStore
  */
-export default function makeKernelKeeper(kvStore, streamStore, kernelSlog) {
+export default function makeKernelKeeper(
+  kvStore,
+  streamStore,
+  kernelSlog,
+  snapStore = undefined,
+) {
   insistEnhancedStorageAPI(kvStore);
 
   /**
@@ -939,6 +946,7 @@ export default function makeKernelKeeper(kvStore, streamStore, kernelSlog) {
       incStat,
       decStat,
       getCrankNumber,
+      snapStore,
     );
     ephemeral.vatKeepers.set(vatID, vk);
     return vk;

packages/SwingSet/src/kernel/state/vatKeeper.js

+57 -1
@@ -42,7 +42,7 @@ export function initializeVatState(kvStore, streamStore, vatID) {
 /**
  * Produce a vat keeper for a vat.
  *
- * @param {*} kvStore The keyValue store in which the persistent state will be kept
+ * @param {KVStorePlus} kvStore The keyValue store in which the persistent state will be kept
  * @param {StreamStore} streamStore Accompanying stream store, for the transcripts
  * @param {*} kernelSlog
  * @param {string} vatID The vat ID string of the vat in question
@@ -60,6 +60,7 @@ export function initializeVatState(kvStore, streamStore, vatID) {
  * @param {*} incStat
  * @param {*} decStat
  * @param {*} getCrankNumber
+ * @param { SnapStore= } snapStore
  * returns an object to hold and access the kernel's state for the given vat
  */
 export function makeVatKeeper(
@@ -79,6 +80,7 @@ export function makeVatKeeper(
   incStat,
   decStat,
   getCrankNumber,
+  snapStore = undefined,
 ) {
   insistVatID(vatID);
   const transcriptStream = `transcript-${vatID}`;
@@ -417,6 +419,57 @@ export function makeVatKeeper(
     kvStore.set(`${vatID}.t.endPosition`, `${JSON.stringify(newPos)}`);
   }
 
+  /** @returns { StreamPosition } */
+  function getTranscriptEndPosition() {
+    return JSON.parse(
+      kvStore.get(`${vatID}.t.endPosition`) ||
+        assert.fail('missing endPosition'),
+    );
+  }
+
+  /**
+   * @returns {{ snapshotID: string, startPos: StreamPosition } | undefined}
+   */
+  function getLastSnapshot() {
+    const notation = kvStore.get(`${vatID}.lastSnapshot`);
+    if (!notation) {
+      return undefined;
+    }
+    const { snapshotID, startPos } = JSON.parse(notation);
+    assert.typeof(snapshotID, 'string');
+    assert(startPos);
+    return { snapshotID, startPos };
+  }
+
+  function transcriptSnapshotStats() {
+    const totalEntries = getTranscriptEndPosition().itemCount;
+    const lastSnapshot = getLastSnapshot();
+    const snapshottedEntries = lastSnapshot
+      ? lastSnapshot.startPos.itemCount
+      : 0;
+    return { totalEntries, snapshottedEntries };
+  }
+
+  /**
+   * Store a snapshot, if given a snapStore.
+   *
+   * @param { VatManager } manager
+   * @returns { Promise<boolean> }
+   */
+  async function saveSnapshot(manager) {
+    if (!snapStore || !manager.makeSnapshot) {
+      return false;
+    }
+
+    const snapshotID = await manager.makeSnapshot(snapStore);
+    const endPosition = getTranscriptEndPosition();
+    kvStore.set(
+      `${vatID}.lastSnapshot`,
+      JSON.stringify({ snapshotID, startPos: endPosition }),
+    );
+    return true;
+  }
+
   function vatStats() {
     function getCount(key, first) {
       const id = Nat(BigInt(kvStore.get(key)));
@@ -477,8 +530,11 @@ export function makeVatKeeper(
     deleteCListEntry,
     deleteCListEntriesForKernelSlots,
     getTranscript,
+    transcriptSnapshotStats,
     addToTranscript,
     vatStats,
     dumpState,
+    saveSnapshot,
+    getLastSnapshot,
   });
 }
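
To see how the `lastSnapshot` record and the `t.endPosition` key combine into the stats computed by `transcriptSnapshotStats()`, here is a small in-memory sketch. It assumes a plain `Map` in place of the kernel's `kvStore`, a made-up vat ID and snapshot ID, and a `StreamPosition` reduced to just `itemCount`; the helper names are hypothetical.

```javascript
// Sketch only: a Map stands in for kvStore; 'v7' and 'abc123' are made-up values.
const kv = new Map();
const vatID = 'v7';

function setEndPosition(itemCount) {
  kv.set(`${vatID}.t.endPosition`, JSON.stringify({ itemCount }));
}

function recordSnapshot(snapshotID) {
  // The snapshot covers everything up to the current end of the transcript.
  const startPos = JSON.parse(kv.get(`${vatID}.t.endPosition`));
  kv.set(`${vatID}.lastSnapshot`, JSON.stringify({ snapshotID, startPos }));
}

function snapshotStats() {
  const totalEntries = JSON.parse(kv.get(`${vatID}.t.endPosition`)).itemCount;
  const notation = kv.get(`${vatID}.lastSnapshot`);
  const snapshottedEntries = notation
    ? JSON.parse(notation).startPos.itemCount
    : 0;
  return { totalEntries, snapshottedEntries };
}

setEndPosition(2); // two deliveries recorded
recordSnapshot('abc123'); // snapshot taken at that point
setEndPosition(5); // three more deliveries since the snapshot
console.log(snapshotStats()); // { totalEntries: 5, snapshottedEntries: 2 }
```

The difference between the two numbers is the length of the transcript suffix that would have to be replayed after loading the snapshot.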

packages/SwingSet/src/kernel/vatManager/manager-helper.js

+22 -5
@@ -47,7 +47,8 @@ import { makeTranscriptManager } from './transcript.js';
 
 /**
  *
- * @typedef { { getManager: (shutdown: () => Promise<void>) => VatManager,
+ * @typedef { { getManager: (shutdown: () => Promise<void>,
+ *                           makeSnapshot?: (ss: SnapStore) => Promise<string>) => VatManager,
  *              syscallFromWorker: (vso: VatSyscallObject) => VatSyscallResult,
  *              setDeliverToWorker: (dtw: unknown) => void,
  *            } } ManagerKit
@@ -178,12 +179,18 @@ function makeManagerKit(
     kernelSlog.write({ type: 'finish-replay-delivery', vatID, deliveryNum });
   }
 
-  async function replayTranscript() {
+  /**
+   * @param {StreamPosition | undefined} startPos
+   * @returns { Promise<number?> } number of deliveries, or null if !useTranscript
+   */
+  async function replayTranscript(startPos) {
+    // console.log('replay from', { vatID, startPos });
+
     if (transcriptManager) {
       const total = vatKeeper.vatStats().transcriptCount;
       kernelSlog.write({ type: 'start-replay', vatID, deliveries: total });
       let deliveryNum = 0;
-      for (const t of vatKeeper.getTranscript()) {
+      for (const t of vatKeeper.getTranscript(startPos)) {
         // if (deliveryNum % 100 === 0) {
         //   console.debug(`replay vatID:${vatID} deliveryNum:${deliveryNum} / ${total}`);
         // }
@@ -194,7 +201,10 @@ function makeManagerKit(
       }
       transcriptManager.checkReplayError();
       kernelSlog.write({ type: 'finish-replay', vatID });
+      return deliveryNum;
     }
+
+    return null;
   }
 
   /**
@@ -235,10 +245,17 @@ function makeManagerKit(
   /**
    *
    * @param { () => Promise<void>} shutdown
+   * @param { (ss: SnapStore) => Promise<string> } makeSnapshot
    * @returns { VatManager }
    */
-  function getManager(shutdown) {
-    return harden({ replayTranscript, replayOneDelivery, deliver, shutdown });
+  function getManager(shutdown, makeSnapshot) {
+    return harden({
+      replayTranscript,
+      replayOneDelivery,
+      deliver,
+      shutdown,
+      makeSnapshot,
+    });
   }
 
   return harden({ getManager, syscallFromWorker, setDeliverToWorker });
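
Passing `startPos` into `getTranscript()` is what lets a vat restored from a snapshot replay only the tail of its transcript. A rough sketch of that idea follows. It assumes the transcript is an in-memory array and that `startPos.itemCount` is the index of the first entry to replay, neither of which is confirmed by the diff; `transcriptSuffix` and `replay` are hypothetical names.

```javascript
// Sketch only: an array stands in for the stream store; names are hypothetical.
function* transcriptSuffix(entries, startPos) {
  // With no snapshot there is no startPos, so replay starts from the beginning.
  const first = startPos ? startPos.itemCount : 0;
  for (let i = first; i < entries.length; i += 1) {
    yield entries[i];
  }
}

function replay(entries, startPos, deliverOne) {
  let deliveryNum = 0;
  for (const entry of transcriptSuffix(entries, startPos)) {
    deliverOne(entry);
    deliveryNum += 1;
  }
  // Like replayTranscript in the diff, report how many entries were replayed.
  return deliveryNum;
}

const entries = ['d0', 'd1', 'd2', 'd3', 'd4'];
const seen = [];
console.log(replay(entries, { itemCount: 2 }, e => seen.push(e))); // 3
console.log(seen); // [ 'd2', 'd3', 'd4' ]
console.log(replay(entries, undefined, () => {})); // 5
```

Returning the count is what makes it possible to log how much work a page-in actually cost, with or without a snapshot.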
