Refactor metadata storage in AOT artifacts #5153
Merged: alexcrichton merged 2 commits into `bytecodealliance:main` from `alexcrichton:refactor-compilation-metadata` on Oct 29, 2022.
This commit is a reorganization of how metadata is stored in Wasmtime's compiled artifacts. Currently Wasmtime's ELF artifacts have data appended after them that contains metadata about the `Engine` as well as type information for the module itself. This extra data at the end of the file is generally ignored by ELF-related utilities and is assembled during the module serialization process. In working on AOT-compiling components, though, I've discovered a number of issues with this:

* Primarily, it's possible to mistakenly change an artifact if it's deserialized and then serialized again. This issue is probably theoretical, but the deserialized artifact records the `Engine` configuration at the time of creation, while re-serializing it writes out the current `Engine` state, not the original `Engine` state.
* Additionally, the serialization strategy here is tightly coupled to `Module` and its serialization format. While this makes sense, it is not conducive to future refactorings that reuse a similar serialization format for components. The engine metadata, for example, does not necessarily need to be tied up with type information.
* The storage for this extra metadata is a bit wonky, since it is shoved at the end of the ELF file. The original reason for this was to allow a compiled artifact to be multiple objects concatenated together, to support serializing modules that use module linking. Module linking is no longer a thing, and I have since decided that for the component model all compilation artifacts will go into one object file to assist debuggability. This means that the extra stick-it-at-the-end step is no longer necessary.

To solve these issues, this commit splits the `module/serialization.rs` file in two, moving most of the logic to `engine/serialization.rs`. The engine serialization logic now handles everything related to `Engine` compatibility, such as targets, compiler flags, wasm features, etc.
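As a rough illustration of the kind of check that engine-level metadata supports, here is a minimal sketch assuming a hypothetical `EngineMetadata` struct holding a target triple, a map of compiler flags, and a map of wasm features. None of these names are Wasmtime's actual types, and the compatibility rule is deliberately simplified: the target and flags must match exactly, and every feature the artifact was compiled with must also be enabled in the loading engine.

```rust
use std::collections::BTreeMap;

/// Hypothetical engine metadata that would live in its own artifact section.
/// Field names are illustrative, not Wasmtime's real types.
#[derive(Debug, Clone, PartialEq)]
struct EngineMetadata {
    /// Target triple the code was compiled for.
    target: String,
    /// Compiler settings, e.g. "opt_level" -> "speed".
    flags: BTreeMap<String, String>,
    /// Wasm features, e.g. "simd" -> true.
    features: BTreeMap<String, bool>,
}

impl EngineMetadata {
    /// Checks whether an artifact compiled under `self` may be loaded by an
    /// engine configured as `host`, returning the first mismatch found.
    fn check_compatible(&self, host: &EngineMetadata) -> Result<(), String> {
        if self.target != host.target {
            return Err(format!(
                "target mismatch: artifact `{}`, engine `{}`",
                self.target, host.target
            ));
        }
        if self.flags != host.flags {
            return Err("compiler flags differ".to_string());
        }
        // Simplified rule (an assumption of this sketch): any feature the
        // artifact relies on must also be enabled in the host engine.
        for (name, &enabled) in &self.features {
            let host_enabled = host.features.get(name).copied().unwrap_or(false);
            if enabled && !host_enabled {
                return Err(format!("wasm feature `{name}` is disabled in the engine"));
            }
        }
        Ok(())
    }
}
```

Because this check only looks at engine metadata and never at type information, the same logic can serve both modules and components.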
The module serialization logic is now exclusively concerned with type information.

The engine metadata and serialized type information additionally now live in sections of the final file instead of at the end. This means that there are three primary `bincode`-encoded sections that are parsed when deserializing a file:

1. The `Engine`-specific metadata. This will be the same for both modules and components.
2. The `CompiledModuleInfo` structure. For core wasm there's just one of these, but for the component model there will be multiple, one per core wasm module.
3. The type information. For core wasm this is a `ModuleTypes`, but for a component this will be a `ComponentTypes`.

No functional change is expected from this commit. Binary artifacts might grow by a small handful of bytes due to using ELF sections to represent this now.

A related change I made in this commit is to the plumbing of the `is_branch_protection_enabled` flag. This is technically `Engine`-level metadata, but I didn't want to plumb it all over the place as is done now, so instead a new section was added to the final binary just for this BTI (AArch64 branch target identification) information. This means that it no longer needs to be a parameter to `CodeMemory::publish`, and it is also more amenable to a `Component`-is-just-one-object world where no single module owns this piece of metadata.
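To illustrate the section-based layout, here is a small std-only sketch. `ToyObject`, its length-prefixed encoding, the section names, and `publish_sketch` are all hypothetical stand-ins: a real artifact is an ELF file whose section payloads are `bincode`-encoded, and the real `CodeMemory::publish` has a different signature. What the sketch shows is that every piece of metadata, including the branch-protection flag, is looked up by section name inside one object rather than appended after it.

```rust
// Placeholder section names for this sketch only, not Wasmtime's.
const ENGINE_SECTION: &str = ".engine";
const INFO_SECTION: &str = ".info";
const TYPES_SECTION: &str = ".types";
const BTI_SECTION: &str = ".bti";

/// Toy stand-in for an object file: an ordered list of named sections.
#[derive(Debug, Default, PartialEq)]
struct ToyObject {
    sections: Vec<(String, Vec<u8>)>,
}

impl ToyObject {
    fn add_section(&mut self, name: &str, data: Vec<u8>) {
        self.sections.push((name.to_string(), data));
    }

    /// Returns the payload of the first section called `name`, if any.
    fn section(&self, name: &str) -> Option<&[u8]> {
        self.sections
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, data)| data.as_slice())
    }

    /// Encodes each section as `[name len][name][data len][data]`,
    /// with both lengths stored as little-endian u32.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for (name, data) in &self.sections {
            out.extend_from_slice(&(name.len() as u32).to_le_bytes());
            out.extend_from_slice(name.as_bytes());
            out.extend_from_slice(&(data.len() as u32).to_le_bytes());
            out.extend_from_slice(data);
        }
        out
    }

    /// Parses the encoding produced by `to_bytes`, rejecting truncated input.
    fn from_bytes(mut bytes: &[u8]) -> Result<ToyObject, String> {
        // Splits `n` bytes off the front of `bytes`.
        fn take<'a>(bytes: &mut &'a [u8], n: usize) -> Result<&'a [u8], String> {
            let slice: &'a [u8] = *bytes;
            if slice.len() < n {
                return Err("truncated artifact".to_string());
            }
            let (head, tail) = slice.split_at(n);
            *bytes = tail;
            Ok(head)
        }
        fn take_u32(bytes: &mut &[u8]) -> Result<usize, String> {
            let raw = take(bytes, 4)?;
            Ok(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize)
        }
        let mut obj = ToyObject::default();
        while !bytes.is_empty() {
            let name_len = take_u32(&mut bytes)?;
            let name = String::from_utf8(take(&mut bytes, name_len)?.to_vec())
                .map_err(|e| e.to_string())?;
            let data_len = take_u32(&mut bytes)?;
            let data = take(&mut bytes, data_len)?.to_vec();
            obj.sections.push((name, data));
        }
        Ok(obj)
    }
}

/// Hypothetical loader step standing in for `CodeMemory::publish`: instead of
/// taking a branch-protection parameter, it reads the flag from the artifact.
fn publish_sketch(obj: &ToyObject) -> Result<bool, String> {
    // A real loader would bincode-decode these; here we only require them.
    for name in [ENGINE_SECTION, INFO_SECTION, TYPES_SECTION] {
        if obj.section(name).is_none() {
            return Err(format!("missing section `{name}`"));
        }
    }
    // One byte: nonzero means BTI was enabled when the code was compiled.
    match obj.section(BTI_SECTION) {
        Some([flag]) => Ok(*flag != 0),
        _ => Err("missing or malformed BTI section".to_string()),
    }
}
```

Because the flag travels inside the artifact, the loader needs no caller-supplied parameter, and a component built from several core modules still has exactly one place to look for it.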
Subscribe to Label Action: cc @peterhuene. This issue or pull request has been labeled: "wasmtime:api".
pchickey approved these changes on Oct 28, 2022: "Looks like there is just one minor problem with cranelift-free mode."