-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement loading modules from piped files #5035
Conversation
It seems like I cannot assign a reviewer for the PR(the template indicates me to do so), so not assigning. |
src/commands/run.rs
Outdated
f.read_to_end(&mut buf)?; | ||
let mmap_vec = MmapVec::from_slice(&buf)?; | ||
Ok::<_, anyhow::Error>(mmap_vec) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces new copies of the data, slowing down startup. Previously Module::from_file
would mmap the file once and then use this mmap'ed version directly. Now it will mmap it once (or read to a vec and then copy it to a new memory region in MmapVec::from_slice
) and then copy it to a new memory region again in Module::new
(through MmapVec::from_slice
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely right. To be honest, I thought directly writing Mmap/MmapVec with looped f.read()
is more efficient but feeled like a premature optimization, so chose this implementation for brevity. (The main reason for this is the File::read_to_end is not compatible with MmapVec.)
If startup performance matters for piped inputs too, I'll fix this into like
loop {
let buf = [0u8; 4<<10];
match f.read(&mut buf) {
Ok(0) => break,
Ok(n) => /* copy `n` bytes to Mmap and increment the offset */ todo!(),
Err(e) => return Err(e),
}
}
But again I'm not sure this is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will still have an extra copy in case the mmap succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I didn't understand what you were saying. 😅 Now I see.
I'm afraid that this is not able to be trivially fixed, as there's no public API on Module
to pass MmapVec directly into a constructor?
Module::from_binary
seems to be the 'generic' catch-all for many Module constructors but I don't have an idea what it does...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, you said about the case when the mmap suceeded, so trying to mmap and if it succeeds then discarding the mmap and using Module::from_path is OK. Is that you're thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may work.
Piping tests now run on unix platforms only(as the |
Cargo.toml
Outdated
@@ -30,6 +30,7 @@ wasmtime-wast = { workspace = true } | |||
wasmtime-wasi = { workspace = true } | |||
wasmtime-wasi-crypto = { workspace = true, optional = true } | |||
wasmtime-wasi-nn = { workspace = true, optional = true } | |||
wasmtime-runtime = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Wasmtime CLI serves as an example of how to use Wasmtime's API, and in general we don't want users of the API depending on internal crates like wasmtime-runtime. Would it be possible to factor out this code for opening and loading a module either from deserialized form or a plain module into a utility function in the wasmtime crate, so that the CLI code can call that rather than doing this work itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only needed for MmapVec so I'll remove on the next commit. (see below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After re-reading my previous commits and review comments here, I realized this reply is not possible at all. MmapVec cannot be avoided because load_module
should do both "header check" and "copyless mmap" in one path. I think your suggestion is truly valid but please kindly understand that I don't know much about this project (yet) to implement it rapidly so I would come back with results after some time with some investigation and tryouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. And thanks for working on this! I agree that running modules from stdin is an interesting idea.
src/commands/run.rs
Outdated
// to store the whole bytes in a memory. | ||
let mmap_vec = MmapVec::from_file(path).or_else(|_| { | ||
// Mmap-ing this much would not eat system memory thanks for paging | ||
const MAX_READ_FROM_STREAM: usize = 1 << 30; // TODO: make this configurable with environment variables? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arbitrary limitations and environment variables are something we'd like to avoid if possible. The users of mmap_vec
don't care if they have an MMapVec
or a plain Vec<u8>
, so in the case where the mmap fails, would it work to just call std::fs::read
to read the file? That would require reorganizing this code to avoid using MmapVec
as the type to unify the two options. Something like this:
let mmap_vec;
let read;
let bytes: &[u8] = if let Ok(mapped) = MmapVec::from_file(Path::new(path)) {
mmap_vec = mapped;
mmap_vec.as_ref()
} else {
read = std::fs::read(path)?;
read.as_ref()
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I misinterpreted @bjorn3 's first review so I'll roll back into previous implementation which just uses Read::read_to_end
and Vec<u8>
, and prevent redundant allocation when a module is from a regular file. This may be sub-optimal for irregular file case but I think it's OK because no one wants to do hyperfine 'cat foo.wasm | wasmtime -'
?
I simplified the strategy to detect piped files(directly invokes |
src/commands/run.rs
Outdated
if cfg!(unix) { | ||
#[cfg(unix)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if cfg!(unix) { | |
#[cfg(unix)] | |
#[cfg(unix)] | |
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a genius! 🤣
Here's the strategy I'd suggest: Add a new method to In the CLI, if In your motivating use case ( This new method should read the file using Once you have an I think the only downside to this plan is that the error message from the CLI isn't as friendly if someone tries to run a precompiled artifact without passing One advantage to this approach is that |
@jameysharp Thanks for your detailed guidance! I didn't think fixing this would not be hard this way and admittedly I couldn't do some sort of refactoring the project(and even adding a new public API for it) never saw before, tempting myself to bring somewhat hackish way. I'll try to reimplement this way.
This looks good.
I'm not sure that I can bring an excellent solution for this; just will do my best. To recap I leave a tabular summary for your suggestion:
|
Right, the key here is that a file that begins with the 4-byte ELF magic is not valid for .wat or .wasm format and will be rejected by those parsers. None of the characters a .wat file can start with (which I think are whitespace, So The existing check that the CLI does for the ELF magic serves two purposes: one is to provide a helpful error message, and the other is to decide whether to use
Absolutely! I had to stare at the existing code for a long time, and think about a bunch of options that don't work, to come up with this suggestion. So of course I don't expect somebody new to the project to think of it. I think the project will be better off with this additional API. In my opinion, it would be an improvement even if it didn't help with reading from pipes: the CLI shouldn't need to know about ELF magic. But I also think having support for reading from pipes is a great idea, so I like that this fixes multiple issues at the same time. |
5742b5a
to
f730953
Compare
Rebased and implemented the @jameysharp 's guidance. (I'll move to a new pull request as the topic has changed, so please leave this PR not reviewed) |
f730953
to
7e054da
Compare
This applies to unix targets only, as Windows does not have an appropriate alternative.
Treat `-` as an alias to `/dev/stdin` This applies to unix targets only, as Windows does not have an appropriate alternative.
Previously, wasmtime-cli checked the module to be loaded is precompiled or not, by pre-opening the given file path to check if the "\x7FELF" header exists. This commit moves this branch into the `Module::from_trusted_file`, which is only invoked with `--allow-precompiled` flag on CLI. The initial motivation of the commit is, feeding a module to wasmtime from piped inputs, is blocked by the pre-opening of the module. The `Module::from_trusted_file`, assumes the --allow-precompiled flag so there is no piped inputs, happily mmap-ing the module to test if the header exists. If --allow-precompiled is not supplied, the existing `Module::from_file` will be used, without the additional header check as the precompiled modules are intentionally not allowed on piped inputs for security measures. One caveat of this approach is that the user may be confused if he or she tries to execute a precompiled module without --allow-precompiled, as wasmtime shows an 'input bytes aren't valid utf-8' error, not directly getting what's going wrong. So this commit includes a hack-ish workaround for this: If the error on `Module::new` ends with "input bytes aren't valid utf-8" strings, show a simple note to the standard error stream. This might be a small hiccup on preparing i18n or changing error format on the `wat` crate, but I feel comfortable (yet) this strategy because the `wat` crate includes tests for the error messages, so one would notice the breakage if the error message have changed. Thanks to @jameysharp for suggesting this idea with a detailed guidance.
7e054da
to
2730da3
Compare
This is not discussed in the issue tracker, but on this Zulip thread.
Previously load_module opened a file on
path
to check if the module is precompiled, then closes before constructing the Module from it. This is fine on normal files, but on piped files(like /dev/stdin), is problematic as the header read is not visible on the next open.So, this set of commits inserts a new procedure, before pre-opening to read the header, to detect if the file pointed by the
path
is a pipe(is_fifo
of FileTypeExt), then directly loads the module from the path without checking if the modules is precompiled.This change is unix-only, and intentionally does not support loading precompiled modules from pipes, because it should not be used that way. An additional warning message is printed in such situations to prevent users from being confused.
The Zulip thread contains an additional concern by me, about implicitly allowing accsesing
/dev/tty
on-
inputs, which is not considered here.*Edited to describe the current procedure to detect piped files.