Implement loading modules from piped files #5035

cr0sh · 2022-10-09T11:08:52Z

This is not discussed in the issue tracker, but on this Zulip thread.

Previously load_module opened a file on path to check if the module is precompiled, then closes before constructing the Module from it. This is fine on normal files, but on piped files(like /dev/stdin), is problematic as the header read is not visible on the next open.

So, this set of commits inserts a new procedure, before pre-opening to read the header, to detect if the file pointed by the path is a pipe(is_fifo of FileTypeExt), then directly loads the module from the path without checking if the modules is precompiled.

This change is unix-only, and intentionally does not support loading precompiled modules from pipes, because it should not be used that way. An additional warning message is printed in such situations to prevent users from being confused.

The Zulip thread contains an additional concern by me, about implicitly allowing accsesing /dev/tty on - inputs, which is not considered here.

*Edited to describe the current procedure to detect piped files.

cr0sh · 2022-10-09T11:10:25Z

It seems like I cannot assign a reviewer for the PR(the template indicates me to do so), so not assigning.

bjorn3 · 2022-10-09T11:22:08Z

src/commands/run.rs

+            f.read_to_end(&mut buf)?;
+            let mmap_vec = MmapVec::from_slice(&buf)?;
+            Ok::<_, anyhow::Error>(mmap_vec)
+        })?;


This introduces new copies of the data, slowing down startup. Previously Module::from_file would mmap the file once and then use this mmap'ed version directly. Now it will mmap it once (or read to a vec and then copy it to a new memory region in MmapVec::from_slice) and then copy it to a new memory region again in Module::new (through MmapVec::from_slice).

Definitely right. To be honest, I thought directly writing Mmap/MmapVec with looped f.read() is more efficient but feeled like a premature optimization, so chose this implementation for brevity. (The main reason for this is the File::read_to_end is not compatible with MmapVec.)

If startup performance matters for piped inputs too, I'll fix this into like

loop { let buf = [0u8; 4<<10]; match f.read(&mut buf) { Ok(0) => break, Ok(n) => /* copy `n` bytes to Mmap and increment the offset */ todo!(), Err(e) => return Err(e), } }

But again I'm not sure this is worth it.

This will still have an extra copy in case the mmap succeeded.

OK I didn't understand what you were saying. 😅 Now I see.
I'm afraid that this is not able to be trivially fixed, as there's no public API on Module to pass MmapVec directly into a constructor?

Module::from_binary seems to be the 'generic' catch-all for many Module constructors but I don't have an idea what it does...

Nevermind, you said about the case when the mmap suceeded, so trying to mmap and if it succeeds then discarding the mmap and using Module::from_path is OK. Is that you're thinking?

That may work.

cr0sh · 2022-10-09T14:47:11Z

Piping tests now run on unix platforms only(as the - substitiution applies only on unix too). Double buffering issue is resolved.
What do you think about implicitly allowing access to /dev/tty on wasmtime - as a special case?

sunfishcode · 2022-10-10T15:45:56Z

Cargo.toml

@@ -30,6 +30,7 @@ wasmtime-wast = { workspace = true }
 wasmtime-wasi = { workspace = true }
 wasmtime-wasi-crypto = { workspace = true, optional = true }
 wasmtime-wasi-nn = { workspace = true, optional = true }
+wasmtime-runtime = { workspace = true }


The Wasmtime CLI serves as an example of how to use Wasmtime's API, and in general we don't want users of the API depending on internal crates like wasmtime-runtime. Would it be possible to factor out this code for opening and loading a module either from deserialized form or a plain module into a utility function in the wasmtime crate, so that the CLI code can call that rather than doing this work itself?

This was only needed for MmapVec so I'll remove on the next commit. (see below.)

After re-reading my previous commits and review comments here, I realized this reply is not possible at all. MmapVec cannot be avoided because load_module should do both "header check" and "copyless mmap" in one path. I think your suggestion is truly valid but please kindly understand that I don't know much about this project (yet) to implement it rapidly so I would come back with results after some time with some investigation and tryouts.

Sounds good. And thanks for working on this! I agree that running modules from stdin is an interesting idea.

sunfishcode · 2022-10-10T16:00:52Z

src/commands/run.rs

+        // to store the whole bytes in a memory.
+        let mmap_vec = MmapVec::from_file(path).or_else(|_| {
+            // Mmap-ing this much would not eat system memory thanks for paging
+            const MAX_READ_FROM_STREAM: usize = 1 << 30; // TODO: make this configurable with environment variables?


Arbitrary limitations and environment variables are something we'd like to avoid if possible. The users of mmap_vec don't care if they have an MMapVec or a plain Vec<u8>, so in the case where the mmap fails, would it work to just call std::fs::read to read the file? That would require reorganizing this code to avoid using MmapVec as the type to unify the two options. Something like this:

let mmap_vec; let read; let bytes: &[u8] = if let Ok(mapped) = MmapVec::from_file(Path::new(path)) { mmap_vec = mapped; mmap_vec.as_ref() } else { read = std::fs::read(path)?; read.as_ref() };

Yeah I misinterpreted @bjorn3 's first review so I'll roll back into previous implementation which just uses Read::read_to_end and Vec<u8>, and prevent redundant allocation when a module is from a regular file. This may be sub-optimal for irregular file case but I think it's OK because no one wants to do hyperfine 'cat foo.wasm | wasmtime -'?

cr0sh · 2022-10-23T14:36:59Z

I simplified the strategy to detect piped files(directly invokes is_fifo on Unix) and fixed tests incorrectly set up.

bjorn3 · 2022-10-24T12:55:25Z

src/commands/run.rs

        if cfg!(unix) {
+            #[cfg(unix)]


Suggested change

if cfg!(unix) {

#[cfg(unix)]

#[cfg(unix)]

{

What a genius! 🤣

jameysharp · 2022-10-25T21:01:19Z

Here's the strategy I'd suggest:

Add a new method to wasmtime::Module that accepts both precompiled binaries and wasm/wat files, and uses mmap. I'd call it from_trusted_file but I'm open to suggestions. It should probably be unsafe, like Module::deserialize_file, for the same reasons.

In the CLI, if allow_precompiled is true, then call Module::from_trusted_file; otherwise, call Module::from_file just like it does today. Don't inspect the bytes of the file at all in the CLI, but put that logic in the wasmtime crate instead. I think that addresses @sunfishcode's comment on keeping the CLI as a simple demonstration of how to use the wasmtime public API, and improves on the current state where there's this magic ELF-header check in the CLI.

In your motivating use case (curl | wasmtime), users shouldn't be passing --allow-precompiled, so the CLI would call Module::from_file, resulting in a single fs::read that consumes the piped input exactly once, which solves the original bug you encountered.

This new method should read the file using MmapVec::from_file. Only the OS knows exactly which file types it's capable of mapping into memory, so this shouldn't check whether the file is a FIFO or anything along those lines; just let the syscall fail if someone uses --allow-precompiled with a pipe.

Once you have an MmapVec, you can check whether it begins with b"\x7fELF". If so, pass it to SerializedModule::from_mmap; otherwise, pass it to Module::new. Neither function copies the bytes it's given, addressing @bjorn3's concern.

I think the only downside to this plan is that the error message from the CLI isn't as friendly if someone tries to run a precompiled artifact without passing --allow-precompiled. We could make Module::build_artifacts return a more helpful error if the bytes it receives start with an ELF header, instead of just reporting that parsing failed.

One advantage to this approach is that wasmtime --allow-precompiled becomes potentially faster for .wat and .wasm files, because it'll use mmap for those cases too.

cr0sh · 2022-10-26T14:54:55Z

@jameysharp Thanks for your detailed guidance! I didn't think fixing this would not be hard this way and admittedly I couldn't do some sort of refactoring the project(and even adding a new public API for it) never saw before, tempting myself to bring somewhat hackish way. I'll try to reimplement this way.

I'd call it from_trusted_file but I'm open to suggestions.

This looks good.

I think the only downside to this plan is that the error message from the CLI isn't as friendly if someone tries to run a precompiled artifact without passing --allow-precompiled. We could make Module::build_artifacts return a more helpful error if the bytes it receives start with an ELF header, instead of just reporting that parsing failed.

I'm not sure that I can bring an excellent solution for this; just will do my best.

To recap I leave a tabular summary for your suggestion:

	no `--allow-precompiled` (`Module::from_file`)	`--allow-precompiled` (`Module::from_trusted_file`)
precompiled	regular: opens OK and fails on some inner procedure but should be rejected via an additional check? TODO: better error handling fifo: same	regular: opens OK w/ mmap fifo: mmap fails (intentionally left broken)
`.wasm`/`.wat`	regular: opens OK fifo: opens OK (major change on this PR)	regular: opens OK w/mmap fifo: same

jameysharp · 2022-10-26T16:19:04Z

should be rejected via an additional check

Right, the key here is that a file that begins with the 4-byte ELF magic is not valid for .wat or .wasm format and will be rejected by those parsers. None of the characters a .wat file can start with (which I think are whitespace, (, or ;) include \x7f. And the binary .wasm format has its own different 4-byte magic (\0asm).

So from_file already rejects precompiled modules, but it does it by reporting a generic parse failure, which is not super friendly. We can be nicer to users by specifically checking for this case. But if you have any trouble with that, I don't think that has to be in this PR.

The existing check that the CLI does for the ELF magic serves two purposes: one is to provide a helpful error message, and the other is to decide whether to use deserialize_file or from_file. So one way to look at the approach I've outlined is that it does the same things, but in a different order.

I couldn't do some sort of refactoring the project(and even adding a new public API for it) never saw before, tempting myself to bring somewhat hackish way.

Absolutely! I had to stare at the existing code for a long time, and think about a bunch of options that don't work, to come up with this suggestion. So of course I don't expect somebody new to the project to think of it.

I think the project will be better off with this additional API. In my opinion, it would be an improvement even if it didn't help with reading from pipes: the CLI shouldn't need to know about ELF magic. But I also think having support for reading from pipes is a great idea, so I like that this fixes multiple issues at the same time.

cr0sh · 2022-11-29T16:11:00Z

Rebased and implemented the @jameysharp 's guidance. (I'll move to a new pull request as the topic has changed, so please leave this PR not reviewed)

This applies to unix targets only, as Windows does not have an appropriate alternative.

Treat `-` as an alias to `/dev/stdin` This applies to unix targets only, as Windows does not have an appropriate alternative.

@jameysharp

Previously, wasmtime-cli checked the module to be loaded is precompiled or not, by pre-opening the given file path to check if the "\x7FELF" header exists. This commit moves this branch into the `Module::from_trusted_file`, which is only invoked with `--allow-precompiled` flag on CLI. The initial motivation of the commit is, feeding a module to wasmtime from piped inputs, is blocked by the pre-opening of the module. The `Module::from_trusted_file`, assumes the --allow-precompiled flag so there is no piped inputs, happily mmap-ing the module to test if the header exists. If --allow-precompiled is not supplied, the existing `Module::from_file` will be used, without the additional header check as the precompiled modules are intentionally not allowed on piped inputs for security measures. One caveat of this approach is that the user may be confused if he or she tries to execute a precompiled module without --allow-precompiled, as wasmtime shows an 'input bytes aren't valid utf-8' error, not directly getting what's going wrong. So this commit includes a hack-ish workaround for this: If the error on `Module::new` ends with "input bytes aren't valid utf-8" strings, show a simple note to the standard error stream. This might be a small hiccup on preparing i18n or changing error format on the `wat` crate, but I feel comfortable (yet) this strategy because the `wat` crate includes tests for the error messages, so one would notice the breakage if the error message have changed. Thanks to @jameysharp for suggesting this idea with a detailed guidance.

cr0sh changed the title ~~Run piped modules~~ Implement loading modules from piped files Oct 9, 2022

bjorn3 reviewed Oct 9, 2022

View reviewed changes

sunfishcode reviewed Oct 10, 2022

View reviewed changes

bjorn3 reviewed Oct 24, 2022

View reviewed changes

cr0sh force-pushed the run-piped-modules branch from 5742b5a to f730953 Compare November 29, 2022 16:09

cr0sh force-pushed the run-piped-modules branch from f730953 to 7e054da Compare November 29, 2022 16:18

cr0sh added 3 commits November 30, 2022 01:23

Treat - as an alias to /dev/stdin

675dded

This applies to unix targets only, as Windows does not have an appropriate alternative.

Add tests for piped modules from stdin

1916543

Treat `-` as an alias to `/dev/stdin` This applies to unix targets only, as Windows does not have an appropriate alternative.

cr0sh force-pushed the run-piped-modules branch from 7e054da to 2730da3 Compare November 29, 2022 16:23

cr0sh closed this Nov 29, 2022

cr0sh mentioned this pull request Nov 29, 2022

Move precompiled module detection into wasmtime #5342

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement loading modules from piped files #5035

Implement loading modules from piped files #5035

cr0sh commented Oct 9, 2022 •

edited

Loading

cr0sh commented Oct 9, 2022

bjorn3 Oct 9, 2022

cr0sh Oct 9, 2022 •

edited

Loading

bjorn3 Oct 9, 2022

cr0sh Oct 9, 2022 •

edited

Loading

cr0sh Oct 9, 2022 •

edited

Loading

bjorn3 Oct 10, 2022

cr0sh commented Oct 9, 2022

sunfishcode Oct 10, 2022

cr0sh Oct 11, 2022

cr0sh Oct 12, 2022 •

edited

Loading

sunfishcode Oct 12, 2022

sunfishcode Oct 10, 2022 •

edited

Loading

cr0sh Oct 11, 2022

cr0sh commented Oct 23, 2022

bjorn3 Oct 24, 2022

cr0sh Oct 24, 2022

jameysharp commented Oct 25, 2022

cr0sh commented Oct 26, 2022

jameysharp commented Oct 26, 2022

cr0sh commented Nov 29, 2022

Implement loading modules from piped files #5035

Implement loading modules from piped files #5035

Conversation

cr0sh commented Oct 9, 2022 • edited Loading

cr0sh commented Oct 9, 2022

Choose a reason for hiding this comment

cr0sh Oct 9, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cr0sh Oct 9, 2022 • edited Loading

Choose a reason for hiding this comment

cr0sh Oct 9, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cr0sh commented Oct 9, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cr0sh Oct 12, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sunfishcode Oct 10, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cr0sh commented Oct 23, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jameysharp commented Oct 25, 2022

cr0sh commented Oct 26, 2022

jameysharp commented Oct 26, 2022

cr0sh commented Nov 29, 2022

cr0sh commented Oct 9, 2022 •

edited

Loading

cr0sh Oct 9, 2022 •

edited

Loading

cr0sh Oct 9, 2022 •

edited

Loading

cr0sh Oct 9, 2022 •

edited

Loading

cr0sh Oct 12, 2022 •

edited

Loading

sunfishcode Oct 10, 2022 •

edited

Loading