
Nontrapping variants of certain instructions? #986

Open
juj opened this issue Feb 13, 2017 · 18 comments

@juj

juj commented Feb 13, 2017

In Emscripten, the ability to build libraries that avoid trapping in certain common situations is important (see emscripten-core/emscripten#4625 for some background). Currently, integer division and modulus operations as well as float-to-int conversion can trap.

Emscripten features a build mode that adds checks for the trap conditions before performing the potentially trapping operation. However, this has both a runtime performance impact and a generated code size impact. The performance impact currently shows up as a few-percent hotspot in compiled Unreal Engine 4 pages, especially in the float->int conversion; the engine's third-party libraries benignly cast out-of-range floats to integers in a relatively large number of locations, which would trap, and changing all of those code locations is not feasible.

It would be useful to have non-trapping variants of these instructions in the language itself, so that a Wasm implementation could implement this support efficiently. (This is, by the way, the only performance regression noticed so far when migrating from asm.js to wasm.)

@RyanLamansky

The most straightforward solution is to add 12 more opcodes (by my count) for non-trapping versions of those instructions. That would put WASM's total opcode count at 184 out of the 256 limit imposed by the current encoding. I don't believe this approach is sustainable long-term (I'd prefer to see opcodes written as varuint32, like so many other things in the spec, and remappable to resist bloat), but until it actually blocks a planned expansion it doesn't need to be addressed.

Another approach is to add an "immediate" to the trapping instructions to describe the desired outcome. This is a breaking change, but it's still early enough that it can be survived. It makes decoding slightly more complicated and adds bloat.

My least favorite idea is to add some kind of context opcode that causes opcodes within the context not to trap. This is the most complicated to decode and requires some effort to spec out.

@jfbastien
Member

@Kardax your discussion is covered here: #881
If you think that's the wrong conclusion I think it's worth reopening that bug, so that we keep context all in one place.

@RyanLamansky

@jfbastien Ah, indeed, I had forgotten about that one. I added my thoughts there. I'm curious how the WASM core team plans to address this very worthwhile request. Until today it seemed like the MVP opcode set was locked down, but a performance regression vs asm.js is discouraging.

@sunfishcode
Member

sunfishcode commented Feb 17, 2017

Concerning float->int trunc operations, Binaryen's asm2wasm's code currently appears to implement a non-trapping conversion by doing:

  • if the input is f32, convert it to f64
  • call an imported function (f64-to-int) implemented in JS to do the conversion

Is this what the measurements above are using? If so, it seems like much more efficient alternatives should be possible.

Concerning div/rem, are there actually cases where the div/rem traps break assumptions in practice? x86 traps on these too, so it seems unlikely that any reasonably portable C/C++ code would assume that they're silent. I know that Emscripten provides non-trapping versions of these in order to provide maximal asm.js compatibility, but I'm wondering if we know how important this is in practice.

Of course, support for non-C/C++ languages is also important, though languages have a significant variety of behaviors (throw an exception, overflow to a bignum, return a language-specific value), so it may want more comprehensive customization mechanisms.

@pipcet

pipcet commented Feb 17, 2017

I think there is a similar problem that's less easy to solve by hacking around it: math/trigonometric functions. On modern CPUs, these are fast enough that calling them via an import significantly deteriorates performance (I believe this is another asm.js -> wasm performance issue), but there are too many of them to add opcodes for all of them.

How about adding a kind of "inlinable import" function, which is known at module validation time but identified with a string in addition to being defined in terms of wasm opcodes? That would significantly reduce pressure on the opcode space, since uncommon operations would need only a string identifier. It would allow non-trapping variants of potentially trapping operations, trigonometric functions, and atomics to be introduced with very little overhead for fast implementations, while all a slow implementation would need to do is fall back to an ordinary wasm function call in these cases.

Each such import would be accompanied by a fallback wasm function, with the understanding that a native implementation can be used instead. For example, atomic operations (once added and given string identifiers) would have non-atomic fallback wasm code, and execute just fine on older wasm implementations.

So, in summary, add a kind of function that has both a string identifier and wasm code, both known at compile time, and allow implementations to inline their own, native, implementation instead of calling the included wasm code.

The actual change to the file format would be very limited. The important bit is the understanding that an implementation can inline its own version of a function without looking at the fallback implementation. What would be a continuing task would be to manage a register of assigned function names, but a two-level name space and borrowing names from established libraries should be enough to prevent any actual conflicts.

@pipcet

pipcet commented Feb 28, 2017

I'd like to explore this idea further, but fear I haven't put it very well.

My proposal is to add a new kind of WebAssembly function, called a standard library function, which, like a regular function, has a function body in the wasm module, but, like an import, also has a module name and export name. These names would be used at compilation time to check whether a more efficient (or otherwise preferable) built-in definition exists for the given name tuple; if so, the compiler would turn all direct calls of that function into native code according to the built-in definition, and ignore the function body in the module. If there is no built-in definition, the function body in the module is used instead.

Since this happens at compile time and the namespaces don't necessarily match those of the import section, I propose adding a new section for standard library functions; this would also have the effect of assigning the first function indices to standard library functions, resulting in a two-byte sequence for invoking them.

Standard library functions could be:

  • no-trap versions of standard opcodes
  • atomic operations
  • math/trigonometric functions
  • reduced-precision math functions in particular
  • memory/string functions such as memcpy() or strlen()

As there would be wasm definitions available for all those functions, we could add standard library functions at will without making a breaking change: the code would work just fine, if a little more slowly, on MVP-level compilers: instead of encountering an unknown opcode, we would see a call to a standard library function that we cannot make a substitution for and use the wasm function body instead.

@sunfishcode
Member

@pipcet It's an interesting idea. I'd encourage you to file a new issue to explore it, as it's significantly greater in scope, and I'd like to also continue the original discussion thread here, to better understand the problems people are currently facing on this specific issue.

That said, another approach that's been discussed is that user libraries could do feature testing and rewrite wasm modules during startup time. See this page for some discussion. If you file a new issue, it's worth examining the pros and cons relative to that kind of approach.

And for reduced-precision math functions specifically, another approach is to just have applications bundle in their own math libraries. WebAssembly already includes all the functionality needed to build high-speed implementations of sin, cos, and others. And the advantages of this approach are that (a) applications don't have to worry about varying results across browsers or over time, (b) we don't have to specify what "reduced precision" means.

@binji
Member

binji commented Feb 28, 2017

@pipcet as an FYI, your idea seems similar to LLVM's intrinsic functions.

@jfbastien
Member

Is someone willing to champion this?

@jgravelle-google

Interestingly enough, I've been running into an issue in the LLVM wasm backend, initially reported against Binaryen (WebAssembly/binaryen#983). It turns out to be the result of a backend-agnostic LLVM IR pass, SimplifyCFG, believing it's safe to speculatively evaluate a conversion from float to int.
LLVM IR's float-to-int instruction, fptoui (http://llvm.org/docs/LangRef.html#fptoui-to-instruction), has the semantics that converting a float value that isn't representable in the destination int type produces an undefined result. LLVM models undefined results and undefined behavior separately: an undefined result means the bit pattern we get back is arbitrary, but if we don't read the value then nothing undefined occurs.
So we go from a C program with no UB, to LLVM IR with no UB, to a wasm module with UB. The correct place to avoid the UB is therefore in the lowering from LLVM IR to wasm.
There are a handful of ways to do that, the cheapest of which is adding bounds checks for unrepresentable values. This increases module size per conversion and is slower (+70 bytes per fptoui, +70% execution time in V8, +340% slowdown in SpiderMonkey at the moment; this could be improved with less-first-draft code gen).
For the C to wasm case, we can create an LLVM intrinsic (llvm.wasm.trapping_fptoui or something) to model the trapping behavior correctly in the IR and use native wasm trunc_u instructions, but to support arbitrary IR we still need to be able to generate bounds checks.
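The bounds-check lowering can be sketched in C as follows. This is a hypothetical illustration, not the backend's actual output (which is wasm); the out-of-range result of 0 is an arbitrary choice, since fptoui only promises an undefined bit pattern there.

```c
#include <stdint.h>

/* Guarded f64 -> u32 conversion: any input that wasm's trapping
   i32.trunc_u instruction would reject (NaN, values truncating to a
   negative number, or values >= 2^32) is diverted to an arbitrary
   result instead of reaching the trapping instruction. */
uint32_t guarded_fptoui32(double x) {
    /* Written so that NaN also fails the range test. */
    if (!(x > -1.0 && x < 4294967296.0))
        return 0;  /* arbitrary: fptoui's result is undefined here */
    return (uint32_t)x;
}
```

The two compares and the branch at every conversion site are the per-fptoui size and time overhead described above.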

Not sure if we should change the semantics of wasm in order to more easily support this, but I figured it's worth sharing.

@jfbastien
Member

@jgravelle-google kinda sounds similar to the nsw/nuw flags (not quite, but maybe a similar solution would work).

@sunfishcode
Member

A current convention in wasm is: _s and _u things trap whenever they can't represent their result.

A plausible next step is to introduce _s:sat and _u:sat, meaning signed saturating and unsigned saturating, respectively. Specifically:

  • return the maximum value on positive overflow, and the minimum value on negative overflow
  • return 0 in "invalid" cases (zero divided by zero, NaN converted to int, etc.)

With this, we could define i32.convert_s:sat/f32, which would have the semantics of a Java float-to-int cast, for example, and would be usable for a speculated fptosi in LLVM.

We wouldn't need to add saturating versions of everything right away; we could just add things as we need them. At first, that may just be i32.convert_s:sat/f32, i32.convert_u:sat/f32, their i64 and f64 counterparts, and counterparts in the SIMD proposal.
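The proposed semantics can be pinned down with a small sketch. C is used here purely for illustration (the wasm operation would be a single instruction), and the function name is made up:

```c
#include <stdint.h>
#include <math.h>

/* Sketch of the proposed i32.convert_s:sat/f64 semantics:
   maximum value on positive overflow, minimum value on negative
   overflow, and 0 in the "invalid" cases (here, NaN). */
int32_t i32_convert_s_sat_f64(double x) {
    if (isnan(x)) return 0;                  /* invalid case */
    if (x >= 2147483648.0) return INT32_MAX; /* positive overflow */
    if (x < -2147483648.0) return INT32_MIN; /* negative overflow */
    return (int32_t)x;                       /* in range: truncate */
}
```

As noted above, this matches the behavior of a Java (int) cast from double, so such a speculated conversion never needs a guard.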

I'm open to different naming or encoding specifics, though it is quite cute how piggy-backing on _s reflects the underlying relationships between the operation, the signedness interpretation, and the overflow strategy, and how it generalizes to other hypothetical overflow strategies such as _u:wrap or _s:intmin and so on. In the encoding, a prefix byte (which is perhaps what @jfbastien is suggesting) could reflect this similarly.

@juj
Author

juj commented May 18, 2017

Several developers have reported this showing up as a problem. Talking with them, it is interesting how trapping instructions seem to be useful for debug builds, whereas for release builds there is a preference toward shipping code with nontrapping instructions. Here are some anecdotal stories of developers who received crashes from these types of traps:

  • in one case, a JSON parser had a function JsonGetNumber(json *object, const char *key); that would return a qNaN if the key was not present in the JSON object. The caller was doing something like this:

      bool animationEnabled = JsonGetBool(object, "animate");
      int animationType = (int)JsonGetNumber(object, "animationType"); // trap
      // ... later:
      if (animationEnabled)
      {
          switch(animationType) ...
      }
    

There animationType was NaN, but it was never read. It would be superficially better code to have the trapping line read int animationType = animationEnabled ? (int)JsonGetNumber(object, "animationType") : 0;, but it didn't matter much either way.

  • in another case, code was checking whether a double had a fractional part:

      int asInteger = (int)someDouble; // trap
      if ((double)asInteger == someDouble)
      {
          // someDouble has no fractional part, do something
      }
    

This function was getting called with someDouble == inf, and the outcome had been as desired: in native execution, the if branch was not taken on infs or NaNs.

  • in a third case, code was computing the length of an audio clip after its pitch had changed:

      uint64_t adjustedLengthSamples = (uint64_t)(lengthSamples / pitchAsDouble);
    

where pitchAsDouble had been set to 0. It turns out that if pitchAsDouble was 0, then adjustedLengthSamples did not end up being read later.

In all of these cases, developers did find a way to improve the code, though it is debatable whether the improvements were more than pedantic. The "garbage in, garbage out" idiom is quite prevalent in C code, and while it can be useful to trap in these types of scenarios at development time, crashing execution with a trap in release code was felt to be quite drastic behavior from WebAssembly; developers would rather have a "try to continue no matter what" stance for execution.
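To make the second anecdote concrete: under the saturating semantics proposed earlier in the thread, inf would convert to INT32_MAX instead of trapping, and the fractional-part check would still take the path the developer observed natively. A hypothetical sketch, where sat_f64_to_i32 is an illustrative stand-in for the proposed instruction:

```c
#include <stdint.h>
#include <math.h>

/* Stand-in for the proposed saturating f64->i32 conversion:
   clamp on overflow, 0 on NaN. */
int32_t sat_f64_to_i32(double x) {
    if (isnan(x)) return 0;
    if (x >= 2147483648.0) return INT32_MAX;
    if (x < -2147483648.0) return INT32_MIN;
    return (int32_t)x;
}

/* The fractional-part check from the second anecdote: inf converts
   to INT32_MAX without trapping, and (double)INT32_MAX != inf, so
   the branch is still not taken, matching native behavior. */
int has_no_fractional_part(double someDouble) {
    int32_t asInteger = sat_f64_to_i32(someDouble);
    return (double)asInteger == someDouble;
}
```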

This suggests that developers would prefer to release code with nontrapping instructions but develop with trapping ones, so having both variants seems useful for WebAssembly.

@rossberg
Member

@juj, wouldn't it be trivial for producers to generate such debugging code themselves, on top of non-trapping instructions?

@kmiller68
Contributor

I'm somewhat opposed to changing the semantics of instructions we already shipped. I think it sets a bad precedent to change the semantics of instructions over what I would consider ergonomics. My position on this could be swayed, however, so I think it would be worth talking about at the meeting.

@rossberg
Member

rossberg commented May 19, 2017

@kmiller68, note that such a change only removes failure cases. In that sense it is almost a conservative extension of the existing semantics.

@MikeHolman
Member

No one can really be relying on the trapping behavior, so it shouldn't be too dangerous to change. I'm not opposed to adding a new non-trapping variant either.

Also, these cases demonstrate a (minor) perf benefit I was wondering about. We have to go out of our way to disallow DCE for these ops, which leads to us always doing the checks even if the result of the div, etc., isn't used.

@juj
Author

juj commented Jul 19, 2017

@juj, wouldn't it be trivial for producers to generate such debugging code themselves, on top of non-trapping instructions?

Yeah, that's true: given nontrapping instructions, an offline compiler could add trapping variants in debug builds.

I'm somewhat opposed to changing the semantics of instructions we already shipped.

Agreed, we wouldn't want to change existing instructions.
