Add wat, move section on wasm typesystem
Manishearth committed Aug 28, 2024
1 parent 7eeac9e commit 0d21fe4
Showing 1 changed file with 98 additions and 16 deletions.
114 changes: 98 additions & 16 deletions docs/wasm_abi_quirks.md
@@ -6,32 +6,90 @@ The Rust Wasm ABI is rather strange, and does not follow [tool conventions]. The
Diplomat has issue [#661] tracking the new conventions.


## Wasm parameter types

While at the Rust level and the LLVM IR level there are many different parameter types, Wasm itself only has two integer parameter/return types: `i32` and `i64` (it also has the floating-point types `f32` and `f64`, which are not discussed here). The JS-Wasm interface maps `Number` to `i32` and `BigInt` to `i64`.

This means that a function accepting a single `u8` parameter will still show up as accepting an `i32` in the WAT:

```rust
#[no_mangle]
pub extern "C" fn inout(x: u8) -> u8 { 1 }
```

This has LLVM IR:

```llvm
define dso_local zeroext i8 @inout(i8 zeroext %x) unnamed_addr #0 { ... }
```

And Wasm/WAT:

```wat
(type $t0 (func (param i32) (result i32)))
(func $inout (export "inout") (type $t0) (param $p0 i32) (result i32) ...)
```


Our current code does not correctly handle large `u32`s (values of 2^31 and above), which get turned into negative numbers across FFI when passed as parameters/return types.
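
The signedness issue can be illustrated in plain JS, since JS treats an `i32` coming out of Wasm as a signed 32-bit integer. The following is a minimal sketch; `asI32` and `asU32` are hypothetical helper names and not part of Diplomat's runtime.

```js
// Hypothetical helpers, not part of Diplomat's runtime.

// Show what JS sees when these 32 bits are handed back as a Wasm i32.
function asI32(value) {
  return value | 0;
}

// Reinterpret a signed 32-bit value as the unsigned integer it encodes.
function asU32(raw) {
  return raw >>> 0;
}

const largeU32 = 3000000000;            // fits in u32 but not in i32
const seenByJs = asI32(largeU32);       // -1294967296
const recovered = asU32(seenByJs);      // 3000000000
```

Applying `>>> 0` to a value that is already non-negative leaves it unchanged, so a binding could apply it to every `u32` result unconditionally.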

We may additionally have gaps in our current code around `u64`s, since JS numbers do not implicitly convert to `BigInt` (and if our code does convert them, it may be doing so erroneously). In particular, we need to check if padding code handles this correctly.
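
As a rough sketch of what explicit handling could look like, the helpers below convert values at the `i64` boundary. The names `toU64Param` and `fromU64Result` are hypothetical; only `BigInt` and `BigInt.asUintN` are standard JS.

```js
// Hypothetical helpers, not part of Diplomat's runtime.

// Convert a JS number or BigInt into the BigInt that an i64 parameter requires.
// BigInt() throws a RangeError for non-integer numbers such as 1.5.
function toU64Param(value) {
  return BigInt.asUintN(64, BigInt(value));
}

// Reinterpret an i64 result (which JS sees as a signed BigInt) as a u64.
function fromU64Result(raw) {
  return BigInt.asUintN(64, raw);
}

const param = toU64Param(5);          // 5n
const maxU64 = fromU64Result(-1n);    // 18446744073709551615n
```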



## Return values

This is not a Rust-specific quirk, but rather how Wasm works in non-multivalue mode. Switching to [multivalue], possible in Rust with `-C target-feature=+multivalue`, will get past this, but Diplomat would need to be updated to produce multivalue-capable bindings.


All Wasm functions have a signature that looks like `fn(integer, integer, integer, ...) -> integer` (where the return type is optional). There are no non-integer types in Wasm FFI at the lowest level: pointers are integer indices into the wasm memory buffer, slices are a pair of integers, and structs are a bunch of integers (more on this later). As mentioned in the previous section, Wasm only really distinguishes between `i32` and `i64` here; everything else is converted.

As might be clear from this general signature, Wasm is only capable of returning _scalars_ over FFI (from the foreign language to JS). A scalar is something equivalent to a single integral primitive, which in wasm becomes all integer types[^1], booleans, `char`s, and pointers. Aggregates transitively containing multiple integral primitives, like structs with more than one field and slices, are not scalars. Aggregates containing a single scalar value are equivalent to that contained scalar in all FFI matters[^2].


This means that there is no way to have a signature like `fn(..) -> (integer, integer)`, for example when returning a slice or a two-field struct across FFI. Take a struct like this:

```rust
// size 16, align 8
#[repr(C)]
pub struct Big {
    a: u8, // size 1, offset 0
    b: u16, // size 2, offset 2
    c: u64, // size 8, offset 8
}
#[no_mangle]
pub extern "C" fn returns_big(arg1: u8, arg2: u8) -> Big { ... }
```

Instead, Wasm uses an "outparam" solution in this case. Sufficient space for the struct must be allocated on the Wasm heap, and a pointer to this space should be passed in as the _last_ parameter for this function. Once the function has been called, the value can be read back from that space.

This code generates the LLVM IR:

```llvm
; produced by rustc with -Zwasm-c-abi=legacy
define dso_local %Big @returns_big(i8 zeroext %arg1, i8 zeroext %arg2) unnamed_addr #0 { ... }
```

and the Wasm/WAT:

```wat
(type $t0 (func (param i32 i32 i32)))
(func $returns_big (export "returns_big") (type $t0) (param $p0 i32) (param $p1 i32) (param $p2 i32) ...)
```

Note that in the Wasm, there is no return value: instead there is an additional parameter. When calling this from JS, this will be called as `wasm.returns_big(arg1, arg2, outParam)`.
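
The allocation behind `outParam` has to match the struct's layout. The sketch below computes C-style offsets, size, and alignment for `Big`, assuming `repr(C)` layout rules and that `u64` is 8-byte aligned on the target. `cLayout` is a hypothetical helper, not a Diplomat API.

```js
// Hypothetical helper: compute C-style (repr(C)) offsets, size, and alignment.
function cLayout(fields) {
  let offset = 0;
  let align = 1;
  const offsets = [];
  for (const field of fields) {
    // Round the running offset up to this field's alignment.
    offset = Math.ceil(offset / field.align) * field.align;
    offsets.push(offset);
    offset += field.size;
    align = Math.max(align, field.align);
  }
  // The total size is rounded up to the struct's overall alignment.
  const size = Math.ceil(offset / align) * align;
  return { offsets, size, align };
}

// Big { a: u8, b: u16, c: u64 }
const bigLayout = cLayout([
  { size: 1, align: 1 },
  { size: 2, align: 2 },
  { size: 8, align: 8 },
]);
// bigLayout is { offsets: [0, 2, 8], size: 16, align: 8 }
```

Under these assumptions the outparam for `Big` needs 16 bytes at 8-byte alignment, since the `u64` field raises the alignment of the whole struct.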

In Diplomat this is typically managed by `DiplomatReceiveBuf`; in raw pseudocode, what needs to happen is roughly:

```js
// Allocate space for the struct with the right size/alignment
let structAlloc = wasm.diplomat_alloc(16, 8);

wasm.returns_big(arg1, arg2, structAlloc);

// Read the fields from wasm memory (ptrRead reads from wasm memory, given a memory location and a size)
let field0 = ptrRead(structAlloc + 0, 1);
let field1 = ptrRead(structAlloc + 2, 2);
let field2 = ptrRead(structAlloc + 8, 8);

// Clean up
wasm.diplomat_free(structAlloc);
@@ -66,10 +124,19 @@ extern "C" fn takes_struct(s: MyStruct) {...}
The LLVM IR to get this result in Wasm looks something like:

```llvm
; produced by rustc with (default) -Zwasm-c-abi=legacy
define dso_local void @takes_struct(i8 %x.0, i32 %x.1) unnamed_addr #0 { ... }
```

And the Wasm would look like:

```wat
(type $t0 (func (param i32 i32)))
(func $takes_struct (export "takes_struct") (type $t0) (param $p0 i32) (param $p1 i32) ...)
```

(Note that all parameter types are turned into `i32` in the Wasm/WAT; see the section above on "Wasm parameter types" for more.)

This would work through layers of indirection; e.g. a struct with `MyStruct` as its only field would get passed similarly. The idea is that you pick out each scalar value transitively contained in the struct and pass them as arguments one by one.
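
The "pick out each scalar" idea can be sketched as a recursive flattening of a struct-like JS value. This is only an illustration: `flattenScalars` and the field names are hypothetical, and the sketch ignores padding entirely.

```js
// Hypothetical helper: collect every scalar transitively contained in a
// struct-like JS value, in field order.
function flattenScalars(value) {
  if (typeof value === "object" && value !== null) {
    return Object.values(value).flatMap((field) => flattenScalars(field));
  }
  return [value];
}

// A wrapper struct whose only field is a two-field struct
// (field names are made up for illustration).
const wrapper = { inner: { first: 1, second: 2 } };
const flatArgs = flattenScalars(wrapper);
// flatArgs is [1, 2], which would be spread into the call as separate arguments
```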

Another is "indirect" passing, which works similarly to the outparam approach for return values: the struct is passed as a pointer. It can be allocated on the heap or the stack; however, manipulating the Wasm stack from JS is tricky, so it's best to just heap-allocate.
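
A minimal sketch of the indirect approach from the JS side, using a plain `ArrayBuffer` as a stand-in for Wasm memory. It assumes a struct with a `u8` at offset 0 and a `u32` at offset 4 (size 8, align 4); `writeStruct`, the field layout, and the fixed pointer value are assumptions made for illustration.

```js
// Stand-in for the Wasm linear memory; with a real module this would be the
// module's exported memory buffer.
const memory = new ArrayBuffer(64);

// Hypothetical helper: write a struct with a u8 at offset 0 and a u32 at
// offset 4 into `buffer` at address `ptr`.
function writeStruct(buffer, ptr, first, second) {
  const view = new DataView(buffer);
  view.setUint8(ptr, first);
  view.setUint32(ptr + 4, second, true); // Wasm memory is little-endian
}

// In real code the pointer would come from an allocator on the Wasm side.
const structPtr = 8;
writeStruct(memory, structPtr, 7, 0xdeadbeef);
// The function would then be called with the pointer alone,
// e.g. wasm.takes_struct(structPtr), and the allocation freed afterwards.
```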
@@ -100,6 +167,13 @@ define dso_local void @takes_struct(ptr noundef byval(%struct.MyStruct) align 4
define dso_local void @takes_struct(ptr byval([8 x i8]) align 4 %x) unnamed_addr #0 { ... }
```

And the Wasm would look like:

```wat
(type $t0 (func (param i32)))
(func $takes_struct (export "takes_struct") (type $t0) (param $p0 i32) ...)
```

The [tool conventions] ask for "direct" passing for scalars (including structs that transitively contain a single scalar), and "indirect" for everything else.

### What Rust does
@@ -144,6 +218,13 @@ The LLVM IR looks something like:
define dso_local void @big(%Big %0) unnamed_addr #0 { ... }
```

And the Wasm would look like:

```wat
(type $t0 (func (param i32 i32 i32 i32 i32 i64)))
(func $big (export "big") (type $t0) (param $p0 i32) (param $p1 i32) (param $p2 i32) (param $p3 i32) (param $p4 i32) (param $p5 i64) ...)
```

It gets invoked as `wasm.big(a, 0, b, 0, 0, c)`. Even though the second padding segment is four bytes long, it's treated as `i16` padding, which means only two padding parameters are needed. The type of each padding parameter appears to be determined by the alignment of the preceding field.
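
The padding rule observed above can be sketched as follows. This is a hypothetical helper built only from that observation (the gap after a field is filled with padding parameters sized by that field's alignment), not Diplomat's actual implementation.

```js
// Hypothetical helper: build the direct-passing argument list, inserting one
// zero per padding parameter after each field.
function directArgsWithPadding(fields) {
  const args = [];
  for (let i = 0; i < fields.length; i++) {
    const field = fields[i];
    args.push(field.value);
    const next = fields[i + 1];
    if (next !== undefined) {
      // Bytes between the end of this field and the start of the next one.
      const gap = next.offset - (field.offset + field.size);
      // Each padding parameter covers `field.align` bytes of that gap.
      const padCount = gap / field.align;
      for (let p = 0; p < padCount; p++) {
        args.push(0);
      }
    }
  }
  return args;
}

const bigArgs = directArgsWithPadding([
  { value: 1, offset: 0, size: 1, align: 1 },  // a: u8
  { value: 2, offset: 2, size: 2, align: 2 },  // b: u16
  { value: 3n, offset: 8, size: 8, align: 8 }, // c: u64, passed as a BigInt
]);
// bigArgs is [1, 0, 2, 0, 0, 3n], to be spread as wasm.big(...bigArgs)
```

Note that `c` is a `BigInt` here because the corresponding Wasm parameter is an `i64`.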

### Nested structs
@@ -222,12 +303,12 @@ with LLVM IR:
define dso_local void @opt(%"DiplomatResult<Inner, ()>" %0) unnamed_addr #0 { ... }
```

And the Wasm would look like:

```wat
(type $t0 (func (param i32 i32 i32 i32 i32 i32)))
(func $opt (export "opt") (type $t0) (param $p0 i32) (param $p1 i32) (param $p2 i32) (param $p3 i32) (param $p4 i32) (param $p5 i32) ...)
```



@@ -238,4 +319,5 @@ We may have gaps in our current code around this; since integers do not implicit
[multivalue]: https://hacks.mozilla.org/2019/11/multi-value-all-the-wasm/


[^1]: Potentially with the exception of u64 in some cases? I haven't investigated this. Wasm gets weird around BigInt stuff.
[^2]: Technically it is possible to break this equality by overriding the alignment of the struct, introducing padding. Diplomat currently doesn't handle this, and we may try to forbid using the JS backend with alignment-overridden structs.
