rsms

Introduction to WebAssembly

This is not a logotypeWebAssembly, or WASM for short, is a new technology for running portable programs in a safe and efficient manner primarily aimed at the web platform. Similarly to ASM.js, WASM aims at a low level-of abstraction suitable as an intermediate representation of a higher-level program — i.e. WebAssembly code is intended to be generated by compilers rather than being written by humans. The W3C community group includes representatives from the largest web-browser companies, including Google, Microsoft, Apple and Mozilla making this whole thing rather exciting.

If you’re reading this chances are you’re already familiar with WASM to some extent. If you aren’t, this would be a good time to check out webassembly.org. At the time of publishing this article, WebAssembly has just reached the Browser Preview milestone, meaning that version 1 of WebAssembly will very likely be what the current draft describes. The specifics of this article are for version mvp-13.

There are some existing compilers which are getting “WASM’d”, but this article is going to focus on creating WASM programs without lots of dependencies or high-level languages.

Let’s get started.

Anatomy of a WebAssembly program

Actually, it’s called a “module” since with WebAssembly there’s no difference between a “program” and a “library” — there are only “modules”, each which can tie in and communicate with other modules, and each which can have a “main” function.

First off a module expresses which version of WebAssembly it uses for its encoding. What then follows are a variable number of sections, each containing information about the module. A module always begins with standard sections where order is important, and optionally ending in any number of custom sections that can contain any kind of data, all ignored by standard WASM virtual machines.

The standard sections in WASM version 1 are as follows, where sections required for any functional module is marked by an asterisk (*):

  1. Type* — Function signature declarations
  2. Import — Import declarations
  3. Function* — Function declarations
  4. Table — Indirect function table and other tables
  5. Memory — Memory attributes
  6. Global — Global declarations
  7. Export — Exports
  8. Start — Start function declaration
  9. Element — Elements section
  10. Code* — Function bodies
  11. Data — Data segments

The type section contains a list of each unique function signature used throughout the module. This includes any signatures of imported functions. The position in the list is the type signature’s unique index within the module. E.g.

(i32 i32 -> i32)  // func_type #0
(i64 -> i64)      // func_type #1
( -> )            // func_type #2

WebAssembly has only four concrete types: 32-bit integer, 64-bit integer, 32-bit floating-point number and 64-bit floating-point number where the integer types are sign-agnostic (more on this later) and floating-point numbers following the IEEE 754-2008 standard. Any complex types can be built on top of these basic types by a compiler. The rest of this article, as well as the WebAssembly documentation, will refer to these basic types with the short names i32, i64, f32, and f64 respectively.

The import section declares any external dependencies by listing module name, field name and type for each function, value or data required:

("dumb-math", "quadruple", (func_type 1))        // func #0
("dumb-math", "pi", (global_type i64 immutable))

It’s up to the host system (e.g. web browser) to resolve these imports and this is how runtime-dynamic linking is made possible in WASM, and is also the foreign-function interface for interacting with non-WASM functions, as we will see later.

The function section declares indexes for each function that is later defined in the code section, where the position in the list is the function’s index and the value its type. The effective function index starts at number of func_type imports, meaning that the effective list of functions available in the module is the import section list filtered on function imports joined by the function-section list.

(func_type 1)  // func #1
(func_type 1)  // func #2
(func_type 0)  // func #3

When we later call functions, we will refer to them by their index (“func #N” above, where N is the function’s index.)

The table section defines any number of tables. Tables are mechanisms for mapping opaque values which can not be represented in nor directly accessed by WebAssembly, like for instance a JavaScript object or an operating-system file handle. This feature bridges the gap between low-level, untrusted linear memory and high-level opaque handles/references at the cost of a bounds-checked table indirection. We won’t go into more detail about tables.

The memory section defines the optional memory of the module by defining its initial size and optionally how large it is expected to expand. The data section can be used to initialize the memory.

The global section declares any number of mutable or immutable global variables for the module. These are equivalent to static variables in C and C++.

The export section declares any parts of the module that can be accessed by the host environment, not including the special start function, which we will get to in a just a few sentences.

("half" (func 1))

The above would export function 1 as “half”. If we look at the function section above, we can see that func #1’s type is func_type #1, so the host environment will see this as:

function half(arg0 :int64) :int64

In addition to functions, modules can also export tables, memory segments and global variables to the outside world.

The start section designates a function index for a function to be called when the module is loading and is the mechanism that can be used to make a module an executable program, or used to dynamically initialize globals or memory of a module.

The element section allows a module to initialize the contents of a table imported from the outside or defined in the table section.

The code section is probably the bulk of most WebAssembly modules as it defines all code for all functions in the module. The function bodies are defined in the same order as their respective function indexes in the function section, not including imports of course. Our “half” function’s body could be defined like this:

get_local 0  // push parameter #0 on stack (our dividend)
i64.const 2  // push constant int64 "2" on stack (our divisor)
i64.div_u    // unsigned division; pushes result onto stack
end          // ends function, resulting in one i64 (top of stack)

You probably already guessed it: WebAssembly is a stack machine which means that we push and pop values onto and off from a virtual stack. Most WASM operator codes (“opcode” for short) takes and/or adds a well-defined amount of values to the stack.

Let’s walk through to see what happens to the stack if we run the code of our “half” function:

A few things to note about this diagram:

  1. The stack is abstract in the sense that we can’t index into it, measure it or inspect it; we can only push to and pop from it. This leaves the door open to efficient VM implementations.
  2. The 4th step above is probably never an actual state in a real VM as there would be no reason to actually empty the stack; the step illustrates what is theoretically happening to the stack.
  3. At the end the function implicitly returns and the caller will have one value (the result) available on the stack.

There are quite a lot of number operators defined by WASM 1 — too many to list here. However, you might find it comforting knowing that in addition to all basic operations you’d expect, like div and add, there are several other useful common numeric operations defined as opcodes rather than being left to be implemented on top of other, simpler instructions. For example:

There is also a versatile repertoire of conversion operators as well, for converting between any two types with well-defined behavior.

Last but not least is the data section which can be used to initialize imported or local memory:

(data_segment
  0                          // linear memory index
  (init_expr (i32.const 4))  // byte offset at which to place the data
  (data 0x2a 0x0 0x0 0x0))

This would initialize bytes [4–8] in the 0th memory with the provided byte values, which if read as an unsigned i32 yields the number “42”.

Managing data and values

We’ve looked at the WebAssembly value stack for saving and reading basic values. There are a few more ways to manage data and state in your modules.

Locals are like variables in many higher-level programming languages: we can name them, store values to them and load values from them. Locals are named using function-local integer index.

When defining a function body, we can declare any number of locals for the function’s code. Parameters to a function are also locals and the “name” index of locals start at 0 with the first function parameter and continue incrementing with each additional parameter and eventually with each listed local of the function body. Locals are always initialized to zero (all their bits are 0.)

In our example earlier in this article we used the operation get_local to push the value of the 1st function parameter onto the stack. We can also perform the reverse: popping a value off of the stack and storing it into a local.

i32.const 123  // for the purpose of demonstration, push "123" to the stack
set_local 0    // pop "123" off of the stack and store it into local #0
// use the stack for other operations...
get_local 0    // "123" is pushed to the top of the stack

Locals makes it possible to hold on to temporary values while performing other operations that would otherwise replace or move a value off the stack.

Globals are a form of module-wide locals with their own indexes and operators, but otherwise shares the semantics of locals. The global index starts with any imported globals and continues incrementing with any globals defined in the global section. We can load and store global values using the get_global and set_global operations.

Memory is the most flexible as it allows us to store any data we can imagine, but trades ease-of use for flexibility:

Memory being “linear” means that there’s no random allocator operators available — all memory addresses used in a module’s code are expressed in terms of byte offsets from the beginning of a memory segment. WASM being a low-level format, this makes a lot of sense. It’s up to the higher-level language that targets WASM to provide memory management on top of this linear memory space, if needed.

Elements are opaque “value handles” that mean something to the host environment, but are opaque to WebAssembly. For example an element might represent an operating-system file handle or a JavaScript object if the host environment runs JavaScript.

Elements are cool, but memory is a lot more interesting, so let’s get back to learning about memory and what it enables us to do.

Future versions of WASM might allow multiple memory segments to be used, but version 1 only uses a single segment per module, which simplifies a lot of things, like memory operators:

The following code stores an i32 “123” at byte range [4..7]:

i32.const 4    // byte-sized address
i32.const 123  // value to store
i32.store 2 0  // 2 = 32-bit/4-byte alignment, 0 = offset

We can load the value back onto the stack at a later time:

i32.const 4
i32.load 2 0   // after this, the top of the stack is i32 "123"

The flexibility of memory is not only in the fact that we can load things that were stored in the past—we can do that with locals too—but that memory is persistent for the duration of a module’s lifetime, meaning we can access values across functions, interpret the same data as different types, and store byte-sized values in a much more compact manner than locals, globals and the stack uses.

Imagine that we want to call a function with the name of our favorite buddy, “Lolcat”. Encoding the name as UTF-8 text, each character only needs 1 byte (so 6 bytes in total plus some additional sentinel or length value). A naïve and very inflexible way of doing this would be to push each character onto the stack and have the callee accept some predetermined number of parameters, e.g. function (c0, c1, c2, c3, c4, c5 i32) :void:

i32.const 0x4c // 'L'
i32.const 0x6f // 'o'
i32.const 0x6c // 'l'
i32.const 0x63 // 'c'
i32.const 0x61 // 'a'
i32.const 0x74 // 't'
call 1         // call a function that takes exactly 6 parameters

Not only is this ridiculously inflexible, but it requires a lot of stack space — each “card on the stack” is one machine word large. On a 32-bit system (or 32-bit virtual machine), that means each letter of our “Lolcat” string occupies 4 bytes, bringing the total to 24 bytes. Another thing to consider is that the stack, although abstract, has a limit; if we push too many values to the stack without popping some off, the host machine will panic and crash our module.

This is a task for memory. Instead let’s imagine the callee function to take just a single argument that is the memory address of the string, e.g. function (addr i32) :void. First, when we assemble out module we initialize a part of module’s memory with the “Lolcat” string:

(data_segment 0
  (init_expr (i32.const 4))  // byte offset at which to place the data
  (data 0x4c 0x6f 0x6c 0x63 0x61 0x74 0x0))

Notice how in this example we place an additional “sentinel” zero byte at the end of the string. This is common in languages like C but generally discouraged. If you’re writing a real program you probably want to prefix the string with its length instead, but the “sentinel byte” will do for our example.

We can now call the function with the address (4) of the string:

i32.const 4  // byte offset into memory, "passed" as an argument to...
call 1       // call the receiving function

The receiving function would then load whatever parts of the string it needs from memory. For instance, say that it calls an imported “putc” function (which puts a byte to the programs stdout stream):

void print_str(i32 addr) {
  i32 byte = 0;
  while (true) {
    byte = i32.load8_u(addr);
    if (byte == 0) { break; }
    putc(byte);
    addr = addr + 1;
    continue;
  }
}

In WASM we have declared the need for one additional local (local #1; the byte variable) in our function and can now implement the equivalent functionality:

block void          // declares a "label" at it's "end"
  loop void         // declares a "label" right here
    // byte = i32.load8_u(addr)
    get_local 0     // push addr
    i32.load8_u 0 0 // push the byte at addr as an i32
    tee_local 1     // store the byte to local 1, but don't pop it

    // if (byte == 0) { break }
    i32.eqz         // (x i32) => i32(x == 0 ? 1 : 0)
    br_if 1         // if the byte was zero, jump to end of "block"

    // putc(byte)
    get_local 1     // push byte
    call 0          // call imported "putc" function with the byte

    // addr = addr + 1
    get_local 0     // push addr
    i32.const 1     // push i32 "1"
    i32.add         // push result from addr + 1
    set_local 0     // store new address to "addr" local

    // continue
    br 0            // jump to "loop" (i.e. continue looping)
  end
end // end of "block"

Addressing memory

You may have noticed that we provide two immediate values to the load and store operators. The first immediate value is an alignment “hint” that is a power-of 2 encoded as log2(alignment). In practice this means that the alignment immediate is one of the following values: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, and 3 = 64-bit.

If the effective address of a memory access is a multiple of the alignment attribute value of the memory access, the memory access is considered aligned, otherwise it is considered misaligned. Aligned and misaligned accesses have the same behavior.1

The alignment hint is like a promise to the virtual machine executing our code that “the effective address is going to be aligned at N bits”, information which the VM will use to optimize our code. When we promise a certain alignment but fail to keep that promise by providing an address that is misaligned, the operation will likely be much slower than if we had provided an aligned effective address.

Therefore as a rule of thumb, you should only promise what you can keep — hint at the largest possible alignment for any operation, but no larger than the native alignment (32-bit for wasm32, 64-bit for wasm64.)

The second immediate for load and store operators is the address offset. This has proven to be a point of confusion already, so let’s clarify what the offset really is, what the effective address is and what happens to memory as we store and load values.

Here’s an illustration of the beginning of linear memory, initialized to zero:

The effective address is the offset in bytes measured from the beginning of the memory. The effective address is the sum of the address operand and the offset immediate.

effective-address = address-operand + offset-immediate

You might ask “why would I ever need offset if I have an address operand”? And it’s a fair question; as we’ve seen in previous examples, we can provide a constant value (e.g. using i32.const) for the address operand. However when we start using dynamic addresses, like we did in our “print_str” function earlier, it can be very useful for a compiler to add a constant offset to all memory operations of a module’s functions in order to “relocate” one area of memory to another.

WASM provides a rich set of memory operators for each of the four basic types, allowing reading some number of bytes as some kind of number. For instance, we can store a 64-bit integer in two bytes by truncating the value to a 16-bit integer:

i32.const 3      // address-operand = 3
i64.const 1234   // value
i64.store16 1 3  // alignment = 1 = 16-bit, offset-immediate = 3
                 // effective-address = 3 + 3 = 6

Since we hint at a 16-bit alignment and the effective address is an even multiple of 2 (2 bytes = 16 bits), this is an optimized (aligned) store. Our memory now looks like this, with the segment stored to is marked with dark background color:

WebAssembly always uses little-endian encoding for values in memory, no matter what the host platform’s native “endianess” is. This means that the 16-bit integer “1234” is encoded as 0xd2 0x04.

If we had stored a 32-bit or 64-bit value instead, the store would have been misaligned and inefficient since effective-address “6” is in the middle of a 32-bit and 64-bit word:

i64.store32 2 3  // align = 2 = 32-bit, offset = 3  =>  addr = 6

The red arrows indicate that for the store to be aligned as promised (at 32-bit boundaries), we’d have to either decrease or increase our address by 2 (i.e. 6 ± 2).

A load or store is aligned when the effective address is at the boundary of the promised alignment magnitude.

Let’s adjust the offset-immediate to get a 32-bit aligned store:

i64.store32 2 5  // align = 2 = 32-bit, offset = 5  =>  addr = 8

Optimal, aligned store.

Converting between and reinterpreting values

We are free to interpret any bytes as any value. For instance we can load the four bytes representing a 16-bit integer that we stored above as a 32-bit floating-point number:

i32.const 4   // address-operand = 4
f32.load 2 4  // align = 2 = 32-bit, offset = 4  =>  addr = 8

The top of the stack now contains a f32 representing roughly the number “1.7292e-42”, which is possibly not what you would have expected. Remember, we didn’t convert a number from one type to another, we just interpreted the raw data used to represent one type as another type.

If our intention was to load the i32 value as an equivalent f32 number, we instead use one of the numeric conversion operators for the destination type:

i32.const 4       // address-operand = 4
i32.load 2 4      // align = 2 = 32-bit, offset = 4  =>  addr = 8
f32.convert_u_i32

The top of the stack is now an f32 with the value “1234.0”.

Floating-point numbers are always “signed”, but integers can be unsigned. “But… we’ve been working with i32’s all article long and I haven’t seen any sign-specific operators!?” Most of the operators we’ve been using so far only needs to know the size of the data, and doesn’t care if the value is meant to be interpreted as being signed or not. WebAssembly integer values are fundamentally sign-agnostic.

There are of course a few operations which requires knowing whether its operands are signed or unsigned. For instance, the “less than” operator needs to know if it’s comparing potentially negative numbers.

Operators that are sign-dependent has a _s or _u suffix. Here are a few:

i32.lt_s    //   signed-i32  <    signed-i32
i32.lt_u    // unsigned-i32  <  unsigned-i32
i32.div_s   //   signed-i32  /    signed-i32
...

JavaScript API

Note: The following section is out of date. Please refer to the official documention.

As “WebAssembly” hints with “Web” in its name, the first useful WebAssembly platforms are the cutting-edge versions of the most popular web browsers: Chrome, Firefox, Safari and Edge. At the time of writing this article, WebAssembly needs to be enabled through the browser’s advanced settings. In Chrome the setting is found under “flags” (chrome://flags/#enable-webassembly).

I’m not going to talk much about this since there’s plenty of good information on the JavaScript browser API. Here’s an example of loading a module, “instantiating” it (running it) and finally calling one of its exported functions from JavaScript:

WebAssembly.compile(new Uint8Array(
  `00 61 73 6d  0d 00 00 00  01 09 02 60  00 00 60 01
7f 01 7f 03  03 02 00 01  05 04 01 00  80 01 07 07
01 03 66 6f  6f 00 01 08  01 00 0a 3a  02 02 00 0b
35 01 01 7f  20 00 41 04  6c 21 01 03  40 01 01 01
0b 03 7f 41  01 0b 20 01  41 e4 00 6c  41 cd 02 20
01 1b 21 01  41 00 20 01  36 02 00 41  00 21 01 41
00 28 02 00  0f 0b 0b 0e  01 00 41 00  0b 08 00 00
00 00 2c 00  00 00`.split(/[\s\r\n]+/g).map(v => parseInt(v, 16))
)).then(mod => {
  let m = new WebAssembly.Instance(mod)
  console.log('foo(1) =>', m.exports.foo(1))
  console.log('foo(2) =>', m.exports.foo(2))
  console.log('foo(3) =>', m.exports.foo(3))
})

If we paste this into a WebAssembly-enabled browser’s console, we should see the result of some simple computations done by the module:

foo(1) => 400
foo(2) => 800
foo(3) => 1200

This hopefully gives you a hint of how WASM can inter-operate with JavaScript to actually be practical and useful as an incremental step toward a WebAssembly-powered app. Perhaps you’ll write your UI in React & JavaScript, but write something like a JPEG encoder or file-format parser in WASM, having it all play nicely together.

wasm-util

I’ve put together a small collection of TypeScript/JavaScript code for working with WASM dubbed wasm-util. This code has no dependencies and is structured in a way that makes it possible to use only a few select source files for tasks that only require some functionality.

See wasm-util readme for a complete listing of functionality.

I found myself relying on a very complicated tool chain (source build of llvm, binaryen, etc) while all I was looking for was to get close to WebAssembly. The prime component of wasm-util is ast which provides a convenient way of building complete WASM modules, with full static type-checking if you’re using TypeScript.

Following is an example of building a module that provides the factorial function. Let’s start by describing the function we’re making in a C-like syntax:

int64 factorial(int64 n) {
  return (n == 0) ? 1
                  : n * factorial(n-1);
}

The equivalent WebAssembly code looks like this:

get_local 0    // push parameter #0 on stack.
i64.const 0    // push constant int64 "0" on stack.
i64.eq         // execute "eq" which pops two operands from stack
               //  and pushes int32 "1" or "0" on stack.
if i64         // pops one int32 from stack; if its not "0":
  i64.const 1  //   push constant int64 "1" on stack.
else           // else (if operand was "0"):
  get_local 0  //   push parameter #0 on stack. [1]
  get_local 0  //   push parameter #0 on stack.
  i64.const 1  //   push constant int64 "1" on stack.
  i64.sub      //   execute "sub[tract]" which pops two operands
               //    from stack (parameter #0 and constant "1")
               //    and finally pushes the result int64 on stack.
  call 0       //   call function #0 ("factorial") which pops one
               //    int64 from the stack and when it returns an
               //    int64 has been pushed on stack
  i64.mul      //   execute "sub[tract]" which pops two operands
               //    from stack ([1] and result from function call)
               //    and finally pushes the resulting int64 on stack
end            // ends function, returning one int64 result (on stack.)
               // Stack now contains one int64 value that's the result from one of
               // the two branches above.

The above code was printed by lbtext, for which we provided an AST built with the ast module:

import ... 'wasm-util/ast'
const mod = c.module([

  type_section([
    func_type([i64], i64), // type index = 0
  ]),

  function_section([
    varuint32(0), // function index = 0, using type index 0
  ]),

  export_section([
    // exports "factorial" as function at index 0
    export_entry(
      str_ascii("factorial"),
      external_kind.function,
      varuint32(0)),
  ]),

  code_section([
    // body of function at index 0:
    function_body([ /* additional local variables here */ ], [
      if_(i64, // i64 = result type of `if` expression
        i64.eq(get_local(i64, 0), i64.const(0)), // condition
        [ // then
          i64.const(1)
        ], [ // else
          i64.mul(
            get_local(i64, 0),
            call(i64, varuint32(0), [ // 0 = function index
              i64.sub(get_local(i64, 0), i64.const(1))
            ]))])])]
  )]
)

We can now generate WASM bytecode through the Emittable interface:

const emitbuf = new BufferedEmitter(new ArrayBuffer(mod.z))
mod.emit(emitbuf)
// the array buffer (emitbuf.buffer) now contains the complete module code

Or print a human-readable representation of the AST:

import { strRepr } from 'wasm-util/repr'
console.log(strRepr(mod))

Which yields the following in the console:

(module 13
  (section type 6 1
    (func_type (i64) i64))
  (section function 2 1 0)
  (section export 13 1
    (export_entry "factorial" external_kind.function 0))
  (section code 25 1
    (function_body 23 0
      (if [i64]
        (i64.eq
          (get_local [0])
          (i64.const [0])
        )
        (then
          (i64.const [1]))
        (else
          (i64.mul
            (get_local [0])
            (call [0]
              (i64.sub
                (get_local [0])
                (i64.const [1])
              )))) end) end)))

A complete version of this “factorial” demo can be found as a unit test in the wasm-util repository.

With wasm-util you can get started playing around with building your own WebAssembly programs right now with a minimum amount of effort. Perhaps you’re the author of a compiled language and are interested in targeting WASM — this might be a good way to get started.


———

If it’s not already clear from this article: I believe in a bright future for WebAssembly. The web is an amazing concept that is currently falling short carrying a heavy legacy from the past. In order to bring truly great experiences to the web and blur the lines of what is and what isn’t a “native” experience, WebAssembly—or something else like it—needs to happen.

At Figma we’re building one of the most complex and performance-critical apps there is on the web platform today, and we are very excited to start moving from ASM.js to WebAssembly. Does this stuff excite you? We’re hiring :)

WASM as Intermediate Representation

As I was writing this article it slowly dawned on me that WebAssembly might be an excellent IR (Intermediate Representation) for compilers of all sorts.

An intermediate representation is a representation of a program part way between the source and target languages. A good IR is one that is fairly independent of the source and target languages, so that it maximizes its ability to be used in a retargetable compiler.2

  1. WebAssembly really has no dependencies on anything “web” and its module and bytecode format is at such a low level that it can be used to represent any other higher-level program.
  2. There are several high-quality VMs being implemented, many of them in web browser, which can execute WASM programs efficiently, making distribution a bliss (the web!)
  3. WASM code can be translated to machine code.
  4. WASM can be interpreted and emulated which makes debugging a whole lot more pleasant than debugging programs which only run natively. How about we stop the world and interactively inspect the stack? How about we play back an interpreted recording that reproduces a bug deterministically?

Want to help out? Join the community and come hang in #webassembly on irc.w3c.org