Dibash Thapa

How to use strings in WebAssembly?

I was working on compiling a small subset of JavaScript to WebAssembly. Representing numbers, if-else conditions, and for loops was intuitive and straightforward. However, because I had no prior experience with low-level instructions, I got stuck on strings.

Compiling a hello world program to WebAssembly was not as easy as I had expected.

Memory Layout

After some research, I found that a WebAssembly module has a data section for initializing memory. So I could place "Hello World" in the data section and later print it by reading from its offset in memory.

(data (i32.const 0) "Hello World")

Unlike high-level languages, where strings are first-class citizens, WebAssembly treats them as raw bytes in memory. It uses a linear memory model: memory is one contiguous array of bytes, addressed by offset. The data segment above stores "Hello World" as a run of bytes starting at offset 0, one byte per character.

You can learn more about linear memory from this blog post by Lin Clark.
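To make the byte layout concrete, here is a minimal sketch in JavaScript that mimics what the data segment does. It does not load a real module. It only creates a WebAssembly.Memory on the host side and copies the bytes in by hand, so the variable names are illustrative.

```javascript
// Sketch: what `(data (i32.const 0) "Hello World")` leaves in linear memory.
// One page of WebAssembly memory is 64 KiB.
const memory = new WebAssembly.Memory({ initial: 1 });
const bytes = new Uint8Array(memory.buffer);

// Copy the UTF-8 bytes of the string to offset 0, as the data segment would.
const encoded = new TextEncoder().encode("Hello World");
bytes.set(encoded, 0);

// The memory holds plain byte values, not a string object.
const firstFive = Array.from(bytes.subarray(0, 5));
console.log(firstFive); // [ 72, 101, 108, 108, 111 ]

// To get the text back, the host has to know both the offset and the length.
const text = new TextDecoder().decode(bytes.subarray(0, encoded.length));
console.log(text); // Hello World
```

Note that decoding only works because the code already knows the string is 11 bytes long. Nothing in memory marks where the string ends.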

Storing the Strings

For printing and storing a string, we need to know two things: where it starts in memory and where it ends.

We need to track the offsets in our compiler. Suppose we want to track two different strings, "hello" and "world", on different lines of the program:

1| var a = "hello"
....
18| var b = "world"

We can track the offsets by simply incrementing our last offset by the length of the string.

new_offset = last_offset + length of the string

For example:

new_offset = 0 + "hello".length = 0 + 5 = 5

In WebAssembly, this is translated to:

(data (i32.const 0) "hello")
(data (i32.const 5) "world")
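The bookkeeping above could be sketched in the compiler roughly like this. The function name emitDataSegments is hypothetical, and the sketch assumes ASCII-only literals without quotes or backslashes, so that no escaping is needed and the byte length equals the JavaScript string length.

```javascript
// Hypothetical compiler helper: gives each string literal an offset and
// emits one data segment per literal.
// Assumption: ASCII-only literals with no quotes or backslashes.
function emitDataSegments(literals) {
  let offset = 0; // next free byte in linear memory
  const lines = [];
  for (const literal of literals) {
    lines.push(`(data (i32.const ${offset}) "${literal}")`);
    offset += literal.length; // new_offset = last_offset + length of the string
  }
  return lines;
}

const segments = emitDataSegments(["hello", "world"]);
console.log(segments.join("\n"));
// (data (i32.const 0) "hello")
// (data (i32.const 5) "world")
```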

Okay, but how do we print this now? We only know the offset of each string. What about its length?

Length of the string

Then I found this blog, where the author stored the length of the string in the first byte. This technique is called length-prefixed encoding, and it is common in network protocols and in programming languages like Pascal.

String: "\05hello"
Memory: [0x05]['h']['e']['l']['l']['o']
         ↑______↑
         length  data

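So if my string is "hello", it is stored together with its length as \05hello: the first byte is the length of the string, and the escape \05 is stored as the single byte 0x05. Because the length has to fit in one byte, a string stored this way can be at most 255 bytes long.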
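On the compiler side, the length prefix changes both the emitted string and the offset arithmetic, since every string now takes one extra byte. Here is a rough sketch. The helper names lengthPrefixed and emitPrefixedSegments are hypothetical, and it again assumes ASCII-only literals without quotes or backslashes.

```javascript
// Hypothetical helper: builds the data-segment text for a length-prefixed literal.
// Assumption: ASCII-only text, so literal.length is also the byte length.
function lengthPrefixed(literal) {
  if (literal.length > 255) {
    throw new RangeError("a one-byte prefix can only describe up to 255 bytes");
  }
  // A byte escape in a WAT string is a backslash followed by two hex digits.
  const prefix = "\\" + literal.length.toString(16).padStart(2, "0");
  return prefix + literal;
}

// Hypothetical helper: like the plain offset tracking, but each string
// occupies one extra byte for its length prefix.
function emitPrefixedSegments(literals) {
  let offset = 0;
  const lines = [];
  for (const literal of literals) {
    lines.push(`(data (i32.const ${offset}) "${lengthPrefixed(literal)}")`);
    offset += 1 + literal.length;
  }
  return lines;
}

const prefixedSegments = emitPrefixedSegments(["hello", "world"]);
console.log(prefixedSegments.join("\n"));
// (data (i32.const 0) "\05hello")
// (data (i32.const 6) "\05world")
```

With the prefix, "world" starts at offset 6 instead of 5.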

I defined this function to get the length of the string:

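(func $len (param $addr i32) (result i32)
  ;; $nullthrow is defined elsewhere. It is assumed here to check for a
  ;; null address and otherwise return the address unchanged.
  local.get $addr
  call $nullthrow
  ;; Load the first byte at that address (the length prefix) as an unsigned i32.
  i32.load8_u
)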
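On the host side, printing follows the same idea as $len: read the length byte, then decode that many bytes after it. Here is a minimal sketch. The readString helper is hypothetical, and the memory is filled by hand to stand in for the data segments "\05hello" at offset 0 and "\05world" at offset 6.

```javascript
// Hypothetical host-side helper: reads a length-prefixed string from linear memory.
function readString(memory, addr) {
  const bytes = new Uint8Array(memory.buffer);
  const length = bytes[addr]; // the same byte that $len loads with i32.load8_u
  const start = addr + 1;     // the string data begins right after the prefix
  return new TextDecoder().decode(bytes.subarray(start, start + length));
}

// Stand-in for a module's memory after its data segments have been applied.
const mem = new WebAssembly.Memory({ initial: 1 });
const view = new Uint8Array(mem.buffer);
view.set([5, ...new TextEncoder().encode("hello")], 0);
view.set([5, ...new TextEncoder().encode("world")], 6);

console.log(readString(mem, 0)); // hello
console.log(readString(mem, 6)); // world
```

In a real setup, a function like this would typically be passed to the module as an import, so that compiled code can call it with a string's address.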

End Notes

But we’re not done yet. In my next post, we’ll tackle the real challenge: dynamic strings that can grow, shrink, and be modified at runtime.

We’ll build our own memory allocator.