How V8’s Liftoff compiler broke our WASM execution
A deep dive into how V8’s Liftoff compiler impacted the performance of Attio’s JavaScript runtime.

Gio Rostomashvili on May 26, 2025
Every once in a while, you run into a bug that sends you down a rabbit hole, questioning everything you thought you knew about how your system works. This is the story of one such bug - a performance anomaly that took us deep into the internals of Node.js worker threads, WebAssembly execution, and V8’s compiler optimizations.
At Attio, we have developed a custom JavaScript runtime that allows us to run untrusted third-party code in a secure way. The runtime is based on QuickJS and runs entirely in WebAssembly (WASM). On every execution of third-party code, we spin up a WASM module, run the code inside it, and then discard the module. This helps to ensure safe isolation between different executions.
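Attio’s runtime itself is custom, but the lifecycle is easy to picture with the open-source quickjs-emscripten package, which exposes the same create/run/discard pattern. A minimal sketch of that pattern (an illustration, not our actual implementation):

const { getQuickJS } = require("quickjs-emscripten")

async function runUntrusted(code) {
  const QuickJS = await getQuickJS()
  const vm = QuickJS.newContext() // fresh QuickJS-inside-WASM context
  try {
    const result = vm.evalCode(code)
    if (result.error) {
      const err = vm.dump(result.error)
      result.error.dispose()
      throw new Error(`Guest code failed: ${JSON.stringify(err)}`)
    }
    const value = vm.dump(result.value)
    result.value.dispose()
    return value
  } finally {
    // Discard the context so nothing leaks between executions
    vm.dispose()
  }
}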
While deciding on the best way to deploy this service, we experimented with using Node.js worker threads. However, once we deployed the thread-based solution into production, we found extremely surprising results.
The experiment
Our experiment was simple - we would start several worker threads, each worker would spawn our custom runtime context, and run a simple for loop. If all went well, we expected to see performance benefits based on the available parallelism on the machine.
const { Worker, isMainThread, parentPort, workerData } = require("worker_threads")

if (isMainThread) {
  const parallelWorkers = 4
  // Spawn multiple workers
  for (let i = 0; i < parallelWorkers; i++) {
    const worker = new Worker(__filename, {
      workerData: { workerId: i },
    })
    worker.on("message", (message) => {
      console.log(
        `Worker ${message.workerId} completed in ${message.duration.toFixed(2)}ms`
      )
    })
  }
} else {
  async function doWork() {
    const start = performance.now()
    // createJSContext() boots our QuickJS-in-WASM runtime
    const context = await createJSContext()
    // Simulate a compute-heavy task in the JS context
    context.exec(`
      for (let i = 0; i < 1e7; i++) {
        // Simulated workload
      }
    `)
    // Notify the main thread about completion time
    parentPort.postMessage({
      workerId: workerData.workerId,
      duration: performance.now() - start,
    })
  }
  doWork()
}
But when we ran the script, the results were surprising:
- Running a single worker took a reasonable 330ms.
- Running four workers ballooned each worker’s individual execution time to 4053ms.
- Cutting the worker count to two reduced execution time to 1200ms per worker.
These numbers didn’t add up. Instead of parallel execution, we saw dramatic slowdowns. In fact, the slowdown was more severe than if we had simply executed the functions in series. This presented us with a problem: if we couldn’t efficiently parallelize execution, our entire approach to scaling the runtime would be in question.
Debugging the slowdown
Our first instinct was that we were not utilizing thread parallelism correctly with worker threads. We experimented with the UV_THREADPOOL_SIZE environment variable and examined process metrics, but everything suggested that thread parallelism was not the issue.
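For reference, that check amounts to a one-line experiment (benchmark.js here is a stand-in for our test script). UV_THREADPOOL_SIZE sizes libuv’s thread pool, which worker threads don’t run on, so in hindsight it makes sense that this changed nothing:

# Bump libuv's thread pool - affects fs/dns/crypto work, not worker_threads
UV_THREADPOOL_SIZE=8 node benchmark.js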
We started digging deeper to determine what was causing this unexpected behavior. We modified the worker code, removing all logic related to creating a JavaScript context and replacing it with a plain for loop. This was the simplest way to check whether the issue was with our JS runtime implementation or a general Node.js problem. Running this script yielded the expected results: execution time remained consistent between single and multiple workers. Only when the number of workers exceeded the available parallelism on the machine did we see performance degradation, which was exactly what we would expect.
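A sketch of that stripped-down worker - the same harness as before, with the WASM context swapped for a plain JavaScript loop (the loop bound is illustrative):

const { Worker, isMainThread, parentPort, workerData } = require("worker_threads")

if (isMainThread) {
  for (let i = 0; i < 4; i++) {
    new Worker(__filename, { workerData: { workerId: i } }).on("message", (m) =>
      console.log(`Worker ${m.workerId} completed in ${m.duration.toFixed(2)}ms`)
    )
  }
} else {
  const start = performance.now()
  // Plain JavaScript loop - no WASM, no custom runtime
  let sum = 0
  for (let i = 0; i < 1e9; i++) sum += i
  parentPort.postMessage({
    workerId: workerData.workerId,
    duration: performance.now() - start,
    sum, // keep the loop observable so it can't be optimized away
  })
}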
That was a breakthrough. It allowed us to narrow down the problem to either our runtime or V8's WebAssembly implementation.
To eliminate our runtime as a suspect, we created a simple WASM program that exported a function running a similar for loop. We ran the experiment again and found our culprit: performance degraded as more workers were added. The issue wasn’t with our runtime; it was a combination of V8’s WebAssembly engine and Node.js worker threads.
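Our repro boiled down to something like the sketch below - a hand-written WASM module exporting a busy loop, timed from JavaScript inside each worker. The module and file names are illustrative reconstructions, not the exact code we used:

// loop.wasm is compiled with wat2wasm from this hand-written WAT:
//
//   (module
//     (func (export "spin") (param $n i32)
//       (local $i i32)
//       (loop $top
//         (local.set $i (i32.add (local.get $i) (i32.const 1)))
//         (br_if $top (i32.lt_u (local.get $i) (local.get $n))))))
//
const fs = require("fs")

async function main() {
  const bytes = fs.readFileSync("./loop.wasm")
  const { instance } = await WebAssembly.instantiate(bytes)
  const start = performance.now()
  instance.exports.spin(1e8) // busy-loop entirely inside WASM
  console.log(`spin(1e8) took ${(performance.now() - start).toFixed(2)}ms`)
}

main()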
V8’s Liftoff compiler
While running tests, we noted another interesting pattern. Sometimes, one or two workers would finish significantly earlier than the others. Considering WebAssembly is ostensibly a compiled language, we did not expect such unpredictable performance characteristics. This led us to dive into V8’s documentation and blog posts to better understand how the WASM engine worked.
During our research, we stumbled upon Liftoff, V8’s single-pass WASM compiler, which is enabled by default. According to V8’s blog:
"The goal of Liftoff is to reduce startup time for WebAssembly-based apps by generating code as fast as possible. Code quality is secondary, as hot code is eventually recompiled with TurboFan anyway. Liftoff avoids the time and memory overhead of constructing an IR and generates machine code in a single pass over the bytecode of a WebAssembly function."
Liftoff is optimized for fast compilation, not execution speed. This suggested a possible explanation: if our workers were spending most of their time in Liftoff’s unoptimized code - or contending for the background threads that handle tier-up to TurboFan - that could account for both the slowdown and the run-to-run variance we were observing.
Disabling Liftoff
To test this theory, we ran a new experiment. We disabled Liftoff using a V8 flag in Node.js. The results were immediate - execution times stayed consistent across worker threads. As a final step, we reintroduced our custom runtime to confirm that everything had been resolved. With Liftoff disabled, everything worked smoothly.
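Concretely, V8 exposes a --no-liftoff flag, which Node.js accepts on the command line, so every WASM function is compiled by the optimizing tier up front - trading some startup latency for predictable throughput:

# Disable the Liftoff baseline tier; `node --v8-options | grep liftoff`
# lists the variants if your Node version names the flag differently
node --no-liftoff benchmark.js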
Key takeaways
This experience reinforced an important lesson: runtime optimizations can have unexpected side effects and always need to be verified. If you’re running WASM inside Node.js worker threads and seeing unexplained performance issues, consider testing with Liftoff disabled.
More broadly, it’s a reminder that when debugging complex performance problems, the key is to systematically isolate variables - often, the real culprit is hiding where you least expect it.
Interested in redefining one of the world’s most important software categories? Check out our careers page.