Distributed Computation with WASM and WASI

By Bailey Hayes

Elevator Pitch

Distributed computation is a necessary paradigm for modern analytics. This talk proposes an extension to WASI that enables a portable, host- and language-independent ecosystem of composable WebAssembly modules to support distributed algorithms.

Description

Distributed computation is a necessary paradigm for modern analytics. This talk proposes an extension to WASI that enables a portable, host- and language-independent ecosystem of composable WebAssembly modules to support distributed algorithms. This builds on active proposals for WASM and WASI, such as interface types for handles and records, shared-nothing linking, and others, to create declarative, strongly-typed, and sandboxed WASM modules that runtimes can use to distribute computation across systems.

Notes

Spark Data Sources support a variety of languages, including Scala, Python, Java, SQL, and R. Today we have a spark-connector that supports robust SQL pushdown; that is, applicable operations in Spark get translated to true SQL and executed, with high performance and excellent concurrency, on SingleStore.

A data source can be the input to Spark for analytics work; store output from Spark after data enrichment; or do both at the same time.

Show diagram of all of the systems often at play

Show how, if we pushed this into a distributed database and moved compute next to the data, we could not only simplify the architecture but also improve performance, security, safety, resiliency, and developer experience

The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. 

Users can also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.
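To make the recovery idea concrete, here is a minimal Rust sketch of a partitioned collection that keeps a recipe (its lineage) for rebuilding any partition it loses. It is only an illustration of the concept: `ToyRdd` and every method on it are hypothetical names, not Spark APIs, and a lost node is simulated by clearing a cached partition.

```rust
/// A toy partitioned dataset that remembers how to rebuild each partition.
/// `ToyRdd` and its methods are hypothetical, not part of Spark.
struct ToyRdd {
    /// Cached partitions; `None` models a partition lost with its node.
    partitions: Vec<Option<Vec<i64>>>,
    /// Lineage: a deterministic recipe that recomputes partition `i`.
    lineage: Box<dyn Fn(usize) -> Vec<i64>>,
}

impl ToyRdd {
    /// Build every partition once from the lineage and cache the results.
    fn from_lineage(
        num_partitions: usize,
        lineage: impl Fn(usize) -> Vec<i64> + 'static,
    ) -> Self {
        let partitions = (0..num_partitions).map(|i| Some(lineage(i))).collect();
        ToyRdd { partitions, lineage: Box::new(lineage) }
    }

    /// Simulate a node failure by dropping one cached partition.
    fn lose_partition(&mut self, index: usize) {
        self.partitions[index] = None;
    }

    /// Return a partition, recomputing it from lineage if it was lost.
    fn partition(&mut self, index: usize) -> &[i64] {
        if self.partitions[index].is_none() {
            self.partitions[index] = Some((self.lineage)(index));
        }
        self.partitions[index].as_deref().unwrap()
    }

    /// Sum all elements, transparently repairing lost partitions on the way.
    fn sum(&mut self) -> i64 {
        let mut total = 0;
        for i in 0..self.partitions.len() {
            total += self.partition(i).iter().sum::<i64>();
        }
        total
    }
}
```

Because the lineage is deterministic, a caller that sums the dataset before and after `lose_partition` sees the same result; the recomputation is invisible apart from its cost.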

Resiliency is a key feature of distributed computation

So if we create an API for resilient and distributable data frames, then we can propose an extension to WASI that enables a portable, host- and language-independent ecosystem of composable WebAssembly modules to support MapReduce and other distributed algorithms.
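As a reminder of the shape of MapReduce, the following Rust sketch counts words across partitions. It is an assumption-laden stand-in: threads in one process play the role of worker nodes, and the function names (`map_partition`, `reduce_counts`, `word_count`) are illustrative rather than part of any proposed API.

```rust
use std::collections::HashMap;
use std::thread;

/// Map phase: count words within a single partition.
fn map_partition(lines: &[String]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for line in lines {
        for word in line.split_whitespace() {
            *counts.entry(word.to_lowercase()).or_insert(0) += 1;
        }
    }
    counts
}

/// Reduce phase: merge two partial results into one.
fn reduce_counts(
    mut left: HashMap<String, usize>,
    right: HashMap<String, usize>,
) -> HashMap<String, usize> {
    for (word, n) in right {
        *left.entry(word).or_insert(0) += n;
    }
    left
}

/// Run the map phase on one thread per partition, then fold the results.
/// Threads stand in for worker nodes in this sketch.
fn word_count(partitions: Vec<Vec<String>>) -> HashMap<String, usize> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|part| thread::spawn(move || map_partition(&part)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("map worker panicked"))
        .fold(HashMap::new(), reduce_counts)
}
```

The map step needs no coordination between partitions, and the reduce step is associative, which is what lets a runtime place each piece of work wherever the data lives.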

Goals:

Enable AOT compilation

Portable

Host- and language-independent

Composable WASM modules

Highly performant distributed computation

If we want to get super technical on how all of this is possible, then strap in for a whirlwind tour of the proposals that are in flight.

The module-linking spec enables a portable, host- and language-independent ecosystem of composable WebAssembly modules.

A shared-nothing architecture partitions a whole application into multiple isolated units that encapsulate their mutable state; shared mutable state is either banned or significantly restricted. When multiple languages are used, then, it’s natural to put separate languages into separate isolated units. In raw WebAssembly terms, a natural shared-nothing unit is a module which only imports and exports functions and immutable values, but not memories, tables or mutable globals.
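The shared-nothing idea can be illustrated outside of WebAssembly too. In the Rust sketch below, a unit owns its mutable state on its own thread and exposes only a message channel; values cross the boundary by move or copy, never by shared reference. The names `Request`, `spawn_counter_unit`, and `read_total` are hypothetical and exist only for this example; the channel plays the role that function imports and exports play for a module.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Requests cross the boundary by value; no reference to the unit's
/// state ever escapes.
enum Request {
    Add(i64),
    Get(Sender<i64>),
}

/// Spawn an isolated unit that owns a private running total.
/// The returned sender is the unit's only "export".
fn spawn_counter_unit() -> Sender<Request> {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        let mut total: i64 = 0; // mutable state, visible only inside this unit
        for request in rx {
            match request {
                Request::Add(n) => total += n,
                Request::Get(reply) => {
                    // Ignore a dropped receiver; the caller went away.
                    let _ = reply.send(total);
                }
            }
        }
    });
    tx
}

/// Ask the unit for a copy of its current total.
fn read_total(unit: &Sender<Request>) -> i64 {
    let (reply_tx, reply_rx) = channel();
    unit.send(Request::Get(reply_tx)).expect("unit stopped");
    reply_rx.recv().expect("unit dropped the reply")
}
```

A caller sends `Request::Add` values and later calls `read_total`; it can observe the state only through copies the unit chooses to send back.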

Interface-types allow a wide variety of value representations, avoiding the need for intermediate (de)serialization steps. Ideally, this will even include lazily-generated data structures (e.g., via iterators, generators or comprehensions).

To create a maximally-reusable module, a developer would produce an adapter module that exclusively uses interface types in its signature to create a shared-nothing interface. The value semantics of interface types give the client language significant flexibility in how to coerce to and from its native language values. This opens up opportunities for highly performant, zero-copy translations between WASM modules and the host runtime.

Lifting and lowering convert between interface types and opaque reference types, allowing zero-copy when used on both sides.
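For readers new to the terms, here is a rough Rust sketch of lowering and lifting a string. It assumes a simplified model in which the low-level side is a plain byte buffer addressed by offset and length, rather than the opaque reference types mentioned above; `LinearMemory`, `LoweredStr`, `lower_str`, and `lift_str` are hypothetical names. Lowering copies the value in, while lifting borrows the bytes in place, which hints at where copies can be avoided.

```rust
/// A toy linear memory: a contiguous, growable byte array.
struct LinearMemory {
    bytes: Vec<u8>,
}

/// The lowered form of a string: an offset and a length into memory.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LoweredStr {
    ptr: usize,
    len: usize,
}

impl LinearMemory {
    fn new() -> Self {
        LinearMemory { bytes: Vec::new() }
    }

    /// Lower: copy the high-level value into memory and return (ptr, len).
    fn lower_str(&mut self, value: &str) -> LoweredStr {
        let ptr = self.bytes.len();
        self.bytes.extend_from_slice(value.as_bytes());
        LoweredStr { ptr, len: value.len() }
    }

    /// Lift: view the bytes in place as a string slice, without copying.
    /// Returns `None` for out-of-bounds ranges or invalid UTF-8.
    fn lift_str(&self, lowered: LoweredStr) -> Option<&str> {
        let end = lowered.ptr.checked_add(lowered.len)?;
        let slice = self.bytes.get(lowered.ptr..end)?;
        std::str::from_utf8(slice).ok()
    }
}
```

Lifting validates the range and the encoding instead of trusting the caller, since the lowered pair may come from a different module.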

In order to realize wasi-data, we need to:

  • design the WASI API
  • provide a backing implementation (in this case, by porting a MapReduce algorithm to Rust)
  • implement the WITX specification with the backing implementation
  • expose the implementation in the runtime
  • optionally, provide bindings to compile programs to the specification (in this case, from Rust to wasi-data)
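The first two steps above might look roughly like the following Rust sketch. Everything here is an assumption made for illustration: the trait `DataHost` is a stand-in for whatever interface wasi-data ends up defining, not the actual API, and `InMemoryHost` is a single-process backing implementation with no distribution at all. Datasets are referred to by opaque integer handles, and the "guest" function is written only against the interface.

```rust
/// Hypothetical host interface standing in for a wasi-data API.
/// Datasets are identified by opaque integer handles.
trait DataHost {
    fn create(&mut self, rows: Vec<i64>) -> u32;
    fn map(&mut self, handle: u32, f: fn(i64) -> i64) -> Option<u32>;
    fn reduce(&self, handle: u32, init: i64, f: fn(i64, i64) -> i64) -> Option<i64>;
}

/// Backing implementation: a single-process, in-memory table of datasets.
#[derive(Default)]
struct InMemoryHost {
    datasets: Vec<Vec<i64>>,
}

impl DataHost for InMemoryHost {
    /// Store the rows and return a handle to them.
    fn create(&mut self, rows: Vec<i64>) -> u32 {
        self.datasets.push(rows);
        (self.datasets.len() - 1) as u32
    }

    /// Apply `f` to every row and return a handle to the new dataset.
    fn map(&mut self, handle: u32, f: fn(i64) -> i64) -> Option<u32> {
        let mapped: Vec<i64> = self
            .datasets
            .get(handle as usize)?
            .iter()
            .map(|&x| f(x))
            .collect();
        Some(self.create(mapped))
    }

    /// Fold all rows into a single value; `None` for an unknown handle.
    fn reduce(&self, handle: u32, init: i64, f: fn(i64, i64) -> i64) -> Option<i64> {
        let rows = self.datasets.get(handle as usize)?;
        Some(rows.iter().fold(init, |acc, &x| f(acc, x)))
    }
}

/// "Guest" code written only against the interface, not the implementation.
fn sum_of_squares(host: &mut dyn DataHost, rows: Vec<i64>) -> Option<i64> {
    let input = host.create(rows);
    let squared = host.map(input, |x| x * x)?;
    host.reduce(squared, 0, |a, b| a + b)
}
```

Because `sum_of_squares` sees only handles and the trait, a runtime could swap in a distributed implementation without recompiling the guest logic, which is the property the remaining steps aim to preserve.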

Live DEMO