Fault-tolerant actors

Spin up 50 million actors on one machine and message them without a single lock. When one crashes, its supervisor restarts it while the rest keep running. When a whole subsystem goes sideways, the failure propagates upward exactly as far as you told it to, and no further.

This is the programming model behind WhatsApp, Discord's voice chat, and every BEAM system ever shipped. MetaScript brings it in as a first-class language feature (not a library, not a framework) and compiles it to JS, C, and WASM today, with an Erlang backend on the roadmap for native BEAM interop.

Isolated state

An actor owns its memory. No other actor can touch it. No shared heaps, no locks, no data races by construction.

Message passing

Calling a method on an actor is sending a message. Messages are processed one at a time, so state updates are race-free without a single mutex.

Supervision trees

Actors die. Supervisors restart them. Crashes become local events, not program-ending ones. The discipline is "let it crash": contain failure rather than trying to prevent it.

The rest of this guide walks through these primitives in MetaScript: write an actor, hook lifecycle events, build a supervision tree, coordinate lifetimes with links and monitors.

Your first actor

An actor is declared with the actor keyword. Methods are messages; calling one returns a Promise:

actor Counter {
    private n: number = 0;

    bump(): number {
        this.n = this.n + 1;
        return this.n;
    }

    get(): number {
        return this.n;
    }
}

const c = new Counter();
c.bump();                      // fire-and-forget: the returned Promise<number> is ignored
const v = await c.bump();      // await when you need the return value
console.log(`count = ${v}`);   // count = 2 (this is the second bump)

Key rules:

State is private. No other actor can touch this.n. No shared memory, no locks.
Calls are messages. c.bump() sends a message to c's mailbox and returns a Promise<T>. await suspends the caller (not the actor scheduler) until the actor has processed the message and the Promise resolves.
Methods run one at a time. An actor processes one message at a time, so state updates inside methods are race-free.
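
To make the one-at-a-time rule concrete, here is a minimal sketch in plain TypeScript of a mailbox that queues messages and runs a single handler at a time. It is a model of the behaviour described above, not MetaScript's runtime: Mailbox, send, and drain are hypothetical names, and replies (the Promise a real call returns) are left out to keep it short.

```typescript
// Illustrative model only: not MetaScript's actual runtime.
type Handler<S, M> = (state: S, msg: M, self: Mailbox<S, M>) => S;

class Mailbox<S, M> {
    private queue: M[] = [];
    private running = false;
    private state: S;
    private handler: Handler<S, M>;

    constructor(initial: S, handler: Handler<S, M>) {
        this.state = initial;
        this.handler = handler;
    }

    // Sending only enqueues; it never runs the handler inline.
    send(msg: M): void {
        this.queue.push(msg);
    }

    // Process queued messages strictly one at a time, in arrival order.
    drain(): S {
        if (this.running) throw new Error("drain is not re-entrant");
        this.running = true;
        try {
            while (this.queue.length > 0) {
                const msg = this.queue.shift() as M;
                this.state = this.handler(this.state, msg, this);
            }
        } finally {
            this.running = false;
        }
        return this.state;
    }
}

// A counter whose "bumpTwice" message sends a follow-up message to itself.
type CounterMsg = { kind: "bump" } | { kind: "bumpTwice" };

const counter = new Mailbox<number, CounterMsg>(0, (n, msg, self) => {
    if (msg.kind === "bumpTwice") {
        self.send({ kind: "bump" }); // queued behind pending messages, not run inline
    }
    return n + 1;
});

counter.send({ kind: "bumpTwice" });
counter.send({ kind: "bump" });
const finalCount = counter.drain(); // three handler runs in total
```

Because the handler is the only code that touches the state and it never runs concurrently with itself, no lock is needed around the increment.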

Lifecycle hooks

Actors can implement optional methods the runtime calls for you:

import { ExitReason, setIdleTimeout } from "std/actor";

actor CacheEntry {
    private value: string;

    constructor(value: string) {
        this.value = value;
        setIdleTimeout(this.pid, 30000);  // 30s idle → stop
    }

    read(): string { return this.value; }

    onIdle(): void {
        // No messages in 30s — clean up
        console.log("idle timeout, exiting");
    }

    onTerminate(reason: ExitReason): void {
        // Last chance to flush, close, notify — called once before death
    }

    onExit(childPid: int64, reason: ExitReason): void {
        // A linked actor died. Supervisors use this.
    }
}

this.pid is the actor's own process id (int64) — pass it around to wire up links, monitors, and the name registry.

Supervision

A supervisor is itself an actor from std/actor/supervisor that owns children, links to them, and restarts them according to a policy. This is what makes a tree of actors "fault-tolerant" — a crash in a leaf becomes a restart, not a process exit.

import {
    Supervisor, RestartStrategy, RestartType, ShutdownKind, ChildSpec,
} from "std/actor/supervisor";

function startCounter(): int64 {
    const c = new Counter();
    return c.pid;
}

const sup = new Supervisor(
    RestartStrategy.OneForOne,
    /*maxRestarts*/ 3,
    /*maxSeconds*/ 5,
);

const spec: ChildSpec = {
    name: "counter",
    start: startCounter,
    restart: RestartType.Permanent,   // always restart
    shutdown: ShutdownKind.Timeout,
    shutdownMs: 5000,
};
sup.addChild(spec);
sup.start();

Restart strategies

Strategy	On one child crash
`OneForOne`	Restart that child only
`OneForAll`	Restart all children
`RestForOne`	Restart the crashed child and every child started after it
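
The table can be read as a function from the crashed child to the set of children that get restarted. The following plain TypeScript sketch models that mapping; it assumes children are tracked in start order, and childrenToRestart is a hypothetical helper, not part of std/actor/supervisor.

```typescript
// Illustrative model only; names are hypothetical, not MetaScript's API.
type Strategy = "OneForOne" | "OneForAll" | "RestForOne";

// `children` is listed in start order; returns the names to restart, in start order.
function childrenToRestart(
    strategy: Strategy,
    children: string[],
    crashed: string,
): string[] {
    const index = children.indexOf(crashed);
    if (index === -1) return []; // unknown child: nothing to restart
    switch (strategy) {
        case "OneForOne":
            return [crashed];
        case "OneForAll":
            return [...children];
        case "RestForOne":
            return children.slice(index); // the crashed child and everything started after it
    }
}

const started = ["db", "cache", "api"];
const oneForOne = childrenToRestart("OneForOne", started, "cache");   // ["cache"]
const oneForAll = childrenToRestart("OneForAll", started, "cache");   // ["db", "cache", "api"]
const restForOne = childrenToRestart("RestForOne", started, "cache"); // ["cache", "api"]
```

RestForOne is the usual choice when later children depend on earlier ones, since a restarted dependency would otherwise leave them holding stale state.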

Restart types

Type	When it restarts
`Permanent`	Always — any exit reason
`Transient`	Only on abnormal exit (crash)
`Temporary`	Never
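
As a small sketch of the same table in plain TypeScript, the decision reduces to the restart type plus whether the exit was normal. shouldRestart is a hypothetical helper, and collapsing the exit reason to a boolean is an assumption made for brevity; the real ExitReason has more variants.

```typescript
// Illustrative model only; not MetaScript's API.
type RestartKind = "Permanent" | "Transient" | "Temporary";

function shouldRestart(kind: RestartKind, exitedNormally: boolean): boolean {
    switch (kind) {
        case "Permanent":
            return true;             // any exit reason
        case "Transient":
            return !exitedNormally;  // only after a crash
        case "Temporary":
            return false;            // never
    }
}

const restartAfterCrash = shouldRestart("Transient", false);     // true
const restartAfterCleanExit = shouldRestart("Transient", true);  // false
```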

Shutdown kinds

Kind	Behavior
`BrutalKill`	Immediate `stop(pid, Killed)` — skips `onTerminate`
`Timeout`	Polite `stop(pid, Normal)`, wait `shutdownMs`, then force-kill if still alive
`Infinity`	Polite stop, wait forever (use for nested supervisors)

The supervisor also enforces a sliding-window restart tolerance: if more than maxRestarts crashes occur within maxSeconds, the supervisor itself gives up and exits, propagating the failure upward. This is the "let it crash" discipline in practice: retry locally, but surface systemic failure to the layer above.
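
The restart tolerance can be sketched as a sliding window over crash timestamps. The plain TypeScript below is a model under stated assumptions, not the supervisor's actual implementation: RestartWindow and recordCrash are hypothetical names, time is passed in explicitly in milliseconds, and a crash exactly maxSeconds old is treated as already outside the window.

```typescript
// Illustrative model only; `RestartWindow` is a hypothetical name.
class RestartWindow {
    private readonly maxRestarts: number;
    private readonly windowMs: number;
    private crashTimes: number[] = [];

    constructor(maxRestarts: number, maxSeconds: number) {
        this.maxRestarts = maxRestarts;
        this.windowMs = maxSeconds * 1000;
    }

    // Record a crash at `nowMs`. Returns true if the child may be restarted,
    // false if the supervisor should give up and exit.
    recordCrash(nowMs: number): boolean {
        // Drop crashes that have slid out of the window.
        this.crashTimes = this.crashTimes.filter((t) => nowMs - t < this.windowMs);
        this.crashTimes.push(nowMs);
        return this.crashTimes.length <= this.maxRestarts;
    }
}

// Same limits as the supervisor example above: 3 restarts in 5 seconds.
const burst = new RestartWindow(3, 5);
const burstDecisions = [0, 1000, 2000, 3000].map((t) => burst.recordCrash(t));
// [true, true, true, false]: the fourth crash inside 5 s exceeds maxRestarts

const spaced = new RestartWindow(3, 5);
const spacedDecisions = [0, 2000, 4000, 6000].map((t) => spaced.recordCrash(t));
// [true, true, true, true]: the crash at t=0 has left the window by t=6000
```

The window is what separates a flaky child (occasional crashes, always restarted) from a systemic fault (a burst of crashes, escalated to the parent).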

Links, monitors, trap exits

Three primitives for coordinating lifetime between actors, from std/actor:

import {
    link, unlink, monitor, demonitor, trapExits, ExitReason,
} from "std/actor";

link(a, b) — bidirectional. If either dies abnormally, the other gets an exit signal. Used by supervisors.
monitor(watcher, target) — unidirectional. Watcher receives a DOWN message when target dies. No effect on watcher's lifetime. Use when you care about a target's death but don't want to die with it.
trapExits(pid, true) — convert exit signals into regular messages via onExit. Supervisors trap exits so a dying child doesn't take them down.
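
The difference between the three primitives is easiest to see in a toy model. The plain TypeScript sketch below keeps a process table and propagates a crash through it; it is not MetaScript's runtime, every name in it (spawn, linkPids, monitorPid, terminate) is hypothetical, and trapped exits land in an inbox array where MetaScript would call onExit.

```typescript
// Illustrative model only: a toy process table, not MetaScript's runtime.
type Reason = "Normal" | "Crash";

interface Proc {
    alive: boolean;
    trapsExits: boolean;
    links: Set<number>;     // bidirectional: stored on both sides
    watchers: Set<number>;  // pids monitoring this process
    inbox: string[];        // messages delivered instead of dying
}

const procs = new Map<number, Proc>();

function spawn(pid: number, trapsExits: boolean = false): void {
    procs.set(pid, { alive: true, trapsExits, links: new Set(), watchers: new Set(), inbox: [] });
}

function mustGet(pid: number): Proc {
    const p = procs.get(pid);
    if (p === undefined) throw new Error(`unknown pid ${pid}`);
    return p;
}

function linkPids(a: number, b: number): void {
    mustGet(a).links.add(b);
    mustGet(b).links.add(a);
}

function monitorPid(watcher: number, target: number): void {
    mustGet(target).watchers.add(watcher);
}

function terminate(pid: number, reason: Reason): void {
    const p = mustGet(pid);
    if (!p.alive) return;
    p.alive = false;
    // Monitors: the watcher gets a message; its lifetime is unaffected.
    p.watchers.forEach((w) => {
        const watcher = mustGet(w);
        if (watcher.alive) watcher.inbox.push(`DOWN ${pid} ${reason}`);
    });
    // Links: only abnormal exits propagate, and trapping turns them into messages.
    if (reason === "Normal") return;
    p.links.forEach((peerPid) => {
        const peer = mustGet(peerPid);
        if (!peer.alive) return;
        if (peer.trapsExits) peer.inbox.push(`EXIT ${pid} ${reason}`);
        else terminate(peerPid, reason);
    });
}

spawn(1, true);   // supervisor: traps exits
spawn(2);         // worker
spawn(3);         // sibling linked to the worker, does not trap exits
spawn(4);         // observer
linkPids(1, 2);
linkPids(2, 3);
monitorPid(4, 2);

terminate(2, "Crash");
// pid 1 survives with an EXIT message, pid 3 dies with the worker,
// pid 4 survives with a DOWN message.
```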

Name registry

import { registerName, whereis, unregister, stop, ExitReason } from "std/actor";

const db = new DatabaseConnection();
registerName(db.pid, "db");

// Somewhere else, without a reference to `db`:
const pid = whereis("db");
if (pid !== 0) {
    // send a message to the registered actor
}

// On shutdown:
stop(db.pid, ExitReason.Normal);
// Name auto-unregisters on actor death.

Constraints: the mapping is 1:1 (one name per pid, one pid per name). Names auto-unregister when the actor dies, so a lookup never returns the pid of a dead actor.
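
A 1:1 registry is typically a pair of maps kept in sync. The plain TypeScript sketch below models the constraints just described; NameRegistry and its methods are hypothetical, and returning false on a conflict is an assumption (the source does not say how registerName reports one).

```typescript
// Illustrative model only; `NameRegistry` is a hypothetical name.
class NameRegistry {
    private byName = new Map<string, number>();
    private byPid = new Map<number, string>();

    // Returns false when either side of the 1:1 mapping is already taken.
    register(pid: number, name: string): boolean {
        if (this.byName.has(name) || this.byPid.has(pid)) return false;
        this.byName.set(name, pid);
        this.byPid.set(pid, name);
        return true;
    }

    // 0 means "not registered", mirroring the `pid !== 0` check above.
    whereis(name: string): number {
        return this.byName.get(name) ?? 0;
    }

    // Called when an actor dies so no name is left pointing at a dead pid.
    onActorDeath(pid: number): void {
        const name = this.byPid.get(pid);
        if (name === undefined) return;
        this.byPid.delete(pid);
        this.byName.delete(name);
    }
}

const registry = new NameRegistry();
const first = registry.register(7, "db");        // true
const dupName = registry.register(8, "db");      // false: name already taken
const dupPid = registry.register(7, "primary");  // false: pid already has a name
const beforeDeath = registry.whereis("db");      // 7
registry.onActorDeath(7);
const afterDeath = registry.whereis("db");       // 0
```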

Scale

The actor runtime is built for many cheap actors, not a few heavy ones. The benchmark at examples/benchActor50M.ms creates 50 million no-op actors in a loop, sends each one a message, and drains the last one. Use it as a reference to measure the cost on your own hardware: msc run examples/benchActor50M.ms --release

Cross-backend notes

The actor keyword, the supervisor, and all the primitives above are language-level, not runtime-specific. The same source compiles to MetaScript's JS, C, and WASM backends. An Erlang backend, which would map actors to BEAM processes and give you native distribution, is on the roadmap but not shipped yet; treat "distributed supervision across nodes" as future work for now.

Next steps

Compile targets — which backend to pick
Memory model — how ORC handles short-lived actors
MetaScript on GitHub — source, issues, roadmap