Fault-tolerant actors
Spin up 50 million actors on one machine and message them without a single lock. When one crashes, its supervisor restarts it while the rest keep running. When a whole subsystem goes sideways, the failure propagates upward exactly as far as you told it to, and no further.
This is the programming model that runs WhatsApp, Discord's voice chat, and every BEAM system ever shipped. MetaScript brings it in as a first-class language feature (not a library, not a framework) and compiles it to JS/C/WASM today, with Erlang on the roadmap for native BEAM interop.
Isolated state
An actor owns its memory. No other actor can touch it. No shared
heaps, no locks, no data races by construction.
Message passing
Calling a method on an actor is sending a message. Messages are
processed one at a time, so state updates are race-free without a
single mutex.
Supervision trees
Actors die. Supervisors restart them. Crashes become local events, not program-ending ones. The discipline is let it crash — contain failure, don't prevent it.
The rest of this guide walks through these primitives in MetaScript: write an actor, hook lifecycle events, build a supervision tree, coordinate lifetimes with links and monitors.
Your first actor
An actor is declared with the actor keyword. Methods are messages;
calling one returns a Promise:
actor Counter {
  private n: number = 0;
  bump(): number {
    this.n = this.n + 1;
    return this.n;
  }
  get(): number {
    return this.n;
  }
}
const c = new Counter();
c.bump(); // fire-and-forget: returns Promise<number>, caller ignores
const v = await c.bump(); // await if you want the return value
console.log(`count = ${v}`); // count = 2
Key rules:
- State is private. No other actor can touch this.n. No shared memory, no locks.
- Calls are messages. c.bump() sends a message to c's mailbox. The call returns a Promise<T>; await blocks the caller (not the actor scheduler) until the actor processes it.
- Methods run one at a time. An actor processes one message at a time, so state updates inside methods are race-free.
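The one-message-at-a-time rule can be modeled in plain TypeScript with a promise chain. This is only a sketch of the semantics, not how the MetaScript runtime is implemented; Mailbox, enqueue, and CounterModel are hypothetical names introduced for illustration.

```typescript
// Illustrative model only: Mailbox and CounterModel are hypothetical names,
// not MetaScript runtime APIs.
class Mailbox {
  // Tail of the chain: settles when the most recently queued message is done.
  private tail: Promise<void> = Promise.resolve();

  // Queue a handler; it starts only after every earlier handler has settled.
  enqueue<T>(handler: () => T | Promise<T>): Promise<T> {
    const result = this.tail.then(handler);
    // Keep the chain usable even if this handler throws.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

class CounterModel {
  private n = 0;
  private readonly mailbox = new Mailbox();

  // Each call becomes a queued message and returns a Promise, like c.bump().
  bump(): Promise<number> {
    return this.mailbox.enqueue(async () => {
      const seen = this.n;
      await Promise.resolve(); // yield mid-update; no other message can interleave
      this.n = seen + 1;
      return this.n;
    });
  }
}
```

Without the mailbox, three overlapping bump() calls would all read the same starting value. With it, each handler sees the result of the previous one, which is the property the rules above describe.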
Lifecycle hooks
Actors can implement optional methods the runtime calls for you:
import { ExitReason, setIdleTimeout } from "std/actor";
actor CacheEntry {
  private value: string;
  constructor(value: string) {
    this.value = value;
    setIdleTimeout(this.pid, 30000); // 30s idle → stop
  }
  read(): string { return this.value; }
  onIdle(): void {
    // No messages in 30s: clean up
    console.log("idle timeout, exiting");
  }
  onTerminate(reason: ExitReason): void {
    // Last chance to flush, close, notify; called once before death
  }
  onExit(childPid: int64, reason: ExitReason): void {
    // A linked actor died. Supervisors use this.
  }
}
this.pid is the actor's own process id (int64). Pass it around
to wire up links, monitors, and the name registry.
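The idle hook can be pictured as a timestamp that every handled message refreshes. The sketch below models that in plain TypeScript, assuming the idle window restarts on each message. IdleTracker and its methods are hypothetical names, not part of std/actor, and time is passed in explicitly so the model stays deterministic.

```typescript
// Hypothetical model of an idle timeout; this is not the std/actor implementation.
class IdleTracker {
  private readonly timeoutMs: number;
  private lastMessageAt: number;

  constructor(timeoutMs: number, nowMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastMessageAt = nowMs;
  }

  // Call whenever the actor handles a message (assumed to restart the window).
  touch(nowMs: number): void {
    this.lastMessageAt = nowMs;
  }

  // True once a full timeout has elapsed with no message, which is the point
  // where a runtime would invoke onIdle.
  isIdle(nowMs: number): boolean {
    return nowMs - this.lastMessageAt >= this.timeoutMs;
  }
}

const tracker = new IdleTracker(30000, 0);
tracker.touch(20000);                // a read() arrives at t = 20s
console.log(tracker.isIdle(30000));  // false: only 10s since the last message
console.log(tracker.isIdle(50000));  // true: 30s of silence
```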
Supervision
A supervisor is itself an actor from std/actor/supervisor that owns
children, links to them, and restarts them according to a policy. This
is what makes a tree of actors "fault-tolerant" — a crash in a leaf
becomes a restart, not a process exit.
import {
  Supervisor, RestartStrategy, RestartType, ShutdownKind, ChildSpec,
} from "std/actor/supervisor";
function startCounter(): int64 {
  const c = new Counter();
  return c.pid;
}
const sup = new Supervisor(
  RestartStrategy.OneForOne,
  /*maxRestarts*/ 3,
  /*maxSeconds*/ 5,
);
const spec: ChildSpec = {
  name: "counter",
  start: startCounter,
  restart: RestartType.Permanent, // always restart
  shutdown: ShutdownKind.Timeout,
  shutdownMs: 5000,
};
sup.addChild(spec);
sup.start();
Restart strategies
| Strategy | On one child crash |
|---|---|
| OneForOne | Restart that child only |
| OneForAll | Restart all children |
| RestForOne | Restart the crashed child and every child started after it |
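The three strategies differ only in which children they select for restart. A minimal sketch of that selection in plain TypeScript follows; Strategy and childrenToRestart are hypothetical names that model the table, not the std/actor/supervisor API.

```typescript
// Illustrative only: these names model the table, not the supervisor's real API.
type Strategy = "OneForOne" | "OneForAll" | "RestForOne";

// `children` lists child names in start order; the result keeps that order.
function childrenToRestart(
  strategy: Strategy,
  children: readonly string[],
  crashed: string,
): string[] {
  const index = children.indexOf(crashed);
  if (index === -1) return []; // unknown child: nothing to restart
  if (strategy === "OneForOne") return [crashed];
  if (strategy === "OneForAll") return [...children];
  return children.slice(index); // RestForOne: the crashed child and later ones
}

const started = ["db", "cache", "api"];
console.log(childrenToRestart("OneForOne", started, "cache"));  // cache only
console.log(childrenToRestart("OneForAll", started, "cache"));  // db, cache, api
console.log(childrenToRestart("RestForOne", started, "cache")); // cache, api
```

RestForOne fits chains of dependencies: if api was started after cache because it depends on it, restarting cache alone would leave api holding a stale reference.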
Restart types
| Type | When it restarts |
|---|---|
| Permanent | Always, for any exit reason |
| Transient | Only on abnormal exit (crash) |
| Temporary | Never |
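Read as a decision function, the table reduces to a few lines. The sketch below is plain TypeScript with hypothetical names (RestartKind, shouldRestart). It takes a boolean for "abnormal exit" rather than guessing which ExitReason values count as abnormal.

```typescript
// Illustrative only: models the table above, not the supervisor's real code.
type RestartKind = "Permanent" | "Transient" | "Temporary";

// `abnormal` is true when the child crashed rather than exiting normally.
function shouldRestart(kind: RestartKind, abnormal: boolean): boolean {
  if (kind === "Permanent") return true;     // any exit reason
  if (kind === "Transient") return abnormal; // crashes only
  return false;                              // Temporary: never
}
```

A one-shot job that should be retried on failure but left alone once it finishes cleanly maps to Transient; a long-lived service maps to Permanent.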
Shutdown kinds
| Kind | Behavior |
|---|---|
| BrutalKill | Immediate stop(pid, Killed); skips onTerminate |
| Timeout | Polite stop(pid, Normal), wait shutdownMs, then force-kill if still alive |
| Infinity | Polite stop, wait forever (use for nested supervisors) |
The supervisor also enforces a sliding-window restart tolerance:
if more than maxRestarts crashes occur within maxSeconds, the supervisor
itself gives up and exits, propagating the failure upward. With the
supervisor constructed above (maxRestarts 3, maxSeconds 5), a fourth crash
inside five seconds ends the supervisor. This is the
"let it crash" discipline: retry locally, but surface systemic failure
to the layer above.
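One way to picture the sliding window is a list of recent crash timestamps that is pruned on every new crash. The sketch below is plain TypeScript under that assumption. RestartWindow and recordCrash are hypothetical names, and the exact boundary handling (whether a crash exactly maxSeconds old still counts) is a guess, not documented behavior.

```typescript
// Hypothetical model of the restart tolerance; not the supervisor's real code.
class RestartWindow {
  private readonly maxRestarts: number;
  private readonly windowMs: number;
  private crashes: number[] = []; // timestamps (ms) of crashes still in the window

  constructor(maxRestarts: number, maxSeconds: number) {
    this.maxRestarts = maxRestarts;
    this.windowMs = maxSeconds * 1000;
  }

  // Record a crash at `nowMs`. Returns true if the supervisor may restart the
  // child, false if the tolerance is exceeded and it should give up.
  recordCrash(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop crashes that have slid out of the window (boundary is an assumption).
    this.crashes = this.crashes.filter((t) => t > cutoff);
    this.crashes.push(nowMs);
    return this.crashes.length <= this.maxRestarts;
  }
}

const tolerance = new RestartWindow(3, 5);
console.log(tolerance.recordCrash(0));    // true
console.log(tolerance.recordCrash(1000)); // true
console.log(tolerance.recordCrash(2000)); // true
console.log(tolerance.recordCrash(3000)); // false: 4 crashes within 5 seconds
```

Because old crashes fall out of the window, a child that fails once an hour never trips the limit, while a child that fails in a tight loop trips it almost immediately.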
Links, monitors, trap exits
Three primitives for coordinating lifetime between actors, from
std/actor:
import {
  link, unlink, monitor, demonitor, trapExits, ExitReason,
} from "std/actor";
- link(a, b): bidirectional. If either dies abnormally, the other gets an exit signal. Used by supervisors.
- monitor(watcher, target): unidirectional. Watcher receives a DOWN message when target dies. No effect on watcher's lifetime. Use when you care about a target's death but don't want to die with it.
- trapExits(pid, true): convert exit signals into regular messages via onExit. Supervisors trap exits so a dying child doesn't take them down.
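The difference between the three primitives is easiest to see by tracing one crash through a small model. The sketch below is plain TypeScript, not the std/actor implementation: LifetimeModel, its inbox, and the "EXIT"/"DOWN" message strings are all hypothetical, chosen only to make the propagation rules visible.

```typescript
// Illustrative model only. LifetimeModel and the "EXIT"/"DOWN" strings are
// hypothetical; they are not the std/actor API or its message format.
type Reason = "normal" | "crash";

class LifetimeModel {
  readonly alive = new Set<number>();
  readonly inbox = new Map<number, string[]>();
  private readonly links = new Map<number, Set<number>>();
  private readonly watchers = new Map<number, Set<number>>(); // target -> watchers
  private readonly trapping = new Set<number>();

  spawn(pid: number): void {
    this.alive.add(pid);
    this.inbox.set(pid, []);
  }

  link(a: number, b: number): void {
    this.setFor(this.links, a).add(b);
    this.setFor(this.links, b).add(a); // bidirectional
  }

  monitor(watcher: number, target: number): void {
    this.setFor(this.watchers, target).add(watcher); // one direction only
  }

  trapExits(pid: number, enabled: boolean): void {
    if (enabled) this.trapping.add(pid);
    else this.trapping.delete(pid);
  }

  exit(pid: number, reason: Reason): void {
    if (!this.alive.delete(pid)) return; // already dead
    // Monitors: watchers are told, but their own lifetime is unaffected.
    for (const watcher of this.setFor(this.watchers, pid)) {
      this.deliver(watcher, `DOWN ${pid} ${reason}`);
    }
    // Links: only abnormal exits propagate to the peers.
    for (const peer of this.setFor(this.links, pid)) {
      this.setFor(this.links, peer).delete(pid);
      if (reason === "normal") continue;
      if (this.trapping.has(peer)) this.deliver(peer, `EXIT ${pid} ${reason}`);
      else this.exit(peer, reason); // an untrapped signal kills the peer too
    }
  }

  private deliver(pid: number, message: string): void {
    if (this.alive.has(pid)) this.inbox.get(pid)!.push(message);
  }

  private setFor(map: Map<number, Set<number>>, key: number): Set<number> {
    let set = map.get(key);
    if (!set) {
      set = new Set<number>();
      map.set(key, set);
    }
    return set;
  }
}
```

In this model, crashing an actor that is linked to a plain peer, linked to a trapping peer, and watched by a monitor kills the plain peer, hands the trapping peer an EXIT message, and hands the watcher a DOWN message.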
Name registry
Register top-level services by name so you don't have to pass pids around:
import { registerName, whereis, unregister, stop, ExitReason } from "std/actor";
const db = new DatabaseConnection();
registerName(db.pid, "db");
// Somewhere else, without a reference to `db`:
const pid = whereis("db");
if (pid !== 0) {
  // send a message to the registered actor
}
// On shutdown:
stop(db.pid, ExitReason.Normal);
// Name auto-unregisters on actor death.
Constraints: 1:1 mapping (one name per pid, one pid per name). Names auto-unregister when the actor dies, so there are no dangling pids.
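The 1:1 constraint and the auto-unregister behavior can be modeled with two maps kept in sync. This is a plain-TypeScript sketch with hypothetical names (NameRegistry, register, onActorDeath). How the real registerName reports a conflict is not stated here, so the boolean return value is an assumption.

```typescript
// Hypothetical model of a 1:1 name registry; not the std/actor implementation.
class NameRegistry {
  private readonly byName = new Map<string, number>();
  private readonly byPid = new Map<number, string>();

  // Refuses the registration if the name or the pid is already taken.
  // (The boolean return is an assumption of this sketch.)
  register(pid: number, name: string): boolean {
    if (this.byName.has(name) || this.byPid.has(pid)) return false;
    this.byName.set(name, pid);
    this.byPid.set(pid, name);
    return true;
  }

  // Returns 0 when the name is unknown, matching the pid !== 0 check above.
  whereis(name: string): number {
    return this.byName.get(name) ?? 0;
  }

  // Called when an actor dies: drops its name so no stale pid is left behind.
  onActorDeath(pid: number): void {
    const name = this.byPid.get(pid);
    if (name === undefined) return;
    this.byPid.delete(pid);
    this.byName.delete(name);
  }
}
```

Keeping the reverse map (pid to name) is what makes cleanup on death a constant-time lookup instead of a scan over every registered name.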
Scale
The actor runtime is built for many cheap actors, not a few heavy
ones. The benchmark at
examples/benchActor50M.ms
creates 50 million no-op actors in a loop, sends each one a message,
and drains the last one — use it as a reference to measure cost on
your own hardware (msc run examples/benchActor50M.ms --release).
Cross-backend notes
The actor keyword, the supervisor, and all the primitives above are
language-level, not runtime-specific. The same source compiles to
MetaScript's JS, C, and WASM backends. An Erlang backend — which
would map actors to BEAM processes and give you native distribution —
is on the roadmap but not shipped yet; treat "distributed supervision
across nodes" as future work for now.
Next steps
- Compile targets — which backend to pick
- Memory model — how ORC handles short-lived actors
- MetaScript on GitHub — source, issues, roadmap