Roadmap
Note on the mutability of this roadmap:
This roadmap can change, even significantly, based on:
- community feedback (issues, discussions, real-world usage),
- technical constraints that emerge during implementation,
- external dependencies (Wasmtime, sled, the WASM/WASI ecosystem, Component Model standard),
- new ideas, opportunities, or simply how one wakes up in the morning with a better intuition.
Proposing changes: anyone can open a Pull Request against this roadmap page to:
- suggest a new item in a future version,
- move an item between versions with motivation,
- flag a risk or dependency that justifies a change in priority,
- propose a completely new version.
Roadmap PRs are as welcome as code PRs!
Status and goal
- Current release line: `v0.1.0` (active - includes documentation, distribution & configuration)
- Final goal of the cycle: stable `v0.2.0`.
- Philosophy of intermediate releases: every `0.1.x` is a stable and usable release. Capabilities are added incrementally without sacrificing quality.
Unlike v0.1.0 (declared for non-critical workloads), v0.2.0 must guarantee:
- dynamic peer discovery without manual configuration,
- a reactive event model, no longer just `run()` one-shot,
- granular capability-based security with per-module quotas,
- complete operability (snapshot, restore, replay, diff, hot reload),
- wire and schema versioning with documented compatibility,
- hardened supply chain (signatures, SBOM, continuous fuzzing),
- complete observability (metrics, dashboard, TUI).
Version map
| Version | Theme | Status |
|---|---|---|
| v0.1.0 | First production-ready + Documentation, Distribution & Configuration | active |
| v0.1.1 | Architectural Cleanup & Versioning | planned |
| v0.1.2 | Performance & Profiling | planned |
| v0.1.3 | Supply Chain & Fuzzing | planned |
| v0.1.4 | Management API | planned |
| v0.1.5 | Peer Discovery - Foundations | planned |
| v0.1.6 | Peer Discovery - SWIM & Gossip K-fanout | planned |
| v0.1.7 | Reactive Module Model - Events | planned |
| v0.1.8 | Reactive Module Model - HTTP & Hot Reload | planned |
| v0.1.9 | Capability-Based Security | planned |
| v0.1.10 | Resource Quotas & Multi-tenant | planned |
| v0.1.11 | Op-log Compaction & Snapshots | planned |
| v0.1.12 | Operability Tools | planned |
| v0.1.13 | Built-in Dashboard | planned |
| v0.1.14 | TUI & Advanced CRDTs | planned |
| v0.1.15 | WIT & Component Model | planned |
| v0.2.0-rc.1 | Release Candidate hardening | planned |
| v0.2.0 | Stable - production-ready, any criticality | final goal |
Legend: active = current release line; planned = future work; final goal = end of the cycle.
v0.1.0 - First production-ready + Documentation
Goal: initial release of the runtime + Deployment & Docs. Add the configuration file as the first alternative to the CLI.
Documentation:
- Dedicated documentation site scaffold in `docs/nx-site` (Astro Starlight)
- Host API available in the documentation site
- Roadmap available in the documentation site
- Whitepaper available in the documentation site
- Versioned docs (`latest`, `v0.1`, future `v0.2`)
- Full Sitemap
Distribution:
- Precompiled binaries (Linux x86_64, Linux ARM64, macOS Intel, macOS Apple Silicon, Windows)
- Automated release from tag on GitHub
- SHA256 checksum for each asset
Configuration (alternative to the CLI):
- Support for `numax.toml` with sections: `[network]`, `[tls]`, `[storage]`, `[observability]`, `[limits]`, `[discovery]`
- Environment variables with `NX_` prefix (e.g. `NX_LISTEN`, `NX_DATASTORE_PATH`)
- Explicit precedence: CLI flags > env > file > default
- `nx config init` - generates a commented `numax.toml`
- `nx config validate` - validates a file without running it
- `nx config show --effective` - shows the resulting effective configuration
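The precedence rule above (CLI flags > env > file > default) could be resolved per setting roughly as follows. This is a minimal sketch, not the actual numax implementation: the `Source` enum and the `resolve` function are hypothetical names introduced only for illustration.

```rust
/// Where the winning value came from (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Cli,
    Env,
    File,
    Default,
}

/// Resolve one setting using the documented order:
/// CLI flags > env > file > default.
pub fn resolve(
    cli: Option<&str>,
    env: Option<&str>,
    file: Option<&str>,
    default: &str,
) -> (String, Source) {
    if let Some(v) = cli {
        return (v.to_string(), Source::Cli);
    }
    if let Some(v) = env {
        return (v.to_string(), Source::Env);
    }
    if let Some(v) = file {
        return (v.to_string(), Source::File);
    }
    (default.to_string(), Source::Default)
}
```

Returning the source next to the value is one way a command like `nx config show --effective` could explain where each effective setting came from.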
Ready-made observability:
- `docs/dashboards/numax.json` - official Grafana dashboard
- `docs/compose/observability.yml` - preconfigured Prometheus + Grafana
- PromQL alert examples in the docs site
- `docs/scripts/check-observability.sh` - lightweight endpoint smoke check
Closing criterion:
A new user, starting from zero, can deploy 3 nodes with mTLS in under 10 minutes by following the documentation, without asking anyone.
v0.1.1 - Architectural Cleanup & Versioning
Goal: clean up the architectural debt of 0.1.0 and introduce wire/schema versioning.
Split of sync_manager.rs (I created a monster):
- Turn the single file into a `sync_manager/` module with various sub-modules
- `OpApplier` trait for each CRDT family
- E2E tests split per CRDT family
- Constraint: the split PR is only move + re-export, zero logic changes
Wire Protocol Versioning:
- Explicit `protocol_version` field in `Hello`
- Documented compatibility matrix: N vs N-1, N vs N+1
- E2E test with two binaries of different versions in the same cluster
- Document `docs/design/wire-versioning.md`
Schema Persistence Versioning:
- Magic number + version in every sled "table"
- Explicit migrations version N → N+1
- "Rolling upgrade" test: old node + new node in the same cluster
- CLI `nx migrate` for offline migration
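To make the "magic number + version" idea concrete, here is a small sketch of a header check that decides whether a stored table is current, needs a migration, or comes from a newer binary. The magic bytes, the 6-byte layout, the version value, and all names (`MAGIC`, `CURRENT_SCHEMA`, `SchemaCheck`, `encode_header`, `check_header`) are assumptions made for illustration; the roadmap does not specify them.

```rust
/// Hypothetical 4-byte magic value; the roadmap does not specify one.
pub const MAGIC: [u8; 4] = *b"NUMX";
/// Schema version this binary understands (illustrative value).
pub const CURRENT_SCHEMA: u16 = 2;

#[derive(Debug, PartialEq, Eq)]
pub enum SchemaCheck {
    UpToDate,
    NeedsMigration { from: u16, to: u16 },
    TooNew { found: u16 },
    NotNumax,
}

/// Assumed layout: 4 magic bytes followed by a big-endian u16 version.
pub fn encode_header(version: u16) -> [u8; 6] {
    let mut out = [0u8; 6];
    out[..4].copy_from_slice(&MAGIC);
    out[4..].copy_from_slice(&version.to_be_bytes());
    out
}

/// Classify a stored header against the version this binary supports.
pub fn check_header(bytes: &[u8]) -> SchemaCheck {
    if bytes.len() < 6 || bytes[..4] != MAGIC[..] {
        return SchemaCheck::NotNumax;
    }
    let found = u16::from_be_bytes([bytes[4], bytes[5]]);
    if found == CURRENT_SCHEMA {
        SchemaCheck::UpToDate
    } else if found < CURRENT_SCHEMA {
        SchemaCheck::NeedsMigration { from: found, to: CURRENT_SCHEMA }
    } else {
        SchemaCheck::TooNew { found }
    }
}
```

In this sketch a `NeedsMigration` result would trigger the N → N+1 migrations in order, while `TooNew` would make an older binary refuse to open the data rather than risk corrupting it.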
Typed Error Frames:
- Enum `WireError`: `ProtocolMismatch`, `OpRejected`, `RateLimited`, `NotAuthorized`, `Internal`
- Documented retry vs fatal semantics
- Backoff differentiated by error type
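One possible shape for the typed error frames is sketched below. The variant names come from the list above; which variants count as retryable, and the base and cap delays, are assumptions chosen for illustration, since the retry vs fatal semantics are still to be documented.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireError {
    ProtocolMismatch,
    OpRejected,
    RateLimited,
    NotAuthorized,
    Internal,
}

impl WireError {
    /// Assumed classification: only transient conditions are retried.
    pub fn is_retryable(self) -> bool {
        matches!(self, WireError::RateLimited | WireError::Internal)
    }

    /// Delay before retry number `attempt` (0-based), or `None` when fatal.
    /// Exponential backoff with a per-error base and cap (illustrative values).
    pub fn backoff(self, attempt: u32) -> Option<Duration> {
        let (base_ms, cap_ms): (u64, u64) = match self {
            WireError::RateLimited => (500, 30_000),
            WireError::Internal => (100, 5_000),
            _ => return None,
        };
        // Clamp the exponent so the shift can never overflow.
        let factor = 1u64 << attempt.min(16);
        Some(Duration::from_millis(base_ms.saturating_mul(factor).min(cap_ms)))
    }
}
```

Under these assumptions a `ProtocolMismatch` is reported once and the connection is closed, whereas a `RateLimited` peer is retried with a slower schedule than an `Internal` failure.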
Closing criterion:
`sync_manager.rs` no longer exists as a single file. A cluster with a `0.1.1` node and a `0.1.0` node refuses the connection with a clear, versioned error, not with a crash.
v0.1.2 - Performance & Profiling
Goal: make performance observation automatic and visible, prevent silent regressions.
Profiling tools:
- `tokio-console` integration (visibility into tasks)
- CPU flamegraph in CI with `pprof-rs` or `samply`
- Heap profiling with `dhat` integrated into benchmarks
- Per-WASM-module profiling (CPU time, bytes allocated)
Regression gate:
- Load benchmarks extended with automatic JSON report
- CI workflow that compares with baseline and fails if p99 latency, throughput or RSS regress > X%
- Baseline history committed in `crates/*/reports/baselines/`
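The regression gate boils down to a percentage comparison between a baseline report and the current one. The sketch below shows that comparison under assumed names: the `Report` struct, its fields, and the `regressions` function are hypothetical, and the JSON parsing of the benchmark report is left out.

```rust
/// Hypothetical benchmark summary; field names are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct Report {
    pub p99_latency_ms: f64,
    pub throughput_ops: f64,
    pub rss_mb: f64,
}

/// Describe every metric that got worse by more than `threshold_pct`.
/// An empty result means the gate passes.
pub fn regressions(baseline: &Report, current: &Report, threshold_pct: f64) -> Vec<String> {
    // Each entry is the relative worsening: latency and RSS regress when
    // they grow, throughput regresses when it shrinks.
    let checks = [
        (
            "p99 latency",
            (current.p99_latency_ms - baseline.p99_latency_ms) / baseline.p99_latency_ms,
        ),
        (
            "throughput",
            (baseline.throughput_ops - current.throughput_ops) / baseline.throughput_ops,
        ),
        ("RSS", (current.rss_mb - baseline.rss_mb) / baseline.rss_mb),
    ];
    let mut out = Vec::new();
    for (name, worsening) in checks {
        let pct = worsening * 100.0;
        if pct > threshold_pct {
            out.push(format!(
                "{} regressed by {:.1}% (limit {:.1}%)",
                name, pct, threshold_pct
            ));
        }
    }
    out
}
```

A CI step built on this idea would exit non-zero when the returned list is non-empty and print the entries as the regression details.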
Additional metrics:
- `numax_module_cpu_ms` per module
- `numax_module_memory_bytes` per module
- `numax_op_apply_duration_ms` distribution
Closing criterion:
A PR that worsens sync p99 by > 5% is automatically blocked by CI with the regression details.
v0.1.3 - Supply Chain & Fuzzing
Goal: make numax adoptable by those with strict supply-chain policies.
Supply chain:
- `cargo-deny` in CI (licenses, advisories, dup deps, banned crates)
- `cargo-audit` scheduled (daily workflow)
- CycloneDX SBOM generated for every release
- Releases signed with Sigstore / cosign
- GitHub workflows with minimal `permissions:`
- Action SHA pinning (no `@v3` but `@<sha>`)
Fuzzing:
- `cargo-fuzz` on wire parsers (`Hello`, `PushOps`, `PullSince`, framing)
- Proptest extended to all CRDTs (LWW-Map, RGA, ORSet)
- OSS-Fuzz integration
- Seed corpus committed in `fuzz/corpus/`
Sled hardening:
- Test: sled file corruption → recovery from op-log
- Test: full disk → graceful degradation
Closing criterion:
24 hours of fuzzing on all targets without panic. Verifiable SBOM. Releases verifiable with `cosign verify`.
v0.1.4 - Management API
Goal: provide a programmatic alternative to the CLI for integration with automation tooling.
REST API /api/v1/*:
- Served on a separate port (default `127.0.0.1:9102`)
- Auth with bearer token (never exposed without authentication)
- Default: bind only to `127.0.0.1`, external exposure must be explicit
- OpenAPI 3.1 spec in `docs/api/openapi.yaml`
v1 endpoints:
- `GET/POST /api/v1/modules` - module management
- `GET /api/v1/peers` - list connected peers
- `POST /api/v1/peers` - manually add a peer
- `GET /api/v1/keys?prefix=...` - list keys
- `GET /api/v1/keys/{key}` - read a value
- `GET /api/v1/health`, `GET /api/v1/ready` (aliases of existing observability endpoints)
- `POST /api/v1/snapshot` - trigger snapshot
Internal pattern:
- Single source of truth: `RuntimeIntrospection` trait used by CLI, REST API, dashboard, TUI
Closing criterion:
A numax node can be managed exclusively via REST API, without ever invoking the CLI. A working Terraform provider example exists in `examples/terraform-provider/`.
v0.1.5 - Peer Discovery: Foundations
Goal: stop requiring --peer 1.2.3.4:9000 for every node. Introduce the abstraction and mechanical bootstrap (not yet gossip-based; that comes in 0.1.6).
Abstraction:
- `PeerDiscovery` trait with `discover()`, `announce()`, `watch()` methods
- Internal replacement of `--peer` with a `StaticDiscovery` implementing the trait
Initial implementations:
- `StaticDiscovery` - peer list from config (backward-compatible)
- `BootstrapGossipDiscovery` - join with 1 address, learn the others through `Hello` exchange
- `MdnsDiscovery` - LAN discovery for demo and dev
- `DnsSrvDiscovery` - discovery via DNS-SRV record
- `FileWatchDiscovery` - peer file updated externally (useful for K8s headless services)
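The abstraction and its simplest implementation might look like the sketch below. Only the names `PeerDiscovery`, `StaticDiscovery`, `discover()`, `announce()` and `watch()` come from the roadmap; the signatures are assumptions, and `watch()` is modelled here as a synchronous poll that returns newly seen peers, standing in for whatever streaming API the real design chooses.

```rust
use std::collections::BTreeSet;
use std::net::{AddrParseError, SocketAddr};

/// Assumed signatures; the roadmap only names the three methods.
pub trait PeerDiscovery {
    /// Peers currently known to this mechanism.
    fn discover(&self) -> Vec<SocketAddr>;
    /// Make this node's address known to others.
    fn announce(&mut self, me: SocketAddr);
    /// Peers that appeared since the previous call (poll-style stand-in
    /// for a real watch stream).
    fn watch(&mut self) -> Vec<SocketAddr>;
}

/// Fixed peer list, equivalent to repeating `--peer` on the command line.
pub struct StaticDiscovery {
    peers: BTreeSet<SocketAddr>,
    reported: BTreeSet<SocketAddr>,
}

impl StaticDiscovery {
    pub fn new(peers: &[&str]) -> Result<Self, AddrParseError> {
        let mut set = BTreeSet::new();
        for p in peers {
            set.insert(p.parse::<SocketAddr>()?);
        }
        Ok(Self { peers: set, reported: BTreeSet::new() })
    }
}

impl PeerDiscovery for StaticDiscovery {
    fn discover(&self) -> Vec<SocketAddr> {
        self.peers.iter().copied().collect()
    }

    // A static list has nobody to announce to, so this is a no-op.
    fn announce(&mut self, _me: SocketAddr) {}

    fn watch(&mut self) -> Vec<SocketAddr> {
        let fresh: Vec<SocketAddr> =
            self.peers.difference(&self.reported).copied().collect();
        self.reported.extend(fresh.iter().copied());
        fresh
    }
}
```

With this shape, the mDNS, DNS-SRV and file-watch variants would differ only in how they populate the peer set, which is what keeps the existing `--peer` behaviour backward-compatible.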
Configuration:
- `[discovery]` section in `numax.toml` with `mode = "static" | "bootstrap" | "mdns" | "dns-srv" | "file"`
Explicit decision:
- Document `nat-traversal.md` - NAT/WAN traversal to be evaluated for `0.2.0`.
Closing criterion:
Three nodes on the same LAN discover each other via mDNS without any `--peer` flag. Reproducible demo in `examples/discovery_lan/`.
v0.1.6 - Peer Discovery: SWIM & Gossip K-fanout
Goal: dynamic discovery, with membership, failure detection and dissemination separated. This is the strength of 0.2.0.
Design doc as a public RFC:
- `peer-discovery.md`
- Documented failure scenarios
- Detailed test plan
Three separate channels:
- Membership: SWIM / Lifeguard (who is in the cluster)
- Failure detection: phi-accrual or SWIM-style suspicion (who is dead/suspect)
- Data dissemination: K-fanout gossip for CRDT ops (what to propagate)
Adaptive K-fanout gossip:
- Configurable fanout (default `K = ceil(log2(N) + c)`)
- Adaptive rate based on load/RTT
- Backpressure: controlled drops, never storms
- Periodic anti-entropy complementing gossip
Determinism for tests:
- Seedable gossip PRNG for reproducible tests
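The default fanout formula and the seedable PRNG can be combined in a few lines. The sketch below is illustrative only: clamping K to at most N-1 is an added assumption, SplitMix64 is swapped in as a convenient tiny generator (the roadmap does not name one), peers are represented as plain `u32` ids, and the modulo step has a slight bias that a real implementation may want to avoid.

```rust
/// Default fanout K = ceil(log2(N) + c).
/// Assumption: clamp to [1, N-1], since a node cannot gossip to more
/// peers than exist; a single-node cluster has no targets.
pub fn default_fanout(cluster_size: usize, c: f64) -> usize {
    if cluster_size <= 1 {
        return 0;
    }
    let k = ((cluster_size as f64).log2() + c).ceil().max(1.0) as usize;
    k.min(cluster_size - 1)
}

/// SplitMix64: a tiny seedable PRNG, used here only so that peer
/// selection is reproducible from a seed.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Pick `k` distinct gossip targets with a partial Fisher-Yates shuffle.
pub fn pick_targets(peers: &[u32], k: usize, rng: &mut SplitMix64) -> Vec<u32> {
    let mut pool = peers.to_vec();
    let k = k.min(pool.len());
    for i in 0..k {
        // Swap a random element from the unshuffled tail into position i.
        let j = i + (rng.next_u64() as usize) % (pool.len() - i);
        pool.swap(i, j);
    }
    pool.truncate(k);
    pool
}
```

For the 50-node scenario with `c = 1.0`, `default_fanout` gives 7, and two runs seeded with the same value pick the same 7 targets, which is what makes a failing gossip test replayable.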
Test scenarios:
- 50 nodes, 10% packet loss, partition recovery
- Cluster split-brain → merge without op loss
- 100% rolling restart of nodes → cluster survives
- False positive detection rate measured
Closing criterion:
A 50-node cluster on a simulated network with 10% packet loss converges in < 30s after a 60s partition. No false-positive failure detection in nominal conditions for 1h.
v0.1.7 - Reactive Module Model: Events
Goal: modules become long-running and reactive.
Design doc:
- `docs/design/event-model.md` as RFC
Module lifecycle:
- Long-running module with event loop
- `init()` called at startup
- `shutdown()` called on graceful shutdown
- Backward-compatible `run()` one-shot mode (so existing examples don't break)
Registerable callbacks:
- `on_remote_op(key, op_kind)` - CRDT op applied by a peer
- `on_tick(ms)` - periodic timer
- `on_peer_connected(node_id)` / `on_peer_disconnected(node_id)`
- `on_message(topic, payload)` - explicit intra-cluster messages
Guest SDK:
- Macro `nx_sdk::on_remote_op!` for ergonomic registration
- Example `examples/reactive_dashboard/` - module that updates in real time
Closing criterion:
A reactive module receives an op from a peer, runs custom logic (e.g. sends a notification), and the example is documented step-by-step.
v0.1.8 - Reactive Module Model: HTTP & Hot Reload
Goal: modules can serve HTTP and be reloaded without dropping peer connections.
HTTP handler:
- `on_request(req) -> response` as a callback
- Explicit `network.serve` capability (deny-by-default)
- Minimal internal routing (path → handler)
Hot reload:
- `nx reload <module>` - replaces the module without closing peer connections
- CRDT state preserved during reload
- Test: reload under load, zero ops lost
Killer demo:
- `examples/collaborative_todo/` - local-first multi-device todo list, web UI served by the module, real-time CRDT sync. Filmable for the launch.
Closing criterion:
The "collaborative todo list" demo runs on 3 devices, the user edits a todo, the other devices see it in < 500ms. Hot reload in production with no state loss.
v0.1.9 - Capability-Based Security
Goal: the current "enabled/disabled" model is replaced by granular per-module capabilities.
Per-module policy file:
- `module.policy.toml` next to the `.wasm`
- Sections: `[capabilities]`, `[quotas]`
- Granular capabilities for keyspace, CRDT family, network, time, random
Example:

    [capabilities]
    db.read = ["inventory:*", "settings:*"]
    db.write = ["inventory:*"]
    crdt.gcounter = ["visits:*"]
    crdt.rga = []
    net.peers = false
    network.serve = true
    random = true
    time = true

- Policy signed with the same key as the cert (anti-tampering)
- Deny-by-default: capability not listed = denied
- Enforcement at the host-call level
- Audit log of host calls (optional, opt-in)
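Deny-by-default enforcement at the host-call level can be pictured as a lookup followed by a pattern match. The following is a sketch under stated assumptions: the `Policy` type and its methods are hypothetical, policy parsing and signature verification are omitted, and a trailing `*` is assumed to mean "prefix wildcard", which the roadmap implies through patterns like `inventory:*` but does not define.

```rust
use std::collections::HashMap;

/// Capability name (e.g. "db.read") mapped to the key patterns it allows.
pub struct Policy {
    grants: HashMap<String, Vec<String>>,
}

impl Policy {
    pub fn new() -> Self {
        Self { grants: HashMap::new() }
    }

    /// Builder-style helper to list the patterns for one capability.
    pub fn grant(mut self, capability: &str, patterns: &[&str]) -> Self {
        self.grants.insert(
            capability.to_string(),
            patterns.iter().map(|p| p.to_string()).collect(),
        );
        self
    }

    /// Deny-by-default: a capability that is not listed, or is listed
    /// without a matching pattern, is refused.
    pub fn allows(&self, capability: &str, key: &str) -> bool {
        match self.grants.get(capability) {
            None => false,
            Some(patterns) => patterns.iter().any(|p| pattern_matches(p, key)),
        }
    }
}

/// Assumed pattern semantics: a trailing `*` is a prefix wildcard,
/// anything else must match the key exactly.
fn pattern_matches(pattern: &str, key: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => key.starts_with(prefix),
        None => pattern == key,
    }
}
```

Mirroring the example policy, a module granted `db.write = ["inventory:*"]` could write `inventory:sku-1` but not `settings:theme`, and an empty list such as `crdt.rga = []` denies every key just like an unlisted capability.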
CLI/API:
- `nx policy validate <policy.toml>`
- `nx policy diff <old> <new>`
Closing criterion:
A module without a policy does not start. A module with a minimal policy cannot access keys outside its namespace. Dedicated security tests.
v0.1.10 - Resource Quotas & Multi-tenant
Goal: numax becomes multi-tenant safe: multiple modules on the same node, isolated, with resource quotas.
Resource quotas:
- `cpu_ms_per_run` - max CPU time per invocation
- `memory_max_mb` - max module memory
- `ops_per_sec` - CRDT op rate limit
- `bytes_written_per_sec` - sled write rate limit
- Enforcement with module interruption + log + metric
- Quota usage metrics in Prometheus
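A rate quota such as `ops_per_sec` is commonly enforced with a token bucket. The sketch below shows that technique as one possible approach, not as the chosen design: the `OpsLimiter` type is hypothetical, the burst size is assumed to equal one second's worth of ops, and the clock is passed in explicitly (in milliseconds) so the behaviour can be tested deterministically.

```rust
/// Token bucket for a per-module `ops_per_sec` quota.
/// The caller supplies the current time in milliseconds.
pub struct OpsLimiter {
    ops_per_sec: f64,
    tokens: f64,
    last_ms: u64,
}

impl OpsLimiter {
    /// Starts with a full bucket (assumed burst: one second of ops).
    pub fn new(ops_per_sec: u32, now_ms: u64) -> Self {
        Self {
            ops_per_sec: ops_per_sec as f64,
            tokens: ops_per_sec as f64,
            last_ms: now_ms,
        }
    }

    /// Returns true if one CRDT op may proceed at `now_ms`;
    /// false means the quota is exhausted for now.
    pub fn try_acquire(&mut self, now_ms: u64) -> bool {
        let elapsed_s = now_ms.saturating_sub(self.last_ms) as f64 / 1000.0;
        self.last_ms = self.last_ms.max(now_ms);
        // Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = (self.tokens + elapsed_s * self.ops_per_sec).min(self.ops_per_sec);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A `false` result is the point where the host would apply the enforcement listed above: interrupt or reject the call, log it, and increment the quota metric.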
Multi-module per node:
- Internal module supervisor
- `nx run <mod1> <mod2> ...` or config file with module list
- Op routing based on key prefix per module
- Keyspace isolation (ties in with capabilities)
- A module crash does not bring the node down
Closing criterion:
10 modules on the same node, each with different quotas, none can affect the others. "Malicious module" test attempting to exhaust resources → contained correctly.
v0.1.11 - Op-log Compaction & Snapshots
Goal: the op-log does not grow indefinitely. Backup and restore exist.
Op-log compaction:
- Periodic CRDT state snapshot
- Op-log truncation up to the snapshot point
- Persisted dedup-set consistent with truncation
- Differentiated approach per CRDT family (some support causal compaction, others require full snapshot)
- `docs/design/compaction.md`
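The core invariant of compaction is that a snapshot plus the remaining op-log must reproduce the same state as the full op-log. The sketch below illustrates this with a deliberately simplified model: ops are integer deltas on named counters with sequence numbers starting at 1, and the `Op`, `Snapshot`, `advance` and `truncate` names are hypothetical. Real CRDT families need per-family handling, as noted above.

```rust
use std::collections::BTreeMap;

/// Simplified op: add `delta` to the counter stored under `key`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Op {
    pub seq: u64,
    pub key: String,
    pub delta: i64,
}

/// State materialised from every op with `seq <= up_to_seq`.
/// The default value (`up_to_seq == 0`) covers no ops at all.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Snapshot {
    pub up_to_seq: u64,
    pub state: BTreeMap<String, i64>,
}

/// Apply every op newer than the snapshot and return the advanced snapshot.
pub fn advance(snapshot: &Snapshot, log: &[Op]) -> Snapshot {
    let mut next = snapshot.clone();
    for op in log.iter().filter(|op| op.seq > snapshot.up_to_seq) {
        *next.state.entry(op.key.clone()).or_insert(0) += op.delta;
        next.up_to_seq = next.up_to_seq.max(op.seq);
    }
    next
}

/// Drop the ops already covered by the snapshot; returns how many were removed.
pub fn truncate(log: &mut Vec<Op>, snapshot: &Snapshot) -> usize {
    let before = log.len();
    log.retain(|op| op.seq > snapshot.up_to_seq);
    before - log.len()
}
```

Filtering on `seq > up_to_seq` in both functions is what keeps truncation consistent: replaying the truncated log on top of the snapshot never applies an op twice.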
Snapshot/Restore:
- `nx snapshot create` - atomic datastore snapshot
- `nx snapshot list`
- `nx snapshot restore <id>`
- Exportable snapshot (single file, portable across nodes)
- Test: new node joining using a peer's snapshot
Storage:
- `KvBackend` abstraction to decouple from sled (preparation for a possible switch to redb/fjall)
Closing criterion:
A cluster operating for 7 days with active compaction keeps the op-log within a configured budget. Restore from snapshot in < 60s for a 10GB datastore.
v0.1.12 - Operability Tools
Goal: when something goes wrong, you need the tools to figure it out.
Replay & diff:
- `nx replay <op-log> <new-datastore>` - applies an exported op-log to an empty datastore
- `nx diff <datastore-a> <datastore-b>` - compares two datastores and shows divergences
- `nx inspect <key>` - structured CRDT dump for a key
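At its simplest, comparing two datastores is a keyed set difference plus a value comparison. The sketch below models a datastore as an in-memory sorted map of raw bytes; that model, the `Divergence` enum and the `diff` function are assumptions for illustration, and a real `nx diff` would presumably compare decoded CRDT state rather than bytes.

```rust
use std::collections::BTreeMap;

/// One difference between datastore A and datastore B.
#[derive(Debug, PartialEq, Eq)]
pub enum Divergence {
    OnlyInA(String),
    OnlyInB(String),
    Different { key: String, a: Vec<u8>, b: Vec<u8> },
}

/// Report keys missing on either side and keys whose values differ.
/// Entries derived from A come first (in key order), then keys only in B.
pub fn diff(
    a: &BTreeMap<String, Vec<u8>>,
    b: &BTreeMap<String, Vec<u8>>,
) -> Vec<Divergence> {
    let mut out = Vec::new();
    for (key, va) in a {
        match b.get(key) {
            None => out.push(Divergence::OnlyInA(key.clone())),
            Some(vb) if vb != va => out.push(Divergence::Different {
                key: key.clone(),
                a: va.clone(),
                b: vb.clone(),
            }),
            Some(_) => {}
        }
    }
    for key in b.keys().filter(|k| !a.contains_key(*k)) {
        out.push(Divergence::OnlyInB(key.clone()));
    }
    out
}
```

An empty result means the two stores have converged, which is the condition a divergence diagnosis would start from.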
Optional determinism:
- `--deterministic` mode that disables uncontrolled random/time
- Replay perfectly reproducible in deterministic mode
- Document `docs/design/determinism.md`
Opt-in telemetry:
- Active opt-in, default off
- Collected data: version, OS, arch, average peer count, CRDT families used
- Explicit document on what is collected and why
- Self-hosted collection endpoint
Closing criterion:
A real divergence case (even simulated) is diagnosed in < 15 minutes using only the official tools.
v0.1.13 - Built-in Dashboard
Goal: a native, lightweight web dashboard, focused on the 6 views that matter.
Stack:
- Server-side rendering + HTMX + SSE (no React/Vue/heavy bundles)
- Theme using a free design system (Pico.css or Tailwind+DaisyUI)
- Compile-time feature flag `--features dashboard` (base binary stays small)
The 6 views:
- Cluster view: nodes, status (alive/suspect/dead), RTT latency, topology
- CRDT explorer: list of keys, CRDT family, current value, last modification, author
- Op flow: live stream of incoming/outgoing ops (filterable by key/peer/family)
- Convergence health: per-node vector clock, highlights lag and suspected divergences
- Throughput/latency: ops/sec, p50/p95/p99, error rate
- Module info: active modules, host call counts, consumed quotas
Security:
- Served on a separate port (default `127.0.0.1:9101`)
- Default bind to `127.0.0.1`
- Basic auth + token (never exposed without authentication)
- Read-only by default; mutations require an elevated token
Reuse:
- The dashboard is a consumer of the same `RuntimeIntrospection` as the Management API
Closing criterion:
The "convergence health view" diagnoses a simulated divergence in 1 click. Screenshots ready for the public launch.
v0.1.14 - TUI & Advanced CRDTs
Goal: those who live in SSH have their version. Those who need collaborative text editing have it too.
TUI nx top:
- Implemented with `ratatui`
- Reuses the Management API `/api/v1` (same 6 views as the dashboard)
- Local connection (default) or remote with token
- Hotkeys k9s/lazygit-style
Advanced CRDTs (integration, not reimplementation):
- Evaluation and integration of Yrs (Rust port of Yjs) as an optional backend for efficient text sequences
- Evaluation of Automerge for nested JSON CRDT
- Document `docs/design/advanced-crdts.md` with tradeoffs
- Example `examples/collaborative_editor/` - replicated text editor
User-defined CRDT (kick-off, not completion):
- Document `docs/design/user-defined-crdts.md` with interface proposal
- Prototype behind an experimental feature flag
- Required mathematical guarantees documented (commutativity, associativity, idempotency)
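The three required guarantees can be checked mechanically for any candidate merge function. As a sketch, the example below uses a grow-only counter (per-node maximum as the merge) because its merge is known to satisfy all three laws; the `GCounter` alias, `merge` and `laws_hold` are hypothetical names, and checking a single triple of states is only a spot check, not a proof. Property-based testing over many generated states would be the natural extension.

```rust
use std::collections::BTreeMap;

/// Grow-only counter state: node id mapped to the count contributed by that node.
pub type GCounter = BTreeMap<String, u64>;

/// Merge two states by taking the per-node maximum.
pub fn merge(a: &GCounter, b: &GCounter) -> GCounter {
    let mut out = a.clone();
    for (node, &n) in b {
        let slot = out.entry(node.clone()).or_insert(0);
        *slot = (*slot).max(n);
    }
    out
}

/// Check commutativity, associativity and idempotency on one concrete
/// triple of states.
pub fn laws_hold(a: &GCounter, b: &GCounter, c: &GCounter) -> bool {
    let commutative = merge(a, b) == merge(b, a);
    let associative = merge(&merge(a, b), c) == merge(a, &merge(b, c));
    let idempotent = merge(a, a) == *a;
    commutative && associative && idempotent
}
```

A user-defined CRDT whose merge fails such a check for some states cannot be relied on to converge when ops arrive in different orders or more than once.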
Closing criterion:
`nx top` is usable for production debugging via SSH. Working collaborative editor demo with Yrs.
v0.1.15 - WIT & Component Model
Goal: the host API ABI becomes standard, stable, multi-language via the WebAssembly Component Model.
Gradualist approach:
- Step 1: write the `.wit` describing the current Host API (specification only, no migration)
- Step 2: automatically generate the guest SDK bindings from `.wit` with `wit-bindgen`
- Step 3: port the runtime to `wasmtime::component::Linker` behind feature flag `--features components`
- Step 4: legacy ABI maintained in parallel, deprecated in `0.3.0`
WASI Preview 2:
- Optional evaluation and integration (capability-based filesystem/clock/random/sockets)
- Naturally ties in with the capability-based security from `0.1.9`
Multi-language:
- Guest example in Go (TinyGo)
- Guest example in JavaScript (ComponentizeJS)
- Guest example in Python (componentize-py)
Closing criterion:
The same `.wit` is used by the Rust SDK, by a Go guest, by a JS guest, and they all converge on the same shared CRDT.
v0.2.0-rc.1 - Release Candidate Hardening
Goal: everything built in 0.1.0–0.1.15 is put under stress, integrated, documented and finished.
Integrated hardening:
- Combined stress test: discovery + capability + quotas + compaction + reload under load
- Extended chaos test: unstable network, restart loop, malicious module, partition recovery
- 7-day soak test on a real cluster (not simulated)
- Internal security audit completed
Final documentation:
- Migration guide `0.1.x → 0.2.0`
- Updated production deployment guide
- All design docs revised and linked from the docs site
RC criteria:
- `cargo test --workspace` passes
- `cargo clippy --workspace --all-targets -- -D warnings` passes
- 24h fuzzing green on all targets
- Regression gate green
- All tutorials verified end-to-end
v0.2.0 - Stable
Final goal: distributed runtime production-ready for any criticality.
Final release criteria:
- All `0.1.0`–`0.1.15` releases closed
- Complete and reviewed documentation
- `0.3.x` roadmap opened as RFC
Beyond v0.2.0 - candidate directions for 0.3.x
Nothing promised. These are candidate themes that may enter `0.3.x` or later, based on feedback and priorities.
- NAT traversal and WAN gossip (STUN, relay, possibly libp2p)
- User-defined CRDTs complete and production-ready
- Legacy ABI deprecated: Component Model only
- Federated clusters: clusters of clusters, with cross-cluster replication policies
- Pluggable storage backends: redb, fjall, custom
- GPU/ML guests: WASI-NN integration
- Edge orchestration: optional integration with existing edge runtimes
- Tiny embedded runtimes: evaluate interpreter-based WASM engines such as `wasmi` or WAMR for Cortex-M / RISC-V devices with RAM measured in kilobytes. Wasmtime is the right native engine for the current runtime, but it is not a microcontroller-class target.
How to contribute to the roadmap
- Open an issue with the `roadmap-proposal` label if you want to discuss before writing code or a document.
- Open a PR against this roadmap page if you want to propose directly:
  - a new item in a future version,
  - moving an item between versions,
  - a new intermediate version,
  - a change to a closing criterion.
- Roadmap PRs are treated as code PRs: review, discussion, merge.
Last revision: 2026-05-30