Observability

Numax can expose a small HTTP observability endpoint with:

  • /health
  • /ready
  • /metrics

The /metrics endpoint serves the Prometheus text exposition format, which is enough to run a basic Prometheus + Grafana stack without adding anything to the runtime.

Start Numax With Metrics

Run any module with the observability endpoint enabled:

nx run target/wasm32-unknown-unknown/release/distributed_counter.wasm \
  --observability-listen 127.0.0.1:9100 \
  --datastore-path /tmp/numax-observability-demo \
  --settle-for 5s

Check the endpoint directly:

curl http://127.0.0.1:9100/health
curl http://127.0.0.1:9100/ready
curl http://127.0.0.1:9100/metrics
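
For scripted checks, the same three requests can be issued from Python's standard library. This is only a sketch: the function name probe_endpoints is hypothetical, it is not the contents of check-observability.sh, and it simply reports whatever HTTP status each path returns, since the exact status codes Numax uses are not specified here.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Paths listed at the top of this page.
PATHS = ("/health", "/ready", "/metrics")


def probe_endpoints(base_url, paths=PATHS, timeout=2.0):
    """Return {path: HTTP status, or None if the endpoint is unreachable}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urlopen(url, timeout=timeout) as resp:
                results[path] = resp.status
        except HTTPError as err:
            # The server answered, but with a 4xx/5xx status.
            results[path] = err.code
        except (URLError, OSError):
            # Connection refused, timeout, DNS failure, and similar.
            results[path] = None
    return results
```

With Numax running as shown above, probe_endpoints("http://127.0.0.1:9100") returns one entry per path; a None value means the request never reached the endpoint.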

You can also run the lightweight verification script:

docs/scripts/check-observability.sh http://127.0.0.1:9100

Run Prometheus And Grafana

Start the ready-made stack:

docker compose -f docs/compose/observability.yml up

Open:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000

Grafana credentials:

  • user: admin
  • password: admin

The Numax dashboard is provisioned automatically from docs/dashboards/numax.json.

Metrics

The dashboard uses only metrics currently emitted by Numax:

Metric                              Type     Meaning
numax_ops_total                     counter  operations processed by the runtime
numax_peers_connected               gauge    currently connected peers
numax_sync_latency_ms               gauge    last recorded sync latency in milliseconds
numax_sync_errors_total             counter  sync-related errors
numax_observability_requests_total  counter  observability endpoint requests
numax_observability_errors_total    counter  observability endpoint errors
numax_peer_connects_total           counter  peer connection events
numax_peer_disconnects_total        counter  peer disconnection events
numax_broadcast_batches_total       counter  broadcast batches sent
numax_broadcast_ops_total           counter  operations broadcast to peers
numax_store_keys                    gauge    key count in the local store
numax_store_bytes                   gauge    approximate store payload bytes
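
To make the counter and gauge types concrete, the sketch below parses a small Prometheus text-format payload into plain dictionaries. The sample payload and its values are invented for illustration and were not captured from a real Numax run. The parser is deliberately minimal and assumes unlabeled samples only; a real consumer would more likely rely on Prometheus itself or a client library.

```python
# Hypothetical /metrics payload; names come from the table, values are made up.
SAMPLE = """\
# HELP numax_ops_total operations processed by the runtime
# TYPE numax_ops_total counter
numax_ops_total 1280
# TYPE numax_peers_connected gauge
numax_peers_connected 3
# TYPE numax_store_bytes gauge
numax_store_bytes 52431
"""


def parse_metrics(text):
    """Return (types, values) parsed from unlabeled text-format samples."""
    types, values = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            # "# TYPE <name> <type>" declares the metric type; other
            # comment lines (such as HELP) are skipped.
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue
        # Sample line: "<name> <value>", optionally followed by a timestamp.
        name, value = line.split()[:2]
        values[name] = float(value)
    return types, values


types, values = parse_metrics(SAMPLE)
for name, value in values.items():
    print(f"{name} ({types.get(name, 'untyped')}): {value:g}")
```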

Useful PromQL

Operations throughput:

rate(numax_ops_total[1m])

Broadcast throughput:

rate(numax_broadcast_ops_total[1m])

Recent sync errors:

increase(numax_sync_errors_total[5m])
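
As a rough intuition for what rate() and increase() compute on a counter, the sketch below works through invented scrape samples. It is a simplification: Prometheus additionally compensates for counter resets and extrapolates to the edges of the range window, so real results can differ slightly. The function names and sample values are hypothetical.

```python
def simple_increase(samples):
    """Difference between the last and first counter value in the window."""
    return samples[-1][1] - samples[0][1]


def simple_rate(samples):
    """Average per-second increase across the window.

    Simplified on purpose: no counter-reset handling, no extrapolation.
    """
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed


# Hypothetical (timestamp_seconds, value) scrapes of numax_ops_total, 15 s apart.
ops = [(0, 1000), (15, 1030), (30, 1075), (45, 1120), (60, 1180)]

print(simple_increase(ops))  # 180 operations over the window
print(simple_rate(ops))      # 3.0 operations per second
```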

Connected peers:

numax_peers_connected

Store size (graph it over time to track growth):

numax_store_bytes
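
These expressions can also be evaluated outside the Prometheus UI through its HTTP API, which serves instant queries at /api/v1/query. The sketch below assumes the Prometheus address from the compose stack (http://localhost:9090); the helper names are hypothetical, and only instant-vector results are handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS = "http://localhost:9090"  # assumed: address from the compose stack


def query_url(expr, base=PROMETHEUS):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def extract_values(payload):
    """Return (labels, value) pairs from an instant-vector API response."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def instant_query(expr, base=PROMETHEUS):
    """Run an instant query against a live Prometheus server."""
    with urlopen(query_url(expr, base), timeout=5) as resp:
        return extract_values(json.load(resp))


print(query_url("rate(numax_ops_total[1m])"))
```

With the stack running, instant_query("numax_peers_connected") returns one (labels, value) pair per scraped target.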

Alert Examples

Target down:

up{job="numax"} == 0

No connected peers:

numax_peers_connected == 0

Sync errors observed:

increase(numax_sync_errors_total[5m]) > 0

Observability endpoint errors observed:

increase(numax_observability_errors_total[5m]) > 0

Store size above 1 GiB:

numax_store_bytes > 1073741824
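
The literal in the last expression is 1 GiB expressed in bytes (1024^3). A small sketch for deriving the same expression at other thresholds, where the helper name store_size_alert_expr is hypothetical:

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def store_size_alert_expr(limit_gib):
    """Build the store-size alert expression for a whole-GiB threshold."""
    return f"numax_store_bytes > {limit_gib * GIB}"


print(store_size_alert_expr(1))  # numax_store_bytes > 1073741824
```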