Observability

dvb-WarpPool exposes its runtime state in two complementary ways:

  1. Pull: Prometheus-compatible /metrics endpoint
  2. Push: Notifier sinks (see Notifications) for events that need operator attention

A good setup uses both: Prometheus scrapes every 15s for trends and alerts, and critical events (block found, RPC down) go out immediately as notifications.

/metrics Endpoint

Path: GET /metrics on the regular API port (default 18334). Format: Prometheus text exposition format, served as text/plain; version=0.0.4.

Authentication: none. The endpoint is read-only and contains no secrets. If the pool is reachable from a public network and you would rather not publish these numbers, put a reverse proxy with basic auth in front of it.
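To make the text format concrete, here is a minimal sketch of a client that reads the endpoint and turns it into a dictionary. It uses only the Python standard library. The host name pool.local is a placeholder, the helper names are hypothetical, and the parser deliberately ignores optional timestamps and does not unescape label values.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into {(name, labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # lines this simple sketch cannot read are ignored
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)  # float() also accepts +Inf/NaN
    return samples


def fetch_metrics(url="http://pool.local:18334/metrics", timeout=5.0):
    """Fetch and parse the endpoint (placeholder host, default API port)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, fetch_metrics()[("warppool_rpc_ready", ())] would return 1.0 while the node is reachable. For anything beyond a quick check, a real Prometheus server or client library is the better tool.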

Base counters (always present)

| Metric | Type | Description |
|---|---|---|
| warppool_blocks_found_total | counter | Accepted blocks since the first daemon start |
| warppool_shares_accepted_total | counter | Accepted shares across all workers |
| warppool_shares_rejected_total | counter | Stale / low-diff / malformed |
| warppool_workers_total | gauge | Number of workers ever seen |
| warppool_rpc_ready | gauge | 1 if Bitcoin Core RPC is reachable |
| warppool_rpc_ibd | gauge | 1 if Bitcoin Core is in initial block download |
| warppool_network_height | gauge | Chain tip height according to our node |
| warppool_network_difficulty | gauge | Current network difficulty |
| warppool_current_job_height | gauge | Height of the template currently being served |
| warppool_current_job_coinbase_value_sats | gauge | Coinbase reward in sats |
| warppool_started_at_seconds | gauge | Daemon start as a unix timestamp |
| warppool_last_template_at_seconds | gauge | Last successful getblocktemplate |
| warppool_build_info{brand,profile,chain} | gauge | Constant 1, all constants in labels |

Phase 16: extended pool metrics

These are active as soon as the daemon hands PoolMetrics to the API state (automatic when the daemon binary is running; optional in test setups).

| Metric | Type | Description |
|---|---|---|
| warppool_workers_authorized_total | counter | Cumulative mining.authorize successes (v1) + OpenChannel successes (v2) |
| warppool_workers_disconnected_total | counter | Cumulative authenticated worker disconnects |
| warppool_active_connections{protocol="v1"} | gauge | Open Stratum V1 connections |
| warppool_active_connections{protocol="v2"} | gauge | Open Stratum V2 connections |
| warppool_bitcoin_rpc_latency_seconds | histogram | End-to-end RPC call duration (all retries included) |

Histogram buckets (upper bounds in seconds): 0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, +Inf. The buckets follow Prometheus cumulative semantics: each observation increments every bucket whose upper bound is ≥ the observed value.
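The cumulative bucket rule can be illustrated with a short sketch. It is not the daemon's code: observe and quantile are hypothetical helpers, and quantile is a simplified version of the linear interpolation that PromQL's histogram_quantile performs on such buckets.

```python
import math

# Upper bounds from the bucket list above, in seconds.
BOUNDS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, math.inf]


def observe(counts, seconds):
    """Record one observation: bump every bucket whose bound is >= the value."""
    for i, bound in enumerate(BOUNDS):
        if seconds <= bound:
            counts[i] += 1


def quantile(q, counts):
    """Estimate a quantile by interpolating linearly inside the target bucket."""
    total = counts[-1]  # the +Inf bucket holds the overall observation count
    if total == 0:
        return math.nan
    rank = q * total
    for i, bound in enumerate(BOUNDS):
        if counts[i] >= rank:
            if math.isinf(bound):
                return BOUNDS[i - 1]  # in +Inf: report the highest finite bound
            lower = BOUNDS[i - 1] if i > 0 else 0.0
            below = counts[i - 1] if i > 0 else 0
            if counts[i] == below:  # empty bucket, nothing to interpolate
                return lower
            return lower + (bound - lower) * (rank - below) / (counts[i] - below)
    return math.nan
```

Because the estimate is interpolated within a bucket, its precision is limited by the bucket width: a p99 that lands between 1.0 and 2.5 seconds cannot be resolved more finely than that range.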

Example queries (Grafana):

# p99 RPC latency, last 5 minutes
histogram_quantile(0.99, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))

# RPC call rate
rate(warppool_bitcoin_rpc_latency_seconds_count[1m])

Phase 22: per-miner vendor probe metrics

When the daemon's miner_poll_loop is running (default), configured miners are polled every 30s and their telemetry values are exposed as gauges:

| Metric | Type | Labels | Description |
|---|---|---|---|
| warppool_miner_hashrate_ghs | gauge | label, host, vendor, model | Miner-reported hashrate in GH/s |
| warppool_miner_temperature_c | gauge | label, host, vendor, model | ASIC core temperature in °C |
| warppool_miner_power_w | gauge | label, host, vendor, model | Power draw in watts |
| warppool_miner_voltage_mv | gauge | label, host, vendor, model | ASIC core voltage in mV |
| warppool_miner_fan_rpm | gauge | label, host, vendor, model | Fan speed in RPM |
| warppool_miner_last_probe_age_seconds | gauge | label, host, vendor | Seconds since the last successful probe |
| warppool_miner_probe_health | gauge | label, host, vendor | 1 if OK and recent (<5min); 0 if error or stale |

Fields that are None are skipped: if a miner doesn't report voltage_mv, for example, that metric is simply omitted for that miner rather than exported as 0, which would wreck the operator's trend lines.
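The skip rule and the health gauge can be sketched as follows. This is an illustration under assumptions, not the daemon's implementation: render_miner, its parameters and the telemetry dictionary keys are hypothetical, and only the metric names, label sets and the 5-minute threshold come from the table above.

```python
import time

# Telemetry field -> metric name, as listed in the table above.
FIELDS = {
    "hashrate_ghs": "warppool_miner_hashrate_ghs",
    "temperature_c": "warppool_miner_temperature_c",
    "power_w": "warppool_miner_power_w",
    "voltage_mv": "warppool_miner_voltage_mv",
    "fan_rpm": "warppool_miner_fan_rpm",
}
STALE_AFTER_SECONDS = 300  # "recent" means younger than 5 minutes


def render_miner(labels, telemetry, last_ok_at, last_error, now=None):
    """Render exposition lines for one miner (hypothetical helper)."""
    now = time.time() if now is None else now
    full = ",".join(f'{k}="{v}"' for k, v in labels.items())
    # The two probe metrics carry no model label in the table above.
    short = ",".join(f'{k}="{v}"' for k, v in labels.items() if k != "model")
    lines = []
    for field, metric in FIELDS.items():
        value = telemetry.get(field)
        if value is None:
            continue  # omit the series instead of writing a misleading 0
        lines.append(f"{metric}{{{full}}} {value}")
    age = now - last_ok_at
    healthy = last_error is None and age < STALE_AFTER_SECONDS
    lines.append(f"warppool_miner_last_probe_age_seconds{{{short}}} {age:.0f}")
    lines.append(f"warppool_miner_probe_health{{{short}}} {int(healthy)}")
    return lines
```

Omitting a series means Grafana draws a gap for that miner, which is usually easier to interpret than a sudden drop to zero.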

If WARPPOOL_AUTO_PROBE_DISCOVERED=true is set, miners discovered via mDNS are also included with label="discovered" — the operator can separate them with:

# Configured miners only
warppool_miner_hashrate_ghs{label!="discovered"}

# Discovered miners (not in the DB)
warppool_miner_hashrate_ghs{label="discovered"}

Example queries:

# Total pool hashrate (sum of all miners)
sum(warppool_miner_hashrate_ghs)

# Maximum temperature across all miners — operator alarm if > 85°C
max(warppool_miner_temperature_c)

# Hashrate per watt for each miner (efficiency, GH/s per W)
warppool_miner_hashrate_ghs / warppool_miner_power_w

# Which miners have a failing probe cycle?
warppool_miner_probe_health == 0

Phase 15/16: notifier metrics

When a notifier is configured:

| Metric | Type | Description |
|---|---|---|
| warppool_notifier_sinks_active | gauge | Number of initialized sinks |
| warppool_notifier_events_sent_total{sink,event,result} | counter | Send attempts per (sink, event kind, outcome) |

result = "ok" or "err". event is one of block-found, miner-disconnect, rpc-down, rpc-recovered, test.

Example query: sink failure rate (a hint at wrong env vars or blocked webhooks):

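# Aggregate per sink on both sides: the denominator contains both "ok" and
# "err" series, so matching with ignoring(result) would not be unique.
sum by (sink) (rate(warppool_notifier_events_sent_total{result="err"}[5m]))
  / sum by (sink) (rate(warppool_notifier_events_sent_total[5m]))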
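The same ratio can be worked out by hand from two snapshots of the counter, which may help when checking a dashboard value. The sketch below is hypothetical helper code; the snapshot shape and the sink names used in the usage note are assumptions made for illustration.

```python
from collections import defaultdict


def failure_ratio_per_sink(before, after):
    """Share of failed sends per sink between two counter snapshots.

    Each snapshot maps (sink, event, result) -> counter value.
    """
    errors = defaultdict(float)
    total = defaultdict(float)
    for (sink, event, result), value in after.items():
        delta = value - before.get((sink, event, result), 0.0)
        if delta < 0:  # counter reset (daemon restart): count from zero
            delta = value
        total[sink] += delta
        if result == "err":
            errors[sink] += delta
    # Sinks without any send attempt in the interval are left out.
    return {sink: errors[sink] / n for sink, n in total.items() if n > 0}
```

With a made-up sink named "discord" that recorded three successful and one failed send between the snapshots, the function returns 0.25 for that sink.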

Grafana Dashboard

A starter dashboard with the most important panels:

{
  "title": "dvb-WarpPool",
  "panels": [
    {
      "title": "Blocks Found",
      "type": "stat",
      "targets": [{ "expr": "warppool_blocks_found_total" }]
    },
    {
      "title": "Hashrate (approx, last 5min)",
      "type": "timeseries",
      "targets": [{
        "expr": "rate(warppool_shares_accepted_total[5m]) * 2^32"
      }]
    },
    {
      "title": "Active Connections",
      "type": "timeseries",
      "targets": [
        { "expr": "warppool_active_connections{protocol=\"v1\"}", "legendFormat": "v1" },
        { "expr": "warppool_active_connections{protocol=\"v2\"}", "legendFormat": "v2" }
      ]
    },
    {
      "title": "RPC Latency (p50/p99)",
      "type": "timeseries",
      "targets": [
        { "expr": "histogram_quantile(0.50, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))", "legendFormat": "p50" },
        { "expr": "histogram_quantile(0.99, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))", "legendFormat": "p99" }
      ]
    },
    {
      "title": "Bitcoin Core Health",
      "type": "stat",
      "targets": [
        { "expr": "warppool_rpc_ready" },
        { "expr": "warppool_rpc_ibd" }
      ]
    }
  ]
}

(A full dashboard with variables and annotations may follow later as packaging/grafana/dashboard.json.)
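The arithmetic behind the "Hashrate (approx)" panel: a share at difficulty 1 takes about 2^32 hashes on average, so the share rate multiplied by 2^32 gives hashes per second. That only holds when every share counts as difficulty 1. The sketch below adds a share_difficulty factor for pools that assign a higher share difficulty; that parameter is an assumption for illustration, since the metrics listed above do not expose it.

```python
def estimate_hashrate(shares_per_second, share_difficulty=1.0):
    """Expected hashes per second implied by a share rate.

    One difficulty-1 share takes about 2**32 hashes on average, so the
    estimate scales linearly with the difficulty assigned to each share.
    """
    return shares_per_second * share_difficulty * 2**32
```

For example, 0.5 shares per second at share difficulty 1024 corresponds to roughly 2.2 TH/s. Over short windows the estimate is noisy, because share arrival is random.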

Prometheus scrape config

scrape_configs:
  - job_name: dvb-warppool
    scrape_interval: 15s
    static_configs:
      - targets: ['pool.local:18334']

Alert recipes

RPC unreachable > 2min

- alert: WarppoolRpcDown
  expr: warppool_rpc_ready == 0
  for: 2m
  annotations:
    summary: "Pool {{ $labels.instance }} has no RPC connection to the Bitcoin node"

No shares > 10min (miner offline?)

- alert: WarppoolNoShares
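  expr: rate(warppool_shares_accepted_total[10m]) == 0
  # The 10m rate window already covers the 10-minute threshold; "for" only
  # debounces. With "for: 10m" the alert would fire after roughly 20 minutes.
  for: 1m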
  annotations:
    summary: "Pool {{ $labels.instance }} is receiving no shares"

RPC latency p99 > 1s

- alert: WarppoolRpcSlow
  expr: |
    histogram_quantile(0.99,
      rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m])
    ) > 1
  for: 5m

Notifier sink failing persistently

- alert: WarppoolNotifierBroken
  expr: |
    sum by (sink) (rate(warppool_notifier_events_sent_total{result="err"}[15m]))
      / sum by (sink) (rate(warppool_notifier_events_sent_total[15m])) > 0.5
  for: 15m
  annotations:
    summary: "Notifier sink {{ $labels.sink }} failing >50% — check config"

SSE Events (separate story)

Alongside /metrics, /api/events serves a Server-Sent Events stream that pushes live events to the UI (block_found, new_job, shares_accepted, ...). It is primarily intended for the UI banners. For monitoring, use /metrics: its counters are cumulative, so a paused or missed scrape loses resolution but no counts.
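For ad-hoc debugging of the stream, a small parser for the Server-Sent Events wire format is enough. The sketch below handles the standard event: and data: fields only. The payload shape of the pool's events is not documented here, so the JSON in the usage note is purely an assumed example.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):  # comment line, often used as keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Feeding it the lines 'event: block_found', 'data: {"height": 1}' and an empty line yields one pair, ("block_found", '{"height": 1}'). Any iterable of text lines works as input, for instance a decoded HTTP response body read line by line.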

See also