Observability

dvb-WarpPool exposes its runtime state in two complementary ways:

Pull — Prometheus-compatible /metrics endpoint
Push — Notifier sinks (see Notifications) for operator-aware events

A good setup uses both: Prometheus scrapes every 15s for trends and alerts, and critical events (block found, RPC down) go out immediately as notifications.

/metrics Endpoint

Path: GET /metrics on the regular API port (default 18334). Format: Prometheus text exposition format, served as text/plain; version=0.0.4.

Authentication: none. The endpoint is read-only and contains no secrets. If the pool is reachable from a public network and you want to restrict access, put a reverse proxy with basic auth in front of it.
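For a quick check without a Prometheus server, the exposition text can be fetched and parsed by hand. The following is a minimal Python sketch, not part of dvb-WarpPool: the base URL `http://pool.local:18334` is an assumed example address, the sample text is illustrative rather than captured from a real daemon, and the parser covers only simple `name{labels} value` lines (no escaped quotes in label values, no timestamps).

```python
import re
import urllib.request

# Matches `name{labels} value` or `name value`; a simplification of the
# exposition grammar (no escaped quotes in label values, no timestamps).
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url="http://pool.local:18334"):
    """Fetch and parse /metrics; the base URL is an assumed example address."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Illustrative sample, not captured from a real daemon.
SAMPLE = """\
# TYPE warppool_rpc_ready gauge
warppool_rpc_ready 1
warppool_active_connections{protocol="v1"} 3
"""
parsed = parse_metrics(SAMPLE)
```

Calling `fetch_metrics()` against a running daemon returns the same tuple structure as `parsed`, which is enough for ad-hoc checks such as "is `warppool_rpc_ready` equal to 1".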

Base counters (always present)

Metric	Type	Description
`warppool_blocks_found_total`	counter	Accepted blocks since the first daemon start
`warppool_shares_accepted_total`	counter	Accepted shares across all workers
`warppool_shares_rejected_total`	counter	Stale / low-diff / malformed
`warppool_workers_total`	gauge	Number of workers ever seen
`warppool_rpc_ready`	gauge	1 if Bitcoin Core RPC is reachable
`warppool_rpc_ibd`	gauge	1 if Bitcoin Core is in initial block download
`warppool_network_height`	gauge	Chain tip height according to our node
`warppool_network_difficulty`	gauge	Current network difficulty
`warppool_current_job_height`	gauge	Height of the template currently being served
`warppool_current_job_coinbase_value_sats`	gauge	Coinbase reward in sats
`warppool_started_at_seconds`	gauge	Daemon start as a unix timestamp
`warppool_last_template_at_seconds`	gauge	Last successful `getblocktemplate`
`warppool_build_info{brand,profile,chain}`	gauge	Constant 1, all constants in labels

Phase 16: extended pool metrics

These are active as soon as the daemon hands PoolMetrics to the API state (automatic when the daemon binary is running; optional in test setups).

Metric	Type	Description
`warppool_workers_authorized_total`	counter	Cumulative `mining.authorize` successes (v1) + `OpenChannel` successes (v2)
`warppool_workers_disconnected_total`	counter	Cumulative authenticated worker disconnects
`warppool_active_connections{protocol="v1"}`	gauge	Open Stratum V1 connections
`warppool_active_connections{protocol="v2"}`	gauge	Open Stratum V2 connections
`warppool_bitcoin_rpc_latency_seconds`	histogram	End-to-end RPC call duration (all retries included)

Histogram buckets (upper bounds in seconds): 0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, +Inf. Buckets follow Prometheus cumulative semantics: each observation increments every bucket whose upper bound is ≥ the observed value.
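The cumulative bucket semantics can be illustrated with a small Python sketch. It is a model of the counting rule only, not the daemon's implementation; the function name `observe` and the three sample latencies are made up for the example.

```python
import bisect
import math

# Bucket upper bounds as listed above; +Inf catches every observation.
BOUNDS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, math.inf]


def observe(counts, seconds):
    """Increment every bucket whose upper bound is >= the observed value."""
    first = bisect.bisect_left(BOUNDS, seconds)  # index of first bound >= seconds
    for i in range(first, len(BOUNDS)):
        counts[i] += 1


counts = [0] * len(BOUNDS)
for latency in (0.003, 0.2, 7.0):  # hypothetical RPC durations in seconds
    observe(counts, latency)
```

After these three observations the 0.001 bucket is still 0, the buckets from 0.005 to 0.1 hold 1, the buckets from 0.25 to 5.0 hold 2, and the +Inf bucket holds 3, which always equals the total observation count.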

Example query (Grafana):

# p99 RPC latency, last 5 minutes
histogram_quantile(0.99, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))

# RPC call rate
rate(warppool_bitcoin_rpc_latency_seconds_count[1m])
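To make the p99 query less of a black box, here is a simplified Python re-implementation of the estimate that `histogram_quantile` produces: it finds the bucket containing the requested rank and interpolates linearly inside it. This is a sketch for intuition, not Prometheus source code; it works on cumulative counts rather than rates (the arithmetic is the same), assumes 0 < q ≤ 1, and the bucket values in the example are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from [(upper_bound, cumulative_count), ...].

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    Assumes 0 < q <= 1.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the lowest bucket starts at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate into +Inf; report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Invented example: 100 RPC calls, 98 of them finished within 0.5 s.
example = [(0.1, 90), (0.5, 98), (1.0, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, example)
```

In this example the p99 estimate lands at about 0.75 s, halfway through the 0.5 to 1.0 bucket. The result can never be more precise than the bucket boundaries allow, which is worth keeping in mind when choosing alert thresholds.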

Phase 22: per-miner vendor probe metrics

When the daemon's miner_poll_loop is running (default), configured miners are polled every 30s and their telemetry values are exposed as gauges:

Metric	Type	Labels	Description
`warppool_miner_hashrate_ghs`	gauge	label, host, vendor, model	Miner-reported hashrate in GH/s
`warppool_miner_temperature_c`	gauge	label, host, vendor, model	ASIC core temperature in °C
`warppool_miner_power_w`	gauge	label, host, vendor, model	Power draw in watts
`warppool_miner_voltage_mv`	gauge	label, host, vendor, model	ASIC core voltage in mV
`warppool_miner_fan_rpm`	gauge	label, host, vendor, model	Fan speed in RPM
`warppool_miner_last_probe_age_seconds`	gauge	label, host, vendor	Seconds since the last successful probe
`warppool_miner_probe_health`	gauge	label, host, vendor	1 if OK and recent (<5min); 0 if error or stale

Fields with the value None are skipped: if a miner doesn't report voltage_mv, for example, the metric is omitted for that miner rather than exported as 0, which would distort the operator's trend lines.
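The skip-on-None rule and the health gauge can be sketched in Python as follows. This is an illustration under assumptions, not the daemon's code: the `MinerProbe` record, the `render` function, and the example values are hypothetical, and omitting the age gauge for a miner that was never probed successfully is a guess.

```python
from dataclasses import dataclass
from typing import List, Optional

STALE_AFTER_SECONDS = 300  # "recent" means younger than 5 minutes

TELEMETRY_FIELDS = ("hashrate_ghs", "temperature_c", "power_w", "voltage_mv", "fan_rpm")


@dataclass
class MinerProbe:
    """Hypothetical probe record; field names mirror the metric suffixes."""
    label: str
    host: str
    vendor: str
    model: str
    last_ok_at: Optional[float] = None   # unix time of the last successful probe
    last_error: Optional[str] = None     # set when the most recent probe failed
    hashrate_ghs: Optional[float] = None
    temperature_c: Optional[float] = None
    power_w: Optional[float] = None
    voltage_mv: Optional[float] = None
    fan_rpm: Optional[float] = None


def render(probe: MinerProbe, now: float) -> List[str]:
    """Render exposition lines for one miner, omitting unreported fields."""
    base = f'label="{probe.label}",host="{probe.host}",vendor="{probe.vendor}"'
    full = f'{base},model="{probe.model}"'
    lines = []
    for name in TELEMETRY_FIELDS:
        value = getattr(probe, name)
        if value is None:
            continue  # omit the series instead of exporting a misleading 0
        lines.append(f"warppool_miner_{name}{{{full}}} {value}")
    # Assumption: no age gauge is emitted before the first successful probe.
    age = None if probe.last_ok_at is None else now - probe.last_ok_at
    if age is not None:
        lines.append(f"warppool_miner_last_probe_age_seconds{{{base}}} {age}")
    healthy = probe.last_error is None and age is not None and age < STALE_AFTER_SECONDS
    lines.append(f"warppool_miner_probe_health{{{base}}} {int(healthy)}")
    return lines


# Made-up example values.
probe = MinerProbe("rig1", "10.0.0.5", "examplevendor", "x1",
                   last_ok_at=1000.0, hashrate_ghs=1200.0, temperature_c=61.5)
lines = render(probe, now=1030.0)
```

Here `lines` contains hashrate, temperature, probe age, and health series, but no power, voltage, or fan series, because those fields were never set.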

If WARPPOOL_AUTO_PROBE_DISCOVERED=true is set, miners discovered via mDNS are also included with label="discovered" — the operator can separate them with:

# Configured miners only
warppool_miner_hashrate_ghs{label!="discovered"}

# Discovered miners (not in the DB)
warppool_miner_hashrate_ghs{label="discovered"}

Example queries:

# Total pool hashrate (sum of all miners)
sum(warppool_miner_hashrate_ghs)

# Maximum temperature across all miners — operator alarm if > 85°C
max(warppool_miner_temperature_c)

# Efficiency per miner (GH/s per watt)
warppool_miner_hashrate_ghs / warppool_miner_power_w

# Which miners have a failing probe cycle?
warppool_miner_probe_health == 0

Phase 15/16: notifier metrics

When a notifier is configured:

Metric	Type	Description
`warppool_notifier_sinks_active`	gauge	Number of initialized sinks
`warppool_notifier_events_sent_total{sink,event,result}`	counter	Send attempts per (sink, event kind, outcome)

result = "ok" or "err". event is one of block-found, miner-disconnect, rpc-down, rpc-recovered, test.

Example query: sink failure rate (a hint at wrong env vars or blocked webhooks):

# Aggregate away `result` on the right so each (sink, event) matches exactly one total
rate(warppool_notifier_events_sent_total{result="err"}[5m])
  / ignoring(result)
  sum without (result) (rate(warppool_notifier_events_sent_total[5m]))
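The ratio this query targets is errors divided by all send attempts, computed separately for each (sink, event) pair. A small Python sketch of the same arithmetic, using invented sink names and counter increases, may help when reading the result:

```python
from collections import defaultdict


def failure_ratio(increases):
    """Map (sink, event) -> share of failed sends.

    `increases` maps (sink, event, result) to the counter increase over
    the window, where result is "ok" or "err".
    """
    totals = defaultdict(float)
    errors = defaultdict(float)
    for (sink, event, result), count in increases.items():
        totals[(sink, event)] += count      # denominator: ok + err
        if result == "err":
            errors[(sink, event)] += count  # numerator: err only
    # Pairs without any attempt in the window have no defined ratio.
    return {key: errors[key] / total for key, total in totals.items() if total > 0}


# Invented sink names and counts.
window = {
    ("webhook-a", "block-found", "ok"): 3,
    ("webhook-a", "block-found", "err"): 1,
    ("webhook-b", "rpc-down", "ok"): 2,
}
ratios = failure_ratio(window)
```

In this example `webhook-a` fails a quarter of its block-found sends and `webhook-b` fails none. The key point is that the denominator sums over both outcomes, so every (sink, event) pair has exactly one total to divide by.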

Grafana Dashboard

A starter dashboard with the most important panels:

{
  "title": "dvb-WarpPool",
  "panels": [
    {
      "title": "Blocks Found",
      "type": "stat",
      "targets": [{ "expr": "warppool_blocks_found_total" }]
    },
    {
      "title": "Hashrate (approx, last 5min)",
      "type": "timeseries",
      "targets": [{
        "expr": "rate(warppool_shares_accepted_total[5m]) * 2^32"
      }]
    },
    {
      "title": "Active Connections",
      "type": "timeseries",
      "targets": [
        { "expr": "warppool_active_connections{protocol=\"v1\"}", "legendFormat": "v1" },
        { "expr": "warppool_active_connections{protocol=\"v2\"}", "legendFormat": "v2" }
      ]
    },
    {
      "title": "RPC Latency (p50/p99)",
      "type": "timeseries",
      "targets": [
        { "expr": "histogram_quantile(0.50, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))", "legendFormat": "p50" },
        { "expr": "histogram_quantile(0.99, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))", "legendFormat": "p99" }
      ]
    },
    {
      "title": "Bitcoin Core Health",
      "type": "stat",
      "targets": [
        { "expr": "warppool_rpc_ready" },
        { "expr": "warppool_rpc_ibd" }
      ]
    }
  ]
}

(A full dashboard with variables and annotations may follow later as packaging/grafana/dashboard.json.)
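The arithmetic behind the approximate hashrate panel: a share at difficulty 1 represents about 2^32 hashes on average, so hashrate is roughly shares per second × share difficulty × 2^32. The panel expression has no difficulty term, so it matches this formula only when shares are counted at difficulty 1. The Python sketch below spells out the calculation; the function names and the example numbers are illustrative and not part of dvb-WarpPool.

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32  # expected hashes to find one difficulty-1 share


def estimate_hashrate_hs(shares_per_second, share_difficulty=1.0):
    """Approximate hashrate in H/s from an accepted-share rate."""
    return shares_per_second * share_difficulty * HASHES_PER_DIFF1_SHARE


def format_hashrate(hs):
    """Format H/s with a decimal unit prefix for display."""
    units = ["H/s", "kH/s", "MH/s", "GH/s", "TH/s", "PH/s", "EH/s"]
    for unit in units:
        if hs < 1000 or unit == units[-1]:
            return f"{hs:.2f} {unit}"
        hs /= 1000


# Illustrative numbers: one share every two seconds at share difficulty 1024.
readable = format_hashrate(estimate_hashrate_hs(0.5, share_difficulty=1024))
```

With these example numbers `readable` is "2.20 TH/s". Because share arrival is random, short windows produce noisy estimates, which is why the panel averages over 5 minutes.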

Prometheus scrape config

scrape_configs:
  - job_name: dvb-warppool
    scrape_interval: 15s
    static_configs:
      - targets: ['pool.local:18334']

Alert recipes

RPC unreachable > 2min

- alert: WarppoolRpcDown
  expr: warppool_rpc_ready == 0
  for: 2m
  annotations:
    summary: "Pool {{ $labels.instance }} has no RPC connection to the Bitcoin node"

No shares > 10min (miner offline?)

- alert: WarppoolNoShares
  expr: rate(warppool_shares_accepted_total[10m]) == 0
  for: 10m
  annotations:
    summary: "Pool {{ $labels.instance }} is receiving no shares"

RPC latency p99 > 1s

- alert: WarppoolRpcSlow
  expr: |
    histogram_quantile(0.99,
      rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m])
    ) > 1
  for: 5m

Notifier sink failing persistently

- alert: WarppoolNotifierBroken
  expr: |
    rate(warppool_notifier_events_sent_total{result="err"}[15m])
      / ignoring(result)
      sum without (result) (rate(warppool_notifier_events_sent_total[15m])) > 0.5
  for: 15m
  annotations:
    summary: "Notifier sink {{ $labels.sink }} failing >50% — check config"

Alongside /metrics, /api/events serves a Server-Sent Events stream that pushes live events to the UI (block_found, new_job, shares_accepted, ...). It is intended primarily for the UI banners. For monitoring, use /metrics: its counters are cumulative, so a paused or missed scrape does not lose totals.
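For debugging the stream outside the UI, the standard Server-Sent Events framing (`event:` and `data:` fields, blank line as terminator, lines starting with `:` as comments) can be parsed with a few lines of Python. This is a sketch under assumptions: the document does not specify whether the daemon sets the `event:` field or what the payloads look like, so the JSON payloads in the sample are hypothetical.

```python
def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream excerpt; the payload shapes are assumptions.
SAMPLE = (
    'event: block_found\ndata: {"height": 1}\n\n'
    ': keep-alive\n\n'
    'event: new_job\ndata: {}\n\n'
)
events = list(parse_sse(SAMPLE.splitlines()))
```

The same generator can be fed line by line from a streaming HTTP response instead of a string.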
