Observability
dvb-WarpPool exposes its runtime state in two complementary ways:
- Pull: a Prometheus-compatible /metrics endpoint
- Push: notifier sinks (see Notifications) for events an operator should hear about immediately
A good setup uses both: Prometheus scrapes every 15s for trends and alerts, and critical events (block found, RPC down) go out immediately as notifications.
/metrics Endpoint
Path: GET /metrics on the regular API port (default 18334).
Format: Prometheus text exposition format (text/plain; version=0.0.4).
Authentication: none — the endpoint is read-only and contains no secrets. If your pool network is public and you don't like that, put a reverse proxy with basic auth in front of it.
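For a quick look without a Prometheus server, the endpoint can be read with any HTTP client. The following is a minimal Python sketch, assuming the pool is reachable as pool.local:18334 (a placeholder hostname). Its parser only covers the subset of the text format used by the metrics on this page: it skips # HELP / # TYPE lines, does not unescape label values, and assumes no label value contains "}". parse_exposition and fetch_metrics are illustrative names, not part of dvb-WarpPool.

```python
import re
import urllib.request

# Simplified sample line: metric name, optional {labels}, value (timestamp ignored).
# Assumption: label values never contain '}' (true for the metrics listed here).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (name, labels_dict, value) tuples from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # line outside the simplified subset
        name, raw_labels, raw_value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))  # float() accepts +Inf/NaN
    return samples


def fetch_metrics(url="http://pool.local:18334/metrics", timeout=5.0):
    # Placeholder host; point this at your own pool.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for name, labels, value in fetch_metrics():
        if name in ("warppool_rpc_ready", "warppool_network_height"):
            print(name, labels, value)
```

For anything beyond spot checks, an established client library or Prometheus itself is the better choice; this reader exists only to show what the endpoint returns.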
Base metrics (always present)
| Metric | Type | Description |
|---|---|---|
| warppool_blocks_found_total | counter | Accepted blocks since the first daemon start |
| warppool_shares_accepted_total | counter | Accepted shares across all workers |
| warppool_shares_rejected_total | counter | Rejected shares (stale, low difficulty, malformed) |
| warppool_workers_total | gauge | Number of distinct workers ever seen |
| warppool_rpc_ready | gauge | 1 if Bitcoin Core RPC is reachable, else 0 |
| warppool_rpc_ibd | gauge | 1 if Bitcoin Core is in initial block download, else 0 |
| warppool_network_height | gauge | Chain tip height as reported by the connected Bitcoin Core node |
| warppool_network_difficulty | gauge | Current network difficulty |
| warppool_current_job_height | gauge | Height of the block template currently being served |
| warppool_current_job_coinbase_value_sats | gauge | Coinbase value (block subsidy plus fees) of the current template, in sats |
| warppool_started_at_seconds | gauge | Daemon start time as a Unix timestamp |
| warppool_last_template_at_seconds | gauge | Unix timestamp of the last successful getblocktemplate call |
| warppool_build_info{brand,profile,chain} | gauge | Always 1; the build metadata is carried in the labels |
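The two *_at_seconds gauges hold timestamps, not durations, so ages have to be computed against the current time. A minimal Python health-check sketch over the base gauges, assuming the values have already been read into a name-to-value dict; the 120 s template threshold is a hypothetical value that should be tuned to your template refresh interval, and base_health is an illustrative name:

```python
import time

# Hypothetical threshold: a template older than this is treated as stale.
MAX_TEMPLATE_AGE_S = 120.0


def base_health(metrics, now=None):
    """Return human-readable problems derived from the base gauges.

    `metrics` maps metric name -> float value, e.g. as read from /metrics.
    """
    now = time.time() if now is None else now
    problems = []
    if metrics.get("warppool_rpc_ready", 0.0) != 1.0:
        problems.append("Bitcoin Core RPC unreachable")
    if metrics.get("warppool_rpc_ibd", 0.0) == 1.0:
        problems.append("Bitcoin Core still in initial block download")
    last_tpl = metrics.get("warppool_last_template_at_seconds")
    if last_tpl is not None and now - last_tpl > MAX_TEMPLATE_AGE_S:
        # The gauge is a Unix timestamp, so its age is now minus the value.
        problems.append(f"last template is {now - last_tpl:.0f}s old")
    return problems
```

In PromQL the same age is time() - warppool_last_template_at_seconds.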
Phase 16: extended pool metrics
These are exported as soon as the daemon hands its PoolMetrics to the API state
(this happens automatically when the daemon binary runs; in test setups it is optional).
| Metric | Type | Description |
|---|---|---|
| warppool_workers_authorized_total | counter | Cumulative successful mining.authorize calls (v1) plus OpenChannel successes (v2) |
| warppool_workers_disconnected_total | counter | Cumulative disconnects of authenticated workers |
| warppool_active_connections{protocol="v1"} | gauge | Open Stratum V1 connections |
| warppool_active_connections{protocol="v2"} | gauge | Open Stratum V2 connections |
| warppool_bitcoin_rpc_latency_seconds | histogram | End-to-end RPC call duration, including all retries |
Histogram buckets (seconds): 0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, +Inf. The buckets follow Prometheus cumulative semantics: each observation increments every bucket whose upper bound (le) is greater than or equal to the observed value.
Example queries (PromQL, e.g. in Grafana):
```promql
# p99 RPC latency, last 5 minutes
histogram_quantile(0.99, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))
# RPC calls per second
rate(warppool_bitcoin_rpc_latency_seconds_count[1m])
```
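histogram_quantile never sees individual latencies, only cumulative bucket counts, so it estimates a quantile by linear interpolation inside the bucket that contains the target rank. The Python sketch below models that estimate for the buckets listed above. It is a simplified illustration of Prometheus's behavior (only 0 < q <= 1 is handled), and observe and quantile are illustrative names:

```python
import bisect
import math

# Upper bounds (le) from the bucket list above, in seconds.
BOUNDS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, math.inf]


def observe(counts, value):
    """Cumulative update: every bucket with le >= value is incremented."""
    for i, le in enumerate(BOUNDS):
        if value <= le:
            counts[i] += 1


def quantile(q, counts):
    """Estimate the q-quantile from cumulative bucket counts."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = counts[-1]  # the +Inf bucket holds every observation
    if total == 0:
        return math.nan
    rank = q * total
    i = bisect.bisect_left(counts, rank)  # first bucket with count >= rank
    if BOUNDS[i] == math.inf:
        return BOUNDS[-2]  # rank in +Inf bucket: report highest finite bound
    lower = 0.0 if i == 0 else BOUNDS[i - 1]
    prev = 0 if i == 0 else counts[i - 1]
    # Assume observations are spread evenly within the bucket.
    return lower + (BOUNDS[i] - lower) * (rank - prev) / (counts[i] - prev)
```

The PromQL queries apply the same interpolation to rate() of the buckets, i.e. to the calls observed within the window. One consequence: if more than 1% of calls take longer than 5 s, the p99 reads as exactly 5.0 rather than the true value.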
Phase 22: per-miner vendor probe metrics
When the daemon's miner_poll_loop is running (default), configured miners
are polled every 30s and their telemetry values are exposed as gauges:
| Metric | Type | Labels | Description |
|---|---|---|---|
| warppool_miner_hashrate_ghs | gauge | label, host, vendor, model | Miner-reported hashrate in GH/s |
| warppool_miner_temperature_c | gauge | label, host, vendor, model | ASIC core temperature in °C |
| warppool_miner_power_w | gauge | label, host, vendor, model | Power draw in watts |
| warppool_miner_voltage_mv | gauge | label, host, vendor, model | ASIC core voltage in mV |
| warppool_miner_fan_rpm | gauge | label, host, vendor, model | Fan speed in RPM |
| warppool_miner_last_probe_age_seconds | gauge | label, host, vendor | Seconds since the last successful probe |
| warppool_miner_probe_health | gauge | label, host, vendor | 1 if the last probe succeeded and is recent (< 5 min); 0 if it failed or is stale |
Fields a miner does not report (None) are skipped. If a miner doesn't report
voltage_mv, for example, the metric is simply omitted for that miner instead
of being exported as 0, which would wreck the operator's trend lines.
If WARPPOOL_AUTO_PROBE_DISCOVERED=true is set, miners discovered via mDNS
are also included with label="discovered", so the operator can separate them:
```promql
# Configured miners only
warppool_miner_hashrate_ghs{label!="discovered"}
# Discovered miners (not in the DB)
warppool_miner_hashrate_ghs{label="discovered"}
```
Example queries:
```promql
# Total miner-reported hashrate in GH/s (sum over all probed miners)
sum(warppool_miner_hashrate_ghs)
# Maximum temperature across all miners (e.g. alert if > 85 °C)
max(warppool_miner_temperature_c)
# Efficiency per miner: hashrate per watt (GH/s per W)
warppool_miner_hashrate_ghs / warppool_miner_power_w
# Which miners have a failing probe cycle?
warppool_miner_probe_health == 0
```
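Miner datasheets usually quote efficiency in J/TH rather than GH/s per W. The two are reciprocal up to a factor of 1000 (1 W = 1 J/s and 1 TH/s = 1000 GH/s), as this small Python sketch shows; efficiency is an illustrative helper name:

```python
def efficiency(hashrate_ghs, power_w):
    """Return (GH/s per W, J/TH) for one miner, or None if an input is missing or zero."""
    if not hashrate_ghs or not power_w:
        return None  # mirrors the rule that unreported fields are skipped
    ghs_per_w = hashrate_ghs / power_w             # what the PromQL division yields
    j_per_th = power_w / (hashrate_ghs / 1000.0)   # W / (TH/s) = J/TH
    return ghs_per_w, j_per_th
```

The J/TH form can also be computed directly in PromQL as warppool_miner_power_w / (warppool_miner_hashrate_ghs / 1000).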
Phase 15/16: notifier metrics
When a notifier is configured:
| Metric | Type | Description |
|---|---|---|
| warppool_notifier_sinks_active | gauge | Number of initialized sinks |
| warppool_notifier_events_sent_total{sink,event,result} | counter | Send attempts per (sink, event kind, outcome) |
The result label is "ok" or "err"; the event label is one of block-found,
miner-disconnect, rpc-down, rpc-recovered, test.
Example query: failure ratio per sink and event kind (a persistently high value hints at wrong env vars or blocked webhooks):
```promql
sum without (result) (rate(warppool_notifier_events_sent_total{result="err"}[5m]))
  /
sum without (result) (rate(warppool_notifier_events_sent_total[5m]))
```
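The failure ratio is failed send attempts divided by all attempts for the same sink within one window. Because numerator and denominator share the window, its length cancels, and the ratio of rates equals the ratio of counter increases. A minimal Python sketch over two counter snapshots, aggregated per sink for brevity, assuming each snapshot maps (sink, event, result) to the counter value; failure_ratio is an illustrative name, and its reset handling is a simplification of what rate() does:

```python
from collections import defaultdict


def failure_ratio(before, after):
    """Per-sink share of failed send attempts between two counter snapshots."""
    err = defaultdict(float)
    total = defaultdict(float)
    for key, value in after.items():
        sink, _event, result = key
        delta = value - before.get(key, 0.0)
        if delta < 0:
            delta = value  # counter reset (e.g. daemon restart): count from zero
        total[sink] += delta
        if result == "err":
            err[sink] += delta
    # Sinks with no attempts in the window have no defined ratio and are skipped.
    return {sink: err[sink] / t for sink, t in total.items() if t > 0}
```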
Grafana Dashboard
A starter dashboard with the most important panels:
```json
{
  "title": "dvb-WarpPool",
  "panels": [
    {
      "title": "Blocks Found",
      "type": "stat",
      "targets": [{ "expr": "warppool_blocks_found_total" }]
    },
    {
      "title": "Hashrate (approx, last 5min)",
      "type": "timeseries",
      "targets": [{
        "expr": "rate(warppool_shares_accepted_total[5m]) * 2^32"
      }]
    },
    {
      "title": "Active Connections",
      "type": "timeseries",
      "targets": [
        { "expr": "warppool_active_connections{protocol=\"v1\"}", "legendFormat": "v1" },
        { "expr": "warppool_active_connections{protocol=\"v2\"}", "legendFormat": "v2" }
      ]
    },
    {
      "title": "RPC Latency (p50/p99)",
      "type": "timeseries",
      "targets": [
        { "expr": "histogram_quantile(0.50, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))", "legendFormat": "p50" },
        { "expr": "histogram_quantile(0.99, rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m]))", "legendFormat": "p99" }
      ]
    },
    {
      "title": "Bitcoin Core Health",
      "type": "stat",
      "targets": [
        { "expr": "warppool_rpc_ready" },
        { "expr": "warppool_rpc_ibd" }
      ]
    }
  ]
}
```
(A full dashboard with variables and annotations may follow later as
packaging/grafana/dashboard.json.)
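The Hashrate panel turns a share rate into hashes per second. Finding a share at difficulty D takes about D × 2^32 hashes on average, so hashrate ≈ shares/s × D × 2^32. The panel's factor of 2^32 therefore assumes difficulty-1 shares; if workers are assigned a higher share difficulty (for example through vardiff), the result has to be multiplied by that difficulty as well. A Python sketch of the arithmetic, with hashrate_from_shares as an illustrative name:

```python
TWO_POW_32 = 2 ** 32  # approx. expected hashes per difficulty-1 share


def hashrate_from_shares(shares_accepted, window_s, share_difficulty=1.0):
    """Estimate hashrate in H/s from shares accepted within a time window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    shares_per_s = shares_accepted / window_s
    # Each share at difficulty D represents on average D * 2^32 hashes.
    return shares_per_s * share_difficulty * TWO_POW_32
```

For example, 300 difficulty-1 shares in a 5-minute window correspond to one share per second, or roughly 4.29 GH/s.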
Prometheus scrape config
```yaml
scrape_configs:
  - job_name: dvb-warppool
    scrape_interval: 15s
    static_configs:
      - targets: ['pool.local:18334']
```
Alert recipes
RPC unreachable > 2min
```yaml
- alert: WarppoolRpcDown
  expr: warppool_rpc_ready == 0
  for: 2m
  annotations:
    summary: "Pool {{ $labels.instance }} has no RPC connection to the Bitcoin node"
```
No shares > 10min (miner offline?)
```yaml
- alert: WarppoolNoShares
  expr: rate(warppool_shares_accepted_total[10m]) == 0
  for: 10m
  annotations:
    summary: "Pool {{ $labels.instance }} is receiving no shares"
```
RPC latency p99 > 1s
```yaml
- alert: WarppoolRpcSlow
  expr: |
    histogram_quantile(0.99,
      rate(warppool_bitcoin_rpc_latency_seconds_bucket[5m])
    ) > 1
  for: 5m
```
Notifier sink failing persistently
```yaml
- alert: WarppoolNotifierBroken
  expr: |
    sum without (result) (rate(warppool_notifier_events_sent_total{result="err"}[15m]))
      /
    sum without (result) (rate(warppool_notifier_events_sent_total[15m])) > 0.5
  for: 15m
  annotations:
    summary: "Notifier sink {{ $labels.sink }} failing >50%, check config"
```
SSE Events (separate channel)
Alongside /metrics, /api/events serves a Server-Sent Events stream that
pushes live events to the UI (block_found, new_job, shares_accepted,
...). It is intended primarily for the UI banners. For monitoring, use
/metrics: its counters are cumulative, so a missed scrape only loses
resolution, not counts, which makes Prometheus more robust against scraping pauses.
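For a debugging session, the stream can be consumed with a small parser. The Python sketch below implements only the subset of the Server-Sent Events format needed here (event: and data: fields, comment lines, and a blank line as the dispatch marker). It assumes /api/events is served on the same API port as /metrics at the placeholder host pool.local, and it leaves the data payload unparsed because its format is not specified on this page. parse_sse is an illustrative name.

```python
import urllib.request


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # a blank line dispatches the buffered event
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


if __name__ == "__main__":
    # Placeholder host and assumed port; adjust to your setup.
    with urllib.request.urlopen("http://pool.local:18334/api/events") as resp:
        for event, data in parse_sse(b.decode("utf-8") for b in resp):
            print(event, data)
```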