Performance Benchmarks

dvb-WarpPool uses Criterion for hot-path benchmarks. Three suites cover the critical code paths. The goal is not maximum throughput — a solo pool's workload is small — but a regression baseline so that code changes are caught early when they unexpectedly become expensive.

Suites

| Bench | Crate | Hot-Path | Frequency in Pool |
| --- | --- | --- | --- |
| validate | warppool-share-validator | ShareValidator::validate() per accepted+rejected share | per-share (most frequent) |
| build_job | warppool-job-builder | JobBuilder::build() per new block template | per-job (~every 30-60s) |
| vardiff | warppool-stratum-v1 | VarDiff::observe_share() per accepted share | per-share |

Running Locally

# Single bench
cargo bench -p warppool-share-validator --bench validate

# All three
cargo bench --workspace --benches

# Compile-check only, no runs (CI smoke)
cargo bench --workspace --no-run

Criterion writes reports to target/criterion/<bench-name>/report/index.html. On a second run it compares against the first and reports drift (Performance has regressed. / Performance has improved.).

Baseline Numbers (2026-05-27, MacBook M-Series, release build)

These numbers are a snapshot, not a hard contract — they can vary by 2× depending on hardware and CPU throttling. On Linux x86_64 server hardware the values are typically similar or better.

validate (per share)

| Bench | Time | Throughput |
| --- | --- | --- |
| validate_full/0 (no merkle branches) | 1.32 µs | 760K shares/s |
| validate_full/8 (typical regtest) | 5.49 µs | 182K shares/s |
| validate_full/12 (typical mainnet) | 7.59 µs | 132K shares/s |
| sha256d_80b_header | 528 ns | |
| sha256d_500b_coinbase | 1.55 µs | |
| merkle_root/12 | (hot-path portion) ~2 µs | |
| reconstruct_coinbase | < 200 ns | |
| build_header | < 30 ns | |

Take-away: validate scales linearly with merkle-branch count. At 12 branches the pool can validate ~130K shares/s, which is more than 25,000× what a solo pool with 7 Bitaxes will ever see (typically 1-5 shares/s). Validate is NEVER the bottleneck.
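
The linear scaling can be checked directly against the three validate_full rows. The sketch below fits a least-squares line through those baseline points. The constant and the helper names (fit_line, predict_us, shares_per_sec) are illustrative only and are not part of the warppool crates.

```rust
/// Baseline points from the validate table: (merkle branches, time in µs).
/// These are the snapshot numbers from this page, not live measurements.
const VALIDATE_BASELINE: [(f64, f64); 3] = [(0.0, 1.32), (8.0, 5.49), (12.0, 7.59)];

/// Ordinary least-squares fit of y = intercept + slope * x.
/// Returns (intercept, slope). Hypothetical helper for illustration.
fn fit_line(points: &[(f64, f64)]) -> (f64, f64) {
    let n = points.len() as f64;
    let sx: f64 = points.iter().map(|p| p.0).sum();
    let sy: f64 = points.iter().map(|p| p.1).sum();
    let sxx: f64 = points.iter().map(|p| p.0 * p.0).sum();
    let sxy: f64 = points.iter().map(|p| p.0 * p.1).sum();
    let slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    let intercept = (sy - slope * sx) / n;
    (intercept, slope)
}

/// Predicted validate time in µs for a given merkle-branch count.
fn predict_us(fit: (f64, f64), branches: f64) -> f64 {
    fit.0 + fit.1 * branches
}

/// Convert a per-share time in µs into shares per second.
fn shares_per_sec(time_us: f64) -> f64 {
    1_000_000.0 / time_us
}
```

With the numbers above, the fit comes out at roughly 1.32 µs of fixed cost plus 0.52 µs per merkle branch, and each measured point lies within 0.01 µs of the line. The per-branch cost is about one sha256d, consistent with the sha256d_80b_header row.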

build_job (per job-refresh)

| Bench | Time | Throughput |
| --- | --- | --- |
| build_job/0 (empty / regtest) | ~100 µs | |
| build_job/100 | ~150 µs | |
| build_job/1000 | ~700 µs | |
| build_job/4000 (typical full block) | 2.59 ms | 386 jobs/s |
| merkle_branches/4000 | 2.30 ms | |

Take-away: Job-build scales with tx-count, dominated by merkle-branch computation. 2.59ms / job for a full mainnet block is clearly visible but not a problem — templates arrive every 30+ seconds, not every ms.
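
A rough way to see why merkle-branch computation dominates is to count the pair hashes it needs. The sketch below assumes the usual Stratum approach, where the branch list is derived from the non-coinbase transactions and the coinbase path is left for the miner to hash. Whether JobBuilder::build() works exactly this way is an assumption, and both function names are hypothetical.

```rust
/// Number of merkle branches for a block with `tx_count` non-coinbase
/// transactions (the coinbase adds one leaf). Equals ceil(log2(leaves)).
fn merkle_branch_count(tx_count: u64) -> u32 {
    let mut width = tx_count + 1;
    let mut depth = 0;
    while width > 1 {
        width = (width + 1) / 2; // odd levels duplicate their last node
        depth += 1;
    }
    depth
}

/// Pair hashes needed to derive the branch list without knowing the coinbase:
/// every node above the leaves, minus the one node per level that lies on
/// the coinbase path and therefore cannot be computed yet.
fn branch_hash_count(tx_count: u64) -> u64 {
    let mut width = tx_count + 1;
    let mut hashes = 0;
    while width > 1 {
        width = (width + 1) / 2;
        hashes += width - 1; // skip the node that depends on the coinbase
    }
    hashes
}
```

For 4000 transactions this gives 12 branches, matching the validate_full/12 mainnet case, and 3994 pair hashes. At roughly 0.5 µs per sha256d that is about 2 ms, the same order of magnitude as the measured merkle_branches/4000.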

vardiff (per share, EMA-update + retarget)

| Bench | Time |
| --- | --- |
| vardiff_observe_share_hold (stationary) | 5.2 ns |
| vardiff_observe_share_retarget (8-share burst) | 37.9 ns (~5ns/share) |
| difficulty_to_target_be | 12.85 ns |
| vardiff_decision_variant_match | 432 ps |

Take-away: VarDiff is effectively free. Even under extreme load scenarios (>100K shares/s) it consumes <1ms/s of CPU.
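
The CPU claims on this page are all the same arithmetic: share rate times per-share time. A minimal sketch, with hypothetical helper names, that reproduces the figures from the baseline numbers:

```rust
/// Fraction of one CPU core spent on a per-share hot path.
/// `ns_per_share` is the Criterion time for that path in nanoseconds.
fn cpu_fraction(shares_per_sec: f64, ns_per_share: f64) -> f64 {
    shares_per_sec * ns_per_share / 1_000_000_000.0
}

/// The same quantity expressed as busy milliseconds per wall-clock second.
fn busy_ms_per_sec(shares_per_sec: f64, ns_per_share: f64) -> f64 {
    cpu_fraction(shares_per_sec, ns_per_share) * 1_000.0
}
```

At 100K shares/s and 5.2 ns per observe_share call, busy_ms_per_sec returns about 0.52 ms/s. The same function applied to validate at 10 shares/s and 7.59 µs gives a CPU fraction of about 0.0076%.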

Interpretation

What you can read from the numbers:

| Question | Hint |
| --- | --- |
| "Is my pool burning too much CPU?" | No. At 10 shares/s and 12 merkle branches: ~76 µs share-validate time per second = 0.0076% CPU |
| "How many workers can my pool serve at most?" | The Stratum connection cap (profile-dependent, 64-4096). Share-validate is not the limit |
| "Is ASIC-boost / merkle-tree caching worth it?" | No, not in a solo pool. In a 10M-shares/s pool, per-template merkle-branch caching would be a factor of 5-10 |

CI

.github/workflows/benches.yml runs only on:

  • Manual dispatch (operator clicks "Run workflow" in the UI)
  • Tag push (release snapshot)

NOT on every PR — Criterion runs are expensive (~5min build + 5min suite), and GitHub-runner noise makes microbench comparisons unreliable.

Reports are uploaded as artifact criterion-reports-<sha> with 30-day retention. The operator can download the artifact and open the HTML report locally.

Regression Workflow

When a bench suddenly becomes 50% slower:

  1. Run cargo bench --bench <name> locally to confirm the slowdown
  2. git bisect between the last known-good version and HEAD
  3. On dependency bumps: inspect the Cargo.lock diff (a bump often pulls in a new version of a transitive dependency)

Criterion automatically stores the last baseline in target/criterion/ — when you bench locally, it compares against YOUR last run, not against GitHub. For a CI-vs-local comparison, download the artifact and place it locally under the target/criterion/ path.
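
The "50% slower" trigger can be written down as an explicit check when comparing two runs by hand. This is a minimal sketch with hypothetical function names; it is not something Criterion or the CI workflow provides. Because the baseline numbers can vary by 2× across hardware, the comparison is only meaningful for two runs on the same machine.

```rust
/// Relative change of `current_ns` against `baseline_ns`.
/// +0.5 means 50% slower, -0.2 means 20% faster.
fn relative_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns
}

/// True when the slowdown reaches `threshold` (0.5 for the 50% rule above).
fn is_regression(baseline_ns: f64, current_ns: f64, threshold: f64) -> bool {
    relative_change(baseline_ns, current_ns) >= threshold
}
```

For example, validate_full/12 moving from 7590 ns to 11400 ns crosses the 0.5 threshold, while a move to 9000 ns does not.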

What is Deliberately Not Benched

| Path | Why not |
| --- | --- |
| Stratum V1 TCP I/O | Tokio async-IO is syscall-bound; criterion would be noise-dominated. tokio-console is more useful for inspection. |
| Bitcoin RPC | Network IO + Bitcoin-Core-side dominates. The Phase 16.3 RPC-latency histogram is the right observation. |
| Translator V1↔V2 mapping | Per-job (every 30-60s), not latency-critical. Would be effort for little benefit. |
| Storage SQL | sqlx + WAL mode dominates. If needed, bench directly with the sqlite3 CLI. |
| Notifier sinks | HTTP/SMTP IO, not CPU-bound. End-to-end latency is readable from the /metrics histogram. |

See Also

  • Observability — Runtime metrics (Prometheus) instead of synthetic benches
  • Testing — Unit / integration / sim tests