Performance Benchmarks
dvb-WarpPool uses Criterion for hot-path benchmarks. Three suites cover the critical code paths. The goal is not maximum throughput, because a solo pool's workload is small. The goal is a regression baseline that catches code changes early when they unexpectedly make a hot path more expensive.
Suites
| Bench | Crate | Hot-Path | Frequency in Pool |
|---|---|---|---|
| `validate` | `warppool-share-validator` | `ShareValidator::validate()`, per accepted + rejected share | per share (most frequent) |
| `build_job` | `warppool-job-builder` | `JobBuilder::build()`, per new block template | per job (~every 30-60 s) |
| `vardiff` | `warppool-stratum-v1` | `VarDiff::observe_share()`, per accepted share | per share |
Running Locally
```bash
# Single bench
cargo bench -p warppool-share-validator --bench validate
# All three
cargo bench --workspace --benches
# Compile-check only, no runs (CI smoke)
cargo bench --workspace --no-run
```
Criterion writes reports to `target/criterion/<bench-name>/report/index.html`.
On every subsequent run it compares against the previous run and reports drift
(`Performance has regressed.` / `Performance has improved.`).
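At its core, the drift message compares the new estimate with the stored baseline against a noise threshold. Criterion's documented default noise threshold is 1%. The sketch below is a deliberately simplified model of that rule. It compares point estimates only, while Criterion additionally runs a significance test and checks the confidence interval of the change. The names `Drift`, `relative_change`, and `classify` are hypothetical and are not Criterion API.

```rust
/// Outcome of comparing a new measurement against the stored baseline.
#[derive(Debug, PartialEq)]
enum Drift {
    Regressed,
    Improved,
    WithinNoise,
}

/// Relative change of the new time vs. the baseline: +0.5 means 50% slower.
fn relative_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns
}

/// Simplified drift check on point estimates only (no significance test,
/// no confidence interval, unlike Criterion itself).
fn classify(baseline_ns: f64, current_ns: f64, noise_threshold: f64) -> Drift {
    let change = relative_change(baseline_ns, current_ns);
    if change > noise_threshold {
        Drift::Regressed
    } else if change < -noise_threshold {
        Drift::Improved
    } else {
        Drift::WithinNoise
    }
}
```

For example, `validate_full/0` moving from 1.32 µs to 1.98 µs is a +50% change and classifies as `Regressed`. A 5 ns wobble on 1.32 µs stays within a 1% threshold.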
Baseline Numbers (2026-05-27, MacBook M-Series, release build)
These numbers are a snapshot, not a hard contract. They can vary by up to 2× depending on hardware and CPU throttling. On Linux x86_64 server hardware the values are typically similar or better.
validate (per share)
| Bench | Time | Throughput |
|---|---|---|
| `validate_full/0` (no merkle branches) | 1.32 µs | 760K shares/s |
| `validate_full/8` (typical regtest) | 5.49 µs | 182K shares/s |
| `validate_full/12` (typical mainnet) | 7.59 µs | 132K shares/s |
| `sha256d_80b_header` | 528 ns | — |
| `sha256d_500b_coinbase` | 1.55 µs | — |
| `merkle_root/12` | ~2 µs (hot-path portion) | — |
| `reconstruct_coinbase` | < 200 ns | — |
| `build_header` | < 30 ns | — |
Take-away: validate scales linearly with the merkle-branch count. At 12 branches the pool can validate ~130K shares/s. That is more than 25,000× the load a solo pool with 7 Bitaxes will ever see (typically 1-5 shares/s). Validation is never the bottleneck.
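The linear scaling can be checked against the table with a two-point fit, t(b) = base + b · per_branch. This is a minimal sketch. `LinearCost` and its methods are hypothetical helpers, and the inputs are the snapshot numbers above, so the results inherit their hardware dependence.

```rust
/// Two-point linear model of validate cost: t(b) = base + b * per_branch.
/// Hypothetical helper; inputs are (merkle branches, microseconds) pairs.
#[derive(Debug)]
struct LinearCost {
    base_us: f64,
    per_branch_us: f64,
}

impl LinearCost {
    /// Fit the line through two (branches, microseconds) measurements.
    fn fit((b0, t0): (f64, f64), (b1, t1): (f64, f64)) -> Self {
        let per_branch_us = (t1 - t0) / (b1 - b0);
        LinearCost { base_us: t0 - per_branch_us * b0, per_branch_us }
    }

    /// Predicted validate time for a given merkle-branch count.
    fn time_us(&self, branches: u32) -> f64 {
        self.base_us + self.per_branch_us * f64::from(branches)
    }

    /// Shares per second a single core could validate at that branch count.
    fn shares_per_sec(&self, branches: u32) -> f64 {
        1e6 / self.time_us(branches)
    }
}
```

Fitting the 0- and 12-branch rows gives ~0.52 µs per branch. That fit predicts 5.50 µs for 8 branches, against a measured 5.49 µs. The slope is also close to the 528 ns of `sha256d_80b_header`, which is consistent with one double SHA-256 per branch. At 12 branches the model gives ~131,750 shares/s, roughly 26,000× a 5 shares/s workload.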
build_job (per job-refresh)
| Bench | Time | Throughput |
|---|---|---|
| `build_job/0` (empty / regtest) | ~100 µs | — |
| `build_job/100` | ~150 µs | — |
| `build_job/1000` | ~700 µs | — |
| `build_job/4000` (typical full block) | 2.59 ms | 386 jobs/s |
| `merkle_branches/4000` | 2.30 ms | — |
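Take-away: job building scales with the transaction count and is dominated by the merkle-branch computation (2.30 ms of the 2.59 ms at 4000 transactions). At 2.59 ms per job for a full mainnet block, the cost is clearly measurable but not a problem, because templates arrive every 30+ seconds, not every millisecond.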
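Both tables follow the merkle tree. The job builder derives the coinbase's merkle branches once per template, at about one pair-hash per transaction. Each share is then validated by folding the coinbase hash through those branches, at one hash per branch. The following is a sketch under stated assumptions:

- `stand_in_sha256d` is a non-cryptographic placeholder for Bitcoin's double SHA-256 over the 64-byte concatenation.
- `Digest` is a `u64` instead of 32 bytes.
- `merkle_branches`, `root_from_branches`, and `full_tree_root` are hypothetical names, not the `JobBuilder` or `ShareValidator` API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest type; the real pool works on 32-byte sha256d hashes.
type Digest = u64;

/// NOT cryptographic: placeholder for sha256d(left || right), used only to
/// show the tree shape without a SHA-256 implementation.
fn stand_in_sha256d(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Job-build side: merkle branches for the coinbase (leaf 0), given the
/// hashes of the other transactions in block order.
fn merkle_branches<F: FnMut(Digest, Digest) -> Digest>(tx_hashes: &[Digest], mut hash2: F) -> Vec<Digest> {
    let mut branches = Vec::new();
    // Nodes of the current level, excluding the node on the coinbase path.
    let mut level: Vec<Digest> = tx_hashes.to_vec();
    while !level.is_empty() {
        // The first node is the coinbase path's sibling at this level.
        branches.push(level[0]);
        // The remaining nodes pair up among themselves; an odd last node
        // is paired with itself, as in Bitcoin's merkle tree.
        level = level[1..]
            .chunks(2)
            .map(|p| hash2(p[0], *p.get(1).unwrap_or(&p[0])))
            .collect();
    }
    branches
}

/// Validate side: fold the coinbase hash through the branches (one hash per
/// branch, hence validate cost is linear in the branch count).
fn root_from_branches<F: FnMut(Digest, Digest) -> Digest>(coinbase: Digest, branches: &[Digest], mut hash2: F) -> Digest {
    branches.iter().fold(coinbase, |acc, &b| hash2(acc, b))
}

/// Reference: root of the full tree over all leaves (coinbase first).
fn full_tree_root<F: FnMut(Digest, Digest) -> Digest>(leaves: &[Digest], mut hash2: F) -> Digest {
    let mut level = leaves.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|p| hash2(p[0], *p.get(1).unwrap_or(&p[0])))
            .collect();
    }
    level[0]
}
```

For 4000 transactions this performs 3994 pair hashes and yields 12 branches. A 64-byte and an 80-byte sha256d both need three SHA-256 compression-function calls. So if each pair hash costs about the 528 ns measured for `sha256d_80b_header`, the total is ~2.1 ms. That is the same ballpark as the measured 2.30 ms for `merkle_branches/4000`.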
vardiff (per share, EMA-update + retarget)
| Bench | Time |
|---|---|
| `vardiff_observe_share_hold` (stationary) | 5.2 ns |
| `vardiff_observe_share_retarget` (8-share burst) | 37.9 ns (~5 ns/share) |
| `difficulty_to_target_be` | 12.85 ns |
| `vardiff_decision_variant_match` | 432 ps |
Take-away: VarDiff is effectively free. Even at 100K shares/s, the 5.2 ns hold path costs ~0.5 ms of CPU time per second, well under 1 ms/s.
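The following is a minimal sketch of what "EMA-update + retarget" per share can look like. It assumes the interval between a miner's shares is tracked as an exponential moving average and compared with a target interval. `VarDiff`, `Decision`, the field names, and the defaults (α = 0.25, 30% tolerance, 8-share minimum) are hypothetical illustrations, not the `warppool-stratum-v1` implementation. Each call does a handful of floating-point operations and no allocation, which is consistent with per-share costs in the single-digit nanoseconds.

```rust
/// Result of observing one share (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Hold,
    Retarget(f64),
}

/// Hypothetical EMA-based vardiff state for one connection.
struct VarDiff {
    difficulty: f64,
    target_interval_s: f64, // desired average seconds between shares
    alpha: f64,             // EMA smoothing factor in (0, 1]
    tolerance: f64,         // retarget only if off by more than this fraction
    min_shares: u32,        // shares required before a retarget is allowed
    min_diff: f64,
    max_diff: f64,
    ema_interval_s: Option<f64>,
    shares_since_retarget: u32,
}

impl VarDiff {
    /// Illustrative defaults; a real pool would make these configurable.
    fn new(difficulty: f64, target_interval_s: f64) -> Self {
        VarDiff {
            difficulty,
            target_interval_s,
            alpha: 0.25,
            tolerance: 0.3,
            min_shares: 8,
            min_diff: 1.0,
            max_diff: 1e15,
            ema_interval_s: None,
            shares_since_retarget: 0,
        }
    }

    /// `interval_s` is the time since this miner's previous accepted share.
    fn observe_share(&mut self, interval_s: f64) -> Decision {
        // EMA update: constant-time arithmetic, no allocation.
        let ema = match self.ema_interval_s {
            None => interval_s,
            Some(prev) => self.alpha * interval_s + (1.0 - self.alpha) * prev,
        };
        self.ema_interval_s = Some(ema);
        self.shares_since_retarget += 1;

        if self.shares_since_retarget < self.min_shares {
            return Decision::Hold;
        }
        // Expected share interval is proportional to difficulty, so scaling
        // the difficulty by target/observed moves the interval to the target.
        let ratio = self.target_interval_s / ema;
        if (ratio - 1.0).abs() <= self.tolerance {
            return Decision::Hold;
        }
        let new_diff = (self.difficulty * ratio).clamp(self.min_diff, self.max_diff);
        self.difficulty = new_diff;
        self.ema_interval_s = None; // start a fresh average at the new difficulty
        self.shares_since_retarget = 0;
        Decision::Retarget(new_diff)
    }
}
```

A miner submitting every second against a 10 s target holds for seven shares and is retargeted on the eighth, to 10× its difficulty. After that it holds as long as shares arrive about every 10 s.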
Interpretation
What you can read from the numbers:
| Question | Hint |
|---|---|
| "Is my pool burning too much CPU?" | No. At 10 shares/s and 12 merkle branches: 10 × 7.59 µs ≈ 76 µs of validation time per second, i.e. ≈ 0.0076% of one CPU core |
| "How many workers can my pool serve at most?" | The Stratum connection cap (profile-dependent, 64-4096). Share validation is not the limit |
| "Is ASIC-boost / merkle-tree caching worth it?" | No, not in a solo pool. In a pool handling 10M shares/s, per-template merkle-branch caching could be worth a factor of 5-10 |
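The arithmetic behind the CPU question fits in a tiny helper: the CPU share equals shares per second × cost per share. `core_fraction` and the constants are illustrative names. The costs are the snapshot numbers above, so the result is only as portable as those numbers.

```rust
/// Fraction of one CPU core consumed by per-share work.
/// `cost_ns` is the per-share cost in nanoseconds (e.g. validate + vardiff).
fn core_fraction(shares_per_sec: f64, cost_ns: f64) -> f64 {
    shares_per_sec * cost_ns * 1e-9
}

/// Snapshot costs from the tables above (hardware-dependent).
const VALIDATE_12_NS: f64 = 7_590.0; // validate_full/12
const VARDIFF_HOLD_NS: f64 = 5.2; // vardiff_observe_share_hold
```

At 10 shares/s, validation alone uses 7.59 × 10⁻⁵ of a core (0.0076%). Adding the VarDiff hold path barely changes that figure.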
CI
`.github/workflows/benches.yml` runs only on:
- Manual dispatch (operator clicks "Run workflow" in the UI)
- Tag push (release snapshot)

It does not run on every PR. Criterion runs are expensive (~5 min build + ~5 min suite), and noise on GitHub-hosted runners makes microbenchmark comparisons unreliable.
Reports are uploaded as the artifact `criterion-reports-<sha>` with 30-day
retention. The operator can download the artifact and open the HTML report
locally.
Regression Workflow
When a bench suddenly becomes 50% slower:
- Run `cargo bench --bench <name>` locally to confirm the regression
- `git bisect` between the last known-good version and `HEAD`
- On dependency bumps: inspect the `Cargo.lock` diff (a bump often pulls in a new version of a transitive dependency)
Criterion automatically stores the last run as the baseline in `target/criterion/`.
A local bench therefore compares against *your* last run, not against CI.
For a CI-vs-local comparison, download the artifact and unpack it locally
under `target/criterion/`.
What is Deliberately Not Benched
| Path | Why not |
|---|---|
| Stratum V1 TCP I/O | Tokio async I/O is syscall-bound; Criterion results would be noise-dominated. `tokio-console` is more useful for inspection. |
| Bitcoin RPC | Network I/O and Bitcoin Core's own processing dominate. The Phase 16.3 RPC-latency histogram is the right place to observe it. |
| Translator V1↔V2 mapping | Per-job (every 30-60 s), not latency-critical. Benching it would be effort for little benefit. |
| Storage SQL | sqlx + WAL mode dominates. If needed, bench directly with the sqlite3 CLI. |
| Notifier sinks | HTTP/SMTP IO, not CPU-bound. End-to-end latency is readable from the /metrics histogram. |
See Also
- Observability — Runtime metrics (Prometheus) instead of synthetic benches
- Testing — Unit / integration / sim tests