Findings envelope reference¶
miner-core emits every finding as a tagged Finding JSON envelope on stdout — one NDJSON line per envelope. This doc is the human-readable companion to ../schemas/findings-v1.schema.json — the JSON Schema is the authoritative source of truth (regenerated from the schemars derives on the Rust types under crates/miner-core/src/findings/); this doc explains what each field means and why it is shaped the way it is.
The envelope shape is locked. schema_version = 1 is permanent for v1; every change in the v1 line is additive (new optional fields under #[serde(default)]), never breaking. Reserved-but-null slots (dsr, fdr_q) exist precisely so v2 can populate them without bumping schema_version.
The seven Finding variants¶
Every line in a miner run is one of seven Finding::* arms (crates/miner-core/src/findings/mod.rs line 545), discriminated by the top-level "kind" field via #[serde(tag = "kind", rename_all = "snake_case")]:
RunStart— first line of every run; carriesrun_id,started_at_utc,miner_version,code_revision, and the resolvedrequestblob (the fully-defaulted invocation).Result— the headline payload. One envelope per(scan x instrument(s) x timeframe x window x param-point). Carries all locked envelope fields plus the per-scaneffectblock.ScanError— mid-run scan failure. Preflight passed butScan::runerrored, the kernel rejected inputs, or computation produced an unrecoverable result. The run continues; only the offending job is lost.GapAborted— strict gap policy fired and aborted before anyResultwas emitted on this slice; carries the full gap manifest so consumers can see what was missing.DryRun— emitted only under--dry-run. Carries the planneddata_sliceplusestimated_findings_countand (forminer sweep --dry-run) theplanned_job_count.SweepSummary— last data line of aminer sweepinvocation, emitted immediately beforeRunEnd. CarriesSweepTotalsplus per-familyFdrFamilySummaryentries with Benjamini-Hochberg-adjusted q-values.RunEnd— final line of every run. Carriesended_at_utc,wall_clock_ms, and the aggregateRunSummary { results_emitted, scan_errors, gap_aborted, per_scan }.
Three of the seven (Result, ScanError, GapAborted) carry the locked envelope fields. The four framing-like variants (RunStart, RunEnd, DryRun, SweepSummary) intentionally do NOT carry them — framing records identify the run, not a particular scan-on-instrument result.
Stream order¶
The order of finding kinds within a single run is contractual:
- Exactly one
RunStartopens the stream. - Zero or more
Result/ScanError/GapAbortedenvelopes interleave (inrayon-deterministic order for sweeps — seesweep::executor::run_sweep). - For
miner sweeponly, exactly oneSweepSummaryfollows the last result-bearing envelope. - Exactly one
RunEndcloses the stream.
DryRun envelopes are emitted between RunStart and RunEnd and never co-occur with result-bearing envelopes — a dry-run never executes a scan kernel. See crates/miner-core/src/findings/mod.rs test dry_run_does_not_increment_results_emitted for the type-level invariant pin: a DryRun envelope MUST NOT touch RunSummary.results_emitted.
Consumers MUST NOT assume the absence of SweepSummary for a miner scan invocation indicates a failure — single-shot scans never emit it.
Common envelope fields¶
The three result-bearing variants (Result, ScanError, GapAborted) carry these locked fields inlined into each payload struct (NOT via #[serde(flatten)] — see the module doc-comment in findings/mod.rs for the anti-pattern rationale):
schema_version(u32;1in v1; bumps on a non-additive break).scan_id@version— serialised key (the Rust field isscan_id_at_version, renamed via#[serde(rename = "scan_id@version")]). String of the form"<scan_id>@<u32 version>", e.g."stats.autocorr.ljung_box@1".param_hash— blake3 hex over the canonical sorted-key resolved-param JSON (post-defaults). Identical params produce identical hashes regardless of TOML / JSON key order.code_revision—git describe-style string injected bybuild.rsat compile time. Format:"<sha>"or"dirty-<sha>"when the working tree had uncommitted changes at build.data_slice— the input range the scan actually consumed (see next section).dsr— reserved for v2 Deflated Sharpe Ratio. Serialises as JSONnullin v1 (NOT omitted).fdr_q— reserved for the BH-FDR adjusted q-value populated by the sweep summary aggregator. Serialises as JSONnulloutside a sweep.
Additionally every Result envelope carries:
run_id— the parentRunStart's ULID, echoed verbatim so all findings in one run share an identifier.produced_at_utc— RFC 3339 UTC timestamp.params— the resolved parameter map (post-defaults).serde_json::Valueshape — typically{"key": value, ...}.effect— the test statistic + p-value + optional CI + optional effect-size + per-scan extras (see "Effect block" below).raw— optional inputs the scan consumed (base64-encoded little-endian raw bytes; see "Raw arrays" below).repro— optional reproducibility envelope (populated when bootstrap / null resampling produced the finding's p-value or CI; see "Reproducibility envelope" below).
A representative Result envelope (abridged):
{
"kind": "result",
"schema_version": 1,
"scan_id@version": "stats.autocorr.ljung_box@1",
"param_hash": "blake3-abc123...",
"code_revision": "v0.1.0-12-gabc1234",
"data_slice": { "range": { "start_utc": "2024-06-12T00:00:00Z", "end_utc": "2024-06-13T00:00:00Z" }, "gap_manifest_ref": null, "gap_manifest": null, "sources": [{ "source_id": "dukascopy", "symbol": "EURUSD", "side": "bid", "timeframe": "15m" }] },
"dsr": null,
"fdr_q": null,
"run_id": "01HWQ5MMNJ9CWGR7XGMR4PHZJX",
"produced_at_utc": "2024-06-13T00:00:01Z",
"params": { "lags": 5 },
"effect": { "metric": "ljung_box_q", "value": 12.34, "p_value": 0.0123, "n": 96, "ci95": null, "effect_size": null, "extra": { "...": "..." } },
"raw": { "series": { "returns": { "data": "...base64...", "shape": [95], "dtype": "f64" }, "timestamps_ms": { "data": "...", "shape": [95], "dtype": "f64" } } },
"repro": null
}
RunStart / RunEnd framing¶
The first and last lines of every run are framing records. They identify the run; they are NOT result-bearing.
RunStart (crates/miner-core/src/findings/mod.rs lines 280-289) carries:
run_id— Crockford-base32 ULID (26 chars, e.g."01HWQ5MMNJ9CWGR7XGMR4PHZJX"). Copied verbatim onto every result-bearing envelope produced by this run.started_at_utc— RFC 3339 UTC timestamp.miner_version— semver string from the crate'sCargo.toml.code_revision— samegit describe-style string injected bybuild.rsas on result-bearing envelopes.request— the fully-resolved invocation (scan_id@version, instrument(s), side, timeframe, window, params with defaults applied,gap_policy,dry_runflag).serde_json::Valueshape; the structure mirrors the engine'sScanRequest.
RunEnd (lines 294-300) carries:
run_id+ended_at_utc.wall_clock_ms— total wall-clock duration of the run.summary: RunSummary { results_emitted, scan_errors, gap_aborted, per_scan }— aggregate counters across the run.per_scan: BTreeMap<String, PerScanCounts>keyed byscan_id@version.PerScanCounts { results, errors, gap_aborted }isCopy-able for use in test fixtures.
Exhaustive destructure of RunSummary is asserted by the run_summary_has_no_dry_run_emitted_field test (Warning 9 pin): adding a new field to RunSummary would break the destructure at compile time, signalling envelope-shape drift before tests even run.
data_slice¶
The data_slice block (crates/miner-core/src/findings/mod.rs lines 80-102) carries everything a consumer needs to know about the input range the scan actually consumed:
range: TimeRange { start_utc, end_utc }— the half-open continuous range used by the scan (post gap-partitioning undercontinuous_onlypolicy). UTC RFC 3339 strings.gap_manifest_ref: Option<String>— reserved for the content-addressed deduplication path;nullin v1.gap_manifest: Option<GapManifest>— the full inline Phase 2GapManifestunder--gap-policy=continuous_only.nullon the strict success path and in pre-Phase-3 callers. Serialises as JSONnullwhen absent — NEVER omitted. Same convention asdsr/fdr_q.sources: Vec<Source>— leg-labelled source vector (Phase 4 / D4-03). Length matches the scan'sarity().expected_len(): 1 for ANOM / SEAS, 2 for CROSS.
Each Source is { source_id, symbol, side, timeframe } — e.g. { "source_id": "dukascopy", "symbol": "EURUSD", "side": "bid", "timeframe": "15m" }.
Pair-arity / CROSS scans populate exactly two Source entries in ScanRequest.instruments order. The inline gap_manifest for a two-leg scan is the INTERSECTION of both legs' gaps (D4-04) — the continuous sub-range emitted is the maximal gap-free intersection of the two leg coverages.
Effect block¶
The effect block (crates/miner-core/src/findings/mod.rs lines 174-203) is the headline scan output:
metric: String— short identifier of the test statistic. Examples:"ljung_box_q","adf_statistic","jarque_bera_statistic","vr_minus_one","pearson_corr_last","lead_lag_argmax_lag","hour_of_day_max_abs_t_stat". See scan_catalogue.md for the per-scaneffect.metricliteral.value: f64— the canonical effect statistic.p_value: Option<f64>— present on most scans; some emit only a statistic (e.g. variance-ratio reports per-kp-values insideextraand leavesp_value = null).n: Option<u64>— sample size used.ci95: Option<[f64; 2]>—[lower, upper]confidence interval at 95%. Populated only when the caller opted into bootstrap resampling (HYG-03).effect_size: Option<EffectSize>— standardised effect-size statistic alongsidevalue(HYG-01 / D5-03). EachEffectSize { kind, value }carries an open-stringkind(canonical values:"cohens_d","hedges_g","cliffs_delta","vr_minus_one", plus scan-specific kinds).nullfor scans that do not emit one.extra: BTreeMap<String, RawArray>— per-scan extras carried as base64-with-shape arrays (same shape asraw.seriesentries). Examples:{"acf", "lags", "p_values", "q_stats"}for Ljung-Box;{"argmax_lag", "argmax_value", "ccf_values", "lags", "max_lag"}for lead-lag CCF;{"buckets", "counts", "iqrs", "means", "stds", "t_stats"}for the seasonality bucketers. See scan_catalogue.md for the per-scan key list.
BTreeMap ordering means effect.extra keys are alphabetic and stable across re-runs (OUT-03).
Raw arrays¶
Optional inputs the scan consumed live in raw.series (crates/miner-core/src/findings/mod.rs lines 129-161). The block follows the base64-with-shape pattern (D-02):
Raw { series: BTreeMap<String, RawArray> }—BTreeMap(NEVERHashMap) for alphabetic ordering.RawArray { data: Base64Bytes, shape: Vec<u64>, dtype: Dtype }:data— standard base64 (RFC 4648) over the raw little-endian bytes.shape— JSON array of unsigned ints, e.g.[95]or[1024, 4].dtype— v1 emits exactly one value:"f64"⟷np.dtype("<f8")(8-byte IEEE-754 double). TheDtypeenum incrates/miner-core/src/findings/base64_bytes.rsis intentionally a single-variant enum in v1; the JSON Schema treatsdtypeas an open-string field so additive variants ("f32","i64", future"json") can land without a schema break.
Every Raw block MUST carry a timestamps_ms key (D-03; enforced at construction by Raw::new). The timestamps_ms array is f64 little-endian milliseconds-since-Unix-epoch (v1 carries every array including timestamps as f64; consumers cast back to integer ms in user code) and is parallel to the other arrays in the block.
Canonical Python decode one-liner:
import base64, numpy as np
arr = np.frombuffer(base64.b64decode(raw_array["data"]), dtype="<f8").reshape(raw_array["shape"])
A worked end-to-end example lives at docs/examples/decode_finding.py (Plan 06-03).
Decode example for a typical Ljung-Box Result envelope:
import base64, json, numpy as np
line = json.loads(stdin_line)
assert line["kind"] == "result"
arr = line["raw"]["series"]["returns"]
returns = np.frombuffer(base64.b64decode(arr["data"]), dtype="<f8").reshape(arr["shape"])
ts_arr = line["raw"]["series"]["timestamps_ms"]
# v1 packs timestamps_ms as f64 (NOT i64); cast to int64 ms in user code.
timestamps_ms = np.frombuffer(base64.b64decode(ts_arr["data"]), dtype="<f8").reshape(ts_arr["shape"]).astype(np.int64)
assert returns.shape == timestamps_ms.shape
#returns[i] is the log-return at timestamps_ms[i]; reconstruct any per-scan invariant from this pair.
The dtype string "<f8" is NumPy's standard little-endian shorthand for the v1 sole Dtype::F64 variant. The shape array is the multi-dimensional shape; flat 1-D arrays appear as [N].
Reproducibility envelope¶
When a scan ran under bootstrap or null resampling (HYG-03 / HYG-04), ResultFinding.repro is populated with a ReproEnvelope (crates/miner-core/src/findings/mod.rs lines 254-269):
master_seed: u64— the seed the user passed via--seedor the sweep manifest[sweep].seed. Echoed verbatim on every finding from one run so consumers can correlate.job_seed: u64— per-job seed derived deterministically from the master seed plus the canonical job key (scan_id@version, instruments, timeframe, window,param_hash) via blake3. Equal seeds across re-runs produce byte-identical findings.bootstrap: Option<BootstrapSpec>—BootstrapSpec { method, n }when bootstrap CIs were computed. v1methodvalues:"stationary"(Politis-Romano),"block".nis the number of resamples drawn.null: Option<NullSpec>—NullSpec { method, n }when null resampling produced the p-value. v1methodvalues:"phase_scramble","circular_shift".nis the number of null draws.
The population rule (enforced by the Plan 05-03 engine integration): repro = Some(_) iff bootstrap or null was run for the finding. Closed-form-only findings carry repro = null. The envelope serialises as JSON null when absent (NOT omitted).
The RNG state in ReproEnvelope is NOT cryptographic. Xoshiro256PlusPlus is the seeded generator; predicting future outputs from observed outputs is trivial. The threat model has no secrecy requirement for resampling RNG state (T-05-01-I1).
Error envelopes¶
Two finding kinds carry typed error vocabularies.
ScanError carries an open-string error_code plus a free-form message and a request_context JSON blob. The error_code is constructed from the typed ScanErrorCode enum (crates/miner-core/src/error/codes.rs); v1 variants:
coverage_gap— coverage check failed mid-run on this slice.compute_error— kernel rejected the inputs (NaN propagation, ill-conditioned regression, insufficient post-window samples, etc.).cache_corruption— the derived-bar cache produced an unreadable frame.internal_panic_caught— defensive catch-unwind aroundScan::run; the kernel panicked but the run continues.
Preflight failures are NOT mid-stream ScanError envelopes. They are emitted as a single WireError on stderr (with the run's RunStart already on stdout if framing started) and the process exits with code 1. The PreflightCode vocabulary (crates/miner-core/src/error/codes.rs) covers:
invalid_parameter,unknown_scan,unknown_instrument,wrong_instrument_arity,missing_required_config,invalid_config,sweep_too_large,hygiene_not_supported,internal_error.
GapAborted is a separate kind, NOT a ScanError. It is emitted exactly once per scan run under --gap-policy=strict when the gap manifest disallows the requested window (D-08). The gap_manifest field carries the full Phase 2 manifest (tuples of (start, end, reason) for missing daily files, corrupt files, intra-day holes against the trading calendar).
Stdout, stderr, and exit codes¶
The envelope on stdout is one half of miner's contract; stderr is the other.
- Stdout carries NDJSON
Findingenvelopes only — one per line, no leading whitespace, no padding, terminated by\n. The single sanctioned writer isFindingSink(production impls:StdoutSinkandFileSink). Scans never callprintln!or any other stdout writer;clippy::disallowed_macrosrejects this at build time outside the sink module (D-15, D-19). - Stderr carries structured
tracingevents (instrument / scan / job spans) plus the singleWireErrorenvelope emitted on preflight rejection. Consumers parse stderr as line-delimited JSON when invoked via subprocess and treat any non-JSON line as a free-form log message. - Exit codes follow the four-tier mapping (D3-24; canonical source
compute_exit_codeincrates/miner-cli/src/main.rs): 0— clean run;RunEndclosed the stream.1— preflight rejection (a singleWireErroron stderr, noRunStarton stdout). Examples:unknown_scan,invalid_parameter,sweep_too_large.2— at least one mid-streamFinding::ScanErrorwas emitted (the run continued for other jobs but at least one failed). Exit 2 is also clap's default code when argv parsing fails — in that case stdout is empty and stderr carries clap's usage banner.130— SIGINT (Ctrl-C). Already-streamed envelopes persist; the run is interrupted between envelopes.
The locked envelope discipline is what makes subprocess invocation work: a consumer reading stdout line-by-line as JSON can correctly classify every line by "kind" without out-of-band coordination.
SweepSummary fields¶
SweepSummary is emitted exactly once at the end of a miner sweep invocation, immediately before RunEnd. Single-shot miner scan runs never emit it.
SweepSummaryFinding (crates/miner-core/src/findings/mod.rs lines 463-476) carries:
run_id+produced_at_utc— matches the surroundingRunStart/RunEndfor cross-finding correlation.fdr_by_family: BTreeMap<String, FdrFamilySummary>— one entry per family. Keyed byscan_id@version(default[fdr].family = "scan_id") or byscan_familyper configuration.BTreeMapordering guarantees alphabetic key emission.totals: SweepTotals { jobs_run, results_emitted, scan_errors, gap_aborted }— run-level aggregates.
Each FdrFamilySummary { method, alpha, per_finding } carries:
method— open-string FDR method;"benjamini_hochberg"in v1.alpha— FDR control level (typical: 0.05 or 0.10).per_finding: Vec<FindingFdrEntry { finding_index, raw_p, q_value }>— one row per finding in the family in stable index order, wherefinding_indexis the zero-indexed position of the finding within the family in the streaming JSONL output.Vec(NEVERHashMap) preserves the index alignment.
Consumers that want q-values on individual Result envelopes can post-process: read the streaming JSONL, group Result envelopes by their scan_id@version, then at end-of-sweep look up each finding's q_value in the matching FdrFamilySummary.per_finding by index.
Reserved-but-null fields¶
dsr and fdr_q on every Result / ScanError / GapAborted envelope serialise as JSON null in v1, NOT as absent fields (crates/miner-core/src/findings/mod.rs line 338-342). Why they exist:
dsr— Deflated Sharpe Ratio. Reserved for v2; the slot is locked into the v1 schema so v2 can populate it additively without bumpingschema_version.fdr_q— BH-FDR adjusted q-value for a single finding. Populated only by the SweepSummary aggregator on itsper_findingrows; on the individualResultenvelope it remainsnull(consumers join by index — see "SweepSummary fields" above). Reserved at the per-finding level so v2 may inline q-values directly onResultwithout a schema bump.
This is the same null-not-omitted convention used by data_slice.gap_manifest and effect.effect_size. DO NOT treat absent and null as equivalent — null is the explicit signal that v1 has no value to report.
Wire-form summary table¶
| Field | Variants carrying it | v1 value source |
|---|---|---|
schema_version |
Result, ScanError, GapAborted | Constant 1 |
scan_id@version |
Result, ScanError, GapAborted | Scan::id() + Scan::version() |
param_hash |
Result, ScanError, GapAborted | blake3 over canonical resolved params |
code_revision |
Result, ScanError, GapAborted, RunStart | miner_core::CODE_REVISION (build.rs) |
dsr |
Result, ScanError, GapAborted | Always null in v1 (reserved for v2) |
fdr_q |
Result, ScanError, GapAborted | Always null on per-finding envelopes in v1 |
repro |
Result | Some(_) iff bootstrap or null was run |
effect.ci95 |
Result | Some([lo, hi]) iff bootstrap was run |
effect.effect_size |
Result | Some({kind, value}) iff D5-03 emitted one |
See Also¶
- scan_catalogue.md — the 23 v1 scan_ids + per-scan
effect.metric+effect.extrakeys. - sweep_manifest.md — TOML sweep grammar (the source of
SweepSummary). - architecture.md — system map; the locked envelope discipline in context.
- ../schemas/findings-v1.schema.json — authoritative JSON Schema (regenerated from the Rust types via
xtask).
License¶
Licensed under the Apache License, Version 2.0. See: https://www.apache.org/licenses/LICENSE-2.0
Copyright 2026 Radius Red Ltd. | Contact