Architecture
Overview
- tradedesk-miner is a high-performance, agent-operable data-mining engine for historical financial OHLCV data.
- It scans cached Dukascopy bid/ask CSVs and surfaces statistical candidates (anomalies, cross-instrument relationships, seasonality effects) for downstream consumers — primarily the RadiusRed Quant agent.
- The codebase is organised into seven Cargo crates with a strict one-way dependency direction (six runtime crates plus a dev-only `xtask` workspace member):
  - `miner-core`: the sync + rayon library.
    - Owns the locked `Finding` envelope, the scan registry, the engine facade (`engine::run_one`), the sweep runner (`sweep::run_sweep`), the derived-bar cache, and the 23 v1 scans (ANOM / CROSS / SEAS).
    - Pure sync; no `tokio`, no `async fn`.
  - `miner-reader-dukascopy`: the reference implementation of the `Reader` trait against the existing tradedesk-dukascopy zstd-CSV cache layout.
  - `miner-cli`: the thin clap-derive wrapper exposing `miner scan` / `miner sweep` / `miner scans` / `miner emit-fixture`.
  - `miner-mcp` and `miner-http`: placeholder binaries.
    - MCP and HTTP server implementations are deferred to v2; see `future_mcp_http.md`.
    - The placeholder shells exist so the workspace graph (FOUND-01) is stable and v2 has anchor points.
  - `miner-bench`: the bench harness.
  - `xtask`: dev-only workspace member hosting `cargo run -p xtask -- gen-schema` and similar developer tooling; not part of the runtime artefact set.
- Dependency direction: `miner-cli | miner-mcp | miner-http -> miner-reader-dukascopy -> miner-core`.
- CI gate 3 (`cargo tree -p miner-core --edges normal,build`) enforces this: `miner-core` must show zero `tokio` / `async` transitive dependencies.
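As a rough illustration of what a check like CI gate 3 has to do, the sketch below scans captured `cargo tree` output for crate names that are, or start with, a forbidden prefix. The helper name `forbidden_deps` and the matching rule are assumptions made for this example; the actual gate may be a shell one-liner or something else entirely.

```rust
/// Returns the lines of captured `cargo tree` output whose crate name is a
/// forbidden name or starts with `<forbidden>-` (e.g. `tokio-util`).
/// Hypothetical helper for illustration; not the project's real CI script.
pub fn forbidden_deps<'a>(tree_output: &'a str, forbidden: &[&str]) -> Vec<&'a str> {
    tree_output
        .lines()
        .filter(|line| {
            // Strip the tree-drawing prefix, then take the first token (the crate name).
            let name = line
                .trim_start_matches(|c: char| !c.is_ascii_alphanumeric())
                .split_whitespace()
                .next()
                .unwrap_or("");
            forbidden
                .iter()
                .any(|f| name == *f || name.starts_with(&format!("{f}-")))
        })
        .collect()
}
```

A gate built on this would fail the build whenever the returned vector is non-empty for the prefixes `tokio` and `async`.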
Data Flow (high level)
- `Reader::read_day` ingests zstd-CSVs from the tradedesk-dukascopy cache.
  - Path layout: `<root>/<SYMBOL>/<YYYY>/<MM 00-indexed>/<DD>_<bid|ask>.csv.zst`.
  - The 00-indexed month quirk is encapsulated inside the Dukascopy reader and boundary-tested.
- The aggregator deterministically materialises higher-timeframe UTC-aligned bars (15m / 1h / 1d).
  - Bars are served from the on-disk derived-bar cache as Arrow IPC files, keyed by `(source_id, symbol, side, timeframe)`.
  - Two-axis invalidation: an `aggregator_version` / `arrow_schema_version` mismatch triggers a full rebuild; a per-day blake3 fingerprint mismatch triggers a day-splice.
- `GapDetector` produces a structured `GapManifest` of `(start, end, reason)` tuples before any scan runs.
- `engine::run_one` is the single facade entry.
  - It runs preflight -> framing -> dry-run -> gap-policy dispatch -> `Scan::run` -> `RunSummary`.
  - Findings are emitted through the `FindingSink` trait.
- Scans are sync + rayon-parallel; multi-job sweeps fan out via `sweep::run_sweep`.
  - `rayon::par_iter` over `ResolvedJob`s with a deterministic-order buffered drain.
  - End-of-sweep BH-FDR aggregation emits a `SweepSummary` finding closing the run.
- `FindingSink` is the single sanctioned writer.
  - Production impls: `StdoutSink` (CLI) and `FileSink` (tests).
  - Stdout = findings JSONL; stderr = `tracing` structured logs.
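The 00-indexed month in the cache path layout is easy to get wrong, so a minimal sketch of the mapping may help. The function `day_path`, the `Side` enum, and the two-digit zero padding of month and day are assumptions for illustration; the real reader's API and padding rules may differ.

```rust
use std::path::{Path, PathBuf};

/// Quote side of a day file (illustrative type, not the crate's own).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Side {
    Bid,
    Ask,
}

/// Builds `<root>/<SYMBOL>/<YYYY>/<MM 00-indexed>/<DD>_<bid|ask>.csv.zst`.
/// `month` is a calendar month in 1..=12; the directory name is `month - 1`.
/// Returns `None` when month or day is out of range. Day-of-month is only
/// checked against 1..=31, not against the actual length of the month.
pub fn day_path(
    root: &Path,
    symbol: &str,
    year: i32,
    month: u32,
    day: u32,
    side: Side,
) -> Option<PathBuf> {
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    let side = match side {
        Side::Bid => "bid",
        Side::Ask => "ask",
    };
    Some(
        root.join(symbol)
            .join(format!("{year:04}"))
            // Assumed padding: two digits, so January becomes "00".
            .join(format!("{:02}", month - 1))
            .join(format!("{day:02}_{side}.csv.zst")),
    )
}
```

Under these assumptions, 5 January 2024 on the bid side resolves to `<root>/EURUSD/2024/00/05_bid.csv.zst`, and December lands in directory `11`. Keeping the subtraction in one function is the kind of encapsulation the reader is described as providing.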
Sync core + async edges
- `miner-core` is pure sync + rayon. No `tokio`, no `async fn`, no `.await`. This is FOUND-04 and is CI-enforced.
- Async lives only at the wrapper edges.
  - The MCP and HTTP wrappers (designed but not implemented in v1; see `future_mcp_http.md`) will bridge to `miner-core` via `tokio::task::spawn_blocking`.
  - The scan engine never blocks an async runtime worker.
- Stdout is reserved for findings JSONL. Stderr is reserved for structured logs.
  - `clippy::disallowed_macros` rejects `println!` / `eprintln!` outside the single findings sink and the logging adapter (CI gate 2).
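To make the "single findings sink" idea concrete, here is a minimal synchronous sketch of a sink that writes one JSONL record per call to any `std::io::Write` target. The trait shape, the `emit` signature, and the `WriterSink` type are hypothetical; the real `FindingSink` trait may take a typed `Finding` rather than a pre-serialised string.

```rust
use std::io::{self, Write};

/// Hypothetical shape of a sink trait; the real `FindingSink` may differ.
pub trait FindingSink {
    /// Writes one already-serialised finding as a single JSONL record.
    fn emit(&mut self, finding_json: &str) -> io::Result<()>;
}

/// Sink over any writer: `io::stdout()` in a CLI, a `Vec<u8>` or file in tests.
pub struct WriterSink<W: Write> {
    out: W,
}

impl<W: Write> WriterSink<W> {
    pub fn new(out: W) -> Self {
        Self { out }
    }

    /// Hands back the underlying writer, e.g. to inspect captured output.
    pub fn into_inner(self) -> W {
        self.out
    }
}

impl<W: Write> FindingSink for WriterSink<W> {
    fn emit(&mut self, finding_json: &str) -> io::Result<()> {
        // JSONL means exactly one record per line, so reject embedded newlines.
        if finding_json.contains('\n') {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "finding must be a single line",
            ));
        }
        self.out.write_all(finding_json.as_bytes())?;
        self.out.write_all(b"\n")
    }
}
```

Routing every write through one type like this is what lets a lint ban `println!` everywhere else: the sink is the only place that touches stdout, and tests can swap in an in-memory buffer.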
Key design decisions
- Locked `Finding` envelope.
  - Seven variants: `RunStart` / `Result` / `ScanError` / `GapAborted` / `RunEnd` / `DryRun` / `SweepSummary`.
  - Every variant carries `schema_version`, `scan@version`, `param_hash`, `code_revision`, `data_slice`, and reserved DSR + FDR-q slots.
  - Schema-additive discipline only; ground truth is `schemas/findings-v1.schema.json`, regenerated from the schemars derive on the Rust types.
- Gap-policy enforcement. No silent scans over gapped data.
  - Callers pick `strict` (abort + emit the gap manifest as a structured error, zero findings) or `continuous_only` (partition into maximal gap-free sub-ranges, one finding per sub-range, gap manifest attached).
- Reproducibility envelope.
  - `master_seed` propagates through every `ResolvedJob`.
  - `derive_job_seed` (blake3) makes each scan's RNG state deterministic and byte-identical across re-runs.
- Streaming and stateless.
  - miner emits findings as NDJSON and exits. There is no persistent results store; callers own persistence.
  - The only writable state miner owns is its derived-bar cache.
- Agent-operability.
  - Every miner capability is reachable from the CLI today and from MCP + HTTP in v2 (see `future_mcp_http.md`).
  - The CLI is the load-bearing v1 surface; MCP + HTTP wrappers add transport ergonomics without changing the engine.
- Open-source posture.
  - No hardcoded paths. Cache root + derived-bar-cache root + output destination all configurable via CLI flag > env var > config file precedence.
  - Apache-2.0 licensed.
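The `continuous_only` gap policy described above hinges on splitting a requested window into maximal gap-free sub-ranges. The sketch below shows one way to do that over half-open integer ranges. The `Range` type, the `gap_free_subranges` name, and the use of plain `i64` instants are assumptions for illustration; the real `GapManifest` also carries a reason per gap, which is omitted here.

```rust
/// Half-open range `[start, end)` over an abstract time axis (e.g. epoch seconds).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Range {
    pub start: i64,
    pub end: i64,
}

/// Splits `window` into the maximal sub-ranges not covered by any gap.
/// Gaps may be unsorted, may overlap, and may extend past the window.
pub fn gap_free_subranges(window: Range, gaps: &[Range]) -> Vec<Range> {
    // Drop empty or inverted gaps, then process the rest in start order.
    let mut sorted: Vec<Range> = gaps.iter().copied().filter(|g| g.start < g.end).collect();
    sorted.sort_by_key(|g| g.start);

    let mut out = Vec::new();
    // First instant in the window not yet known to lie inside a gap.
    let mut cursor = window.start;
    for g in sorted {
        if g.end <= cursor {
            continue; // gap lies entirely behind the cursor
        }
        if g.start >= window.end {
            break; // this gap and all later ones start after the window
        }
        if g.start > cursor {
            out.push(Range { start: cursor, end: g.start });
        }
        cursor = g.end;
        if cursor >= window.end {
            break;
        }
    }
    if cursor < window.end {
        out.push(Range { start: cursor, end: window.end });
    }
    out
}
```

Under this sketch, a `continuous_only` run would execute the scan once per returned sub-range, while `strict` would abort as soon as the gap list intersecting the window is non-empty.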
License
Licensed under the Apache License, Version 2.0. See: https://www.apache.org/licenses/LICENSE-2.0
Copyright 2026 Radius Red Ltd.