External data sources guide¶
tradedesk.data_sources covers datasets that are neither live market streams
nor Dukascopy candle cache files.
Three free, no-auth macro feeds are supported:
- FRED (fred): US rates and VIX from the St. Louis Fed.
- ECB (ecb): EUR €STR, AAA government yield curve and Euribor from the ECB Data Portal.
- CFTC COT (cot): Commitments of Traders positioning.
The Macro Parquet lake section describes how all three feeds
are materialized to Parquet and loaded in backtests through a single access
pattern (load_macro_series / load_macro_frame). The sections after it
document the lower-level CFTC COT parsing API in detail.
Macro Parquet lake¶
Macro series are slow tabular time series, so each series is materialized to a single typed Parquet file under the existing market-data lake (the same root used by the Dukascopy OHLCV cache):
<lake>/macro/fred/<SERIES_ID>.parquet # e.g. DGS10.parquet
<lake>/macro/ecb/<LABEL>.parquet # e.g. EUR_YLD_2Y.parquet
<lake>/macro/cftc/<LABEL>.parquet # e.g. EURUSD.parquet, GOLD.parquet
<lake> defaults to $TRADEDESK_MARKETDATA when that variable is set, otherwise to /paperclip/tradedesk/marketdata.
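As an illustration of the layout above, here is a minimal sketch of how a series path could be resolved. The helper names resolve_lake_root and macro_parquet_path are hypothetical and not part of tradedesk.data_sources. The sketch also assumes that an explicit argument wins over the environment variable, which in turn wins over the built-in default.

```python
import os
from pathlib import Path

# Default root quoted in this guide; the env var is assumed to override it.
DEFAULT_LAKE = Path("/paperclip/tradedesk/marketdata")

# Sub-directory per source, as shown in the layout above.
SOURCE_DIRS = {"FRED": "fred", "ECB": "ecb", "CFTC": "cftc"}


def resolve_lake_root(lake=None):
    """Hypothetical helper: explicit argument, then env var, then default."""
    if lake is not None:
        return Path(lake)
    env = os.environ.get("TRADEDESK_MARKETDATA")
    return Path(env) if env else DEFAULT_LAKE


def macro_parquet_path(source, label, lake=None):
    """Hypothetical helper: build <lake>/macro/<source>/<label>.parquet."""
    source_dir = SOURCE_DIRS[source.upper()]
    return resolve_lake_root(lake) / "macro" / source_dir / f"{label}.parquet"


print(macro_parquet_path("FRED", "DGS10", lake="/data/marketdata"))
```

Keeping the source-to-directory mapping in one table means an unknown source fails fast with a KeyError instead of silently writing to an unexpected folder.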
Ingest (on demand or weekly)¶
# Everything (FRED + ECB + CFTC) since 2010 into the default lake
python -m tradedesk.data_sources.ingest
# One source, custom lake, custom start
python -m tradedesk.data_sources.ingest --source cftc --lake /data/marketdata --from 2018-01-01
Ingestion is idempotent — re-running refreshes each series in place and re-downloads the current-year CFTC zip so the latest Friday release appears. Failures are isolated at two levels so a weekly cron stays green when an upstream endpoint is briefly unavailable. A failure on a single series — including read timeouts and dropped connections — is logged and skipped (never fatal), and the prior parquet for that series is left in place. On top of that, an entirely dead source is isolated: if FRED is down, ECB and CFTC still materialize, and the run returns the surviving sources rather than aborting. Run it weekly (CFTC publishes Fridays 15:30 ET; a Saturday run captures the new week).
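The two isolation levels described above can be sketched as nested try/except blocks. This is only an illustration of the pattern, not the module's actual code: ingest_all, the shape of the sources mapping, and the fake fetchers are all assumptions made for the example.

```python
import logging

log = logging.getLogger("ingest_sketch")


def ingest_all(sources):
    """Run every source, isolating failures per series and per source.

    `sources` maps a source name to a zero-argument callable that returns
    {label: fetch_callable}. Both shapes are assumptions for this sketch.
    """
    results = {}
    for source, list_series in sources.items():
        try:
            series = list_series()  # raises if the whole source is down
        except Exception:
            log.exception("source %s unavailable, skipping", source)
            continue
        written = []
        for label, fetch in series.items():
            try:
                fetch()  # stands in for download + write of one series
                written.append(label)
            except Exception:
                # The previous file for this series is simply left untouched.
                log.exception("series %s/%s failed, skipping", source, label)
        results[source] = written
    return results


def timed_out():
    raise TimeoutError("read timed out")


def dead_source():
    raise ConnectionError("endpoint unreachable")


result = ingest_all({
    "fred": dead_source,
    "ecb": lambda: {"EUR_ESTR": lambda: None, "EUR_YLD_2Y": timed_out},
    "cftc": lambda: {"GOLD": lambda: None},
})
print(result)  # {'ecb': ['EUR_ESTR'], 'cftc': ['GOLD']}
```

The run returns only the sources that survived, so a caller such as a cron wrapper can decide for itself whether a partial refresh counts as success.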
Load (the backtest access pattern)¶
from tradedesk.data_sources import load_macro_series, load_macro_frame
# One series → DatetimeIndex 'date' + a 'value' column (FRED/ECB)
dgs10 = load_macro_series("FRED", "DGS10")
estr = load_macro_series("ECB", "EUR_ESTR")
# Several series of one source → wide DataFrame, one column per label
rates = load_macro_frame("FRED", ["DGS2", "DGS10", "VIXCLS"])
# CFTC → Tuesday-indexed positioning frame (release_date, open_interest,
# commercial/dealer/asset_mgr/leveraged long/short/net)
eur_cot = load_macro_series("CFTC", "EURUSD")
Series catalog¶
| Source | Labels | Frequency | Notes |
|---|---|---|---|
| FRED | DFF, DGS3MO, DGS2, DGS10, T10Y2Y, VIXCLS, VXVCLS | daily | Effective fed funds, Treasury yields, 10Y-2Y spread, VIX, and VIX3M (3-month) for VIX term-structure work |
| ECB | EUR_ESTR | daily | €STR, the EUR OIS reference (overnight) rate |
| ECB | EUR_YLD_3M/6M/1Y/2Y/5Y/10Y | daily | AAA government yield-curve spot rates |
| ECB | EUR_EURIBOR_1M, EUR_EURIBOR_3M | monthly | Euribor term money-market rates |
| CFTC | GOLD, SILVER, WTI, BRENT, NATGAS, SP500, TNOTE10, EURUSD, JPYUSD, GBPUSD | weekly | COT positioning; indexed on the report Tuesday |
On EUR OIS: the ECB does not publish term OIS swap fixings (1w/1m/3m) as a
free single series. EUR_ESTR is the overnight OIS reference and the short-end
yield-curve tenors give the market-implied term structure; together they are
the free, no-auth proxies for OIS-implied magnitude in a rate-surprise study.
No look-ahead with CFTC: the date index is the Tuesday as-of date, which
is not public until the Friday release_date. Backtests must gate entries
off release_date (or the first trading day after it), never the index.
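To make the gating rule concrete, the sketch below picks the most recent report that was already public on a given day. It uses plain tuples with invented numbers instead of the real positioning frame, and latest_public_row is a hypothetical helper, not part of the module.

```python
from bisect import bisect_right
from datetime import date

# Synthetic rows: (report Tuesday, Friday release date, net position).
# The numbers are invented for illustration only.
ROWS = [
    (date(2026, 1, 6), date(2026, 1, 9), 1200),
    (date(2026, 1, 13), date(2026, 1, 16), -300),
    (date(2026, 1, 20), date(2026, 1, 23), 850),
]


def latest_public_row(rows, as_of):
    """Return the newest row whose release date is on or before `as_of`.

    Assumes `rows` is sorted by release date. Returns None when nothing
    has been published yet.
    """
    release_dates = [release for _, release, _ in rows]
    i = bisect_right(release_dates, as_of)
    return rows[i - 1] if i else None


# Thursday 2026-01-15: the 13 Jan report exists but is not public yet,
# so the 6 Jan report is still the latest usable one.
print(latest_public_row(ROWS, date(2026, 1, 15)))
# Friday 2026-01-16: the 13 Jan report becomes usable.
print(latest_public_row(ROWS, date(2026, 1, 16)))
```

Because the report is published during the Friday session, a stricter study could swap bisect_right for bisect_left, which makes a report usable only from the day after its release.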
What the module provides (CFTC COT detail)¶
Import from tradedesk.data_sources:
from tradedesk.data_sources import (
CFTC_CONTRACTS,
CFTCReport,
cot_release_date,
download_cot_zip,
iter_cot_rows,
load_contract_history,
)
Public API:
- CFTC_CONTRACTS: built-in short-label to CFTC contract mapping
- CFTCReport: report family enum (DISAGGREGATED or TFF)
- download_cot_zip(...): download and cache raw annual CFTC zip archives
- iter_cot_rows(...): stream parsed COTRow objects from an archive set
- load_contract_history(...): load a sorted weekly history for one contract
- cot_release_date(...): map the CFTC Tuesday as-of date to the scheduled Friday release date
Contract map¶
The built-in contract map currently covers:
- GOLD
- SILVER
- WTI
- BRENT
- NATGAS
- SP500
- TNOTE10
- EURUSD (CME Euro FX)
- JPYUSD (CME Japanese Yen)
- GBPUSD (CME British Pound)
Commodity contracts use the CFTC disaggregated futures report. SP500,
TNOTE10 and the CME currency futures use the Traders in Financial Futures
(TFF) report. CME FX futures are quoted as foreign-currency/USD, so a net-long
position is bullish on the foreign (base) currency against the dollar.
Loading one contract history¶
from datetime import date
from pathlib import Path
from tradedesk.data_sources import CFTC_CONTRACTS, load_contract_history
rows = load_contract_history(
cache_dir=Path("/tmp/tradedesk-cache"),
contract=CFTC_CONTRACTS["GOLD"],
date_from=date(2020, 1, 1),
date_to=date(2020, 12, 31),
)
latest = rows[-1]
print(latest.report_date_tuesday, latest.release_date_friday, latest.commercial_net)
Each returned COTRow includes:
- report_date_tuesday: the CFTC as-of date
- release_date_friday: the scheduled publication date for that report week
- report: which report family produced the row
- commercial_long, commercial_short, commercial_net
- open_interest
- TFF-only buckets (0 for disaggregated rows): dealer_*, asset_mgr_*, leveraged_* long/short/net. For TFF rows the dealer bucket is mirrored into commercial_* for backward compatibility, so FX/index trend-gating studies should read the explicit asset_mgr_* / leveraged_* (speculative) positioning rather than commercial_*.
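The advice on TFF rows can be illustrated with a small helper that reads the speculative buckets and refuses to answer for disaggregated rows. RowStandIn is a stand-in carrying a subset of the field names listed above, not the real COTRow. The report field is a plain string here for brevity, and summing asset managers with leveraged funds is one possible definition of "speculative", chosen for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowStandIn:
    """Stand-in with a subset of the fields listed above (not the real COTRow)."""
    report: str  # "DISAGGREGATED" or "TFF"; the real field is a CFTCReport enum
    commercial_net: int
    asset_mgr_net: int = 0
    leveraged_net: int = 0


def speculative_net(row):
    """Hypothetical helper: speculative net position for trend gating.

    For TFF rows this adds asset managers and leveraged funds. Disaggregated
    rows carry zeros in those buckets, so the helper raises instead of
    returning a misleading 0.
    """
    if row.report != "TFF":
        raise ValueError("speculative buckets are only populated for TFF rows")
    return row.asset_mgr_net + row.leveraged_net


# Invented numbers: dealers (mirrored into commercial_net) are short while
# the speculative buckets are net long.
eur = RowStandIn("TFF", commercial_net=-50_000, asset_mgr_net=30_000, leveraged_net=-10_000)
print(speculative_net(eur))  # 20000
```

Raising on disaggregated rows keeps a commodity contract from being read as "flat" simply because its TFF buckets are zero-filled.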
Release-date semantics¶
The CFTC report week closes on Tuesday and is normally published on Friday.
cot_release_date(...) encodes that fixed three-calendar-day offset.
from datetime import date
from tradedesk.data_sources import cot_release_date
assert cot_release_date(date(2026, 1, 6)).isoformat() == "2026-01-09"
Strategies that trade from COT inputs should key entry timing off the release date rather than the Tuesday as-of date to avoid look-ahead bias.
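For readers who want to see the offset spelled out, here is a minimal re-implementation sketch. release_date_sketch and first_tradable_day are hypothetical names. The Tuesday check is an assumption of this sketch, and neither function accounts for holiday-delayed releases or market holidays; whether cot_release_date handles those is not stated in this guide.

```python
from datetime import date, timedelta


def release_date_sketch(report_tuesday):
    """Hypothetical re-implementation of the Tuesday to Friday offset."""
    if report_tuesday.weekday() != 1:  # Monday is 0, so Tuesday is 1
        raise ValueError(f"{report_tuesday} is not a Tuesday")
    return report_tuesday + timedelta(days=3)


def first_tradable_day(release_friday):
    """Next weekday strictly after the release date (no holiday calendar)."""
    day = release_friday + timedelta(days=1)
    while day.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
        day += timedelta(days=1)
    return day


friday = release_date_sketch(date(2026, 1, 6))
print(friday, first_tradable_day(friday))  # 2026-01-09 2026-01-12
```

Entries timed on first_tradable_day are the conservative choice, since the report only becomes public during the Friday session.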
Cache layout¶
Downloads are stored under cache_dir/cftc/. Existing zip files are reused on
subsequent calls unless force=True is passed to download_cot_zip(...).
This cache is separate from the Dukascopy cache used by
tradedesk.execution.backtest.
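The reuse-unless-forced behaviour can be sketched as follows. cached_fetch is a hypothetical helper and example_2020.zip is a placeholder file name; the real archive names and the signature of download_cot_zip(...) beyond force=True are not documented here.

```python
import tempfile
from pathlib import Path


def cached_fetch(cache_dir, filename, fetch, force=False):
    """Hypothetical helper: reuse cache_dir/cftc/<filename> unless force=True.

    `fetch` is any zero-argument callable returning the file's bytes.
    """
    target = Path(cache_dir) / "cftc" / filename
    if target.exists() and not force:
        return target  # cache hit: no download
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    partial.write_bytes(fetch())
    partial.replace(target)  # swap in the complete file in one step
    return target


calls = []


def fake_download():
    calls.append(1)
    return b"zip-bytes"


with tempfile.TemporaryDirectory() as tmp_dir:
    cached_fetch(tmp_dir, "example_2020.zip", fake_download)              # downloaded
    cached_fetch(tmp_dir, "example_2020.zip", fake_download)              # reused
    cached_fetch(tmp_dir, "example_2020.zip", fake_download, force=True)  # refreshed
print(len(calls))  # 2
```

Writing to a temporary name and renaming afterwards means an interrupted download never leaves a truncated archive where a valid one used to be.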