| | symbol | name | channel | source | asset_type | frequency | needs_key |
|---|---|---|---|---|---|---|---|
| 0 | SPY | SPDR S&P 500 ETF Trust | Broad-market baseline | yfinance | equity-ETF | business-day | False |
| 1 | XLE | Energy Select Sector SPDR Fund | Energy | yfinance | equity-ETF | business-day | False |
| 2 | ITA | iShares U.S. Aerospace & Defense ETF | Defense procurement | yfinance | equity-ETF | business-day | False |
| 3 | DBA | Invesco DB Agriculture Fund | Agricultural commodities | yfinance | equity-ETF | business-day | False |
| 4 | BDRY | Breakwave Dry Bulk Shipping ETF | Shipping & insurance | yfinance | equity-ETF | business-day | False |
| 5 | GLD | SPDR Gold Shares | Safe havens & FX | yfinance | equity-ETF | business-day | False |
| 6 | ^VIX | CBOE Volatility Index | Safe havens & FX | yfinance | index | business-day | False |
| 7 | UUP | Invesco DB US Dollar Index Bullish Fund | Safe havens & FX | yfinance | equity-ETF | business-day | False |
| 8 | BTC-USD | Bitcoin (USD) | Crypto (regime-dependent) | yfinance | crypto | daily | False |
| 9 | ETH-USD | Ethereum (USD) | Crypto (regime-dependent) | yfinance | crypto | daily | False |
| 10 | LMT | Lockheed Martin Corp. | Defense procurement | yfinance | equity | business-day | False |
| 11 | DGS10 | 10-Year Treasury Constant Maturity Rate | Safe havens & FX | fred | rate | business-day | True |
| 12 | DGS2 | 2-Year Treasury Constant Maturity Rate | Safe havens & FX | fred | rate | business-day | True |
| 13 | T10Y3M | 10Y minus 3M Treasury Spread | Macro context | fred | rate-spread | business-day | True |
| 14 | DCOILWTICO | WTI Crude Oil Spot Price | Energy | fred | commodity-price | business-day | True |
| 15 | DHHNGSP | Henry Hub Natural Gas Spot Price | Energy | fred | commodity-price | business-day | True |
| 16 | VIXCLS | CBOE Volatility Index (FRED) | Safe havens & FX | fred | index | business-day | True |
| 17 | CPILFESL | Core CPI (All Urban, less food & energy) | Macro context | fred | macro-index | monthly | True |
| 18 | UNRATE | Unemployment Rate | Macro context | fred | macro-rate | monthly | True |
| 19 | WPU01210101 | PPI by Commodity: Farm Products: Wheat | Agricultural commodities | fred | macro-index | monthly | True |
| 20 | BAMLH0A0HYM2 | ICE BofA US High Yield OAS | Credit / risk regime | fred | credit-spread | business-day | True |
| 21 | BAMLC0A0CM | ICE BofA US Corporate (IG) OAS | Credit / risk regime | fred | credit-spread | business-day | True |
| 22 | BAA10Y | Moody's Baa Corporate minus 10Y Treasury (cred... | Credit / risk regime | fred | credit-spread | business-day | True |
| 23 | AAA10Y | Moody's Aaa Corporate minus 10Y Treasury (cred... | Credit / risk regime | fred | credit-spread | business-day | True |
| 24 | NFCI | Chicago Fed National Financial Conditions Index | Financial conditions | fred | index | weekly | True |
| 25 | ANFCI | Chicago Fed Adjusted NFCI | Financial conditions | fred | index | weekly | True |
| 26 | T10YIE | 10-Year Breakeven Inflation Rate | Inflation expectations | fred | rate | business-day | True |
| 27 | DFII10 | 10-Year TIPS Real Yield | Real rates | fred | rate | business-day | True |
| 28 | DFF | Effective Federal Funds Rate | Monetary policy | fred | rate | business-day | True |
| 29 | M2SL | M2 Money Supply (seasonally adjusted) | Liquidity | fred | macro-level | monthly | True |
| 30 | INDPRO | Industrial Production Index | Growth (coincident) | fred | macro-index | monthly | True |
| 31 | SAHMREALTIME | Sahm Rule Recession Indicator (real-time) | Recession regime | fred | macro-rate | monthly | True |
| 32 | HG=F | COMEX Copper Futures (Dr. Copper) | Commodity / growth barometer | yfinance | commodity | business-day | False |
| 33 | EUR/USD | EUR/USD daily (Alpha Vantage FX_DAILY) | Safe havens & FX | alphavantage | FX | business-day | True |
| 34 | PET.RWTC.D | WTI Crude Oil Spot (EIA) | Energy | eia | commodity-price | business-day | True |
| 35 | EURO | Treasury Reporting Rate of Exchange — Euro | Safe havens & FX | treasury | FX | quarterly | False |
3 Data Understanding
CRISP-DM Phase 2. Collect initial data and build familiarity with it, identify data-quality problems, and form first hypotheses. See The CRISP-DM Process for the methodology overview.
This chapter executes the second CRISP-DM phase for PortfolioLens. It picks up the Business Understanding hand-off — baseline-first, multi-horizon, a Sharpe-based objective, and the six transmission channels (energy, agricultural commodities, defense, safe-havens & FX, shipping, regime-dependent crypto) — and asks the Phase-2 question: what data can we actually get, for free, and is it good enough to build on?
This revision also contextualizes the data against the indicator catalog in strategy.md: §5 maps every series onto that catalog’s leading/coincident/lagging and regime framework, and a companion appendix (Indicator Council Deliberation) records a six-expert debate on which indicators are genuinely useful for predicting the best-performing investments.
Everything below is computed from a pinned, cached data snapshot (vintage 2026-06-02) produced by a one-time pull script. The chapter performs no live API calls when it renders; see §10 Reproducibility for how the snapshot is created and refreshed.
3.1 1. From business goals to a data problem
The Phase-1 thesis (geopolitical shock → economic transmission channel → asset re-pricing → opportunity) tells us where to look. Phase 2 tests whether we can actually look there using only the mandated free sources: Yahoo Finance (yfinance), FRED, Alpha Vantage, and US-government APIs (EIA, US Treasury). Detailed source notes live in financial-data-sources.md; this chapter is the working record of actually pulling and vetting the data.
The four Phase-2 deliverables follow in order (collect → describe → explore → verify quality), with §5 bridging the described data to the strategy catalog before exploration begins.
3.2 2. Data sources & the free-source constraint
| Source | Auth | Free-tier limit | Used here for | Note |
|---|---|---|---|---|
| yfinance (Yahoo) | none | unofficial scraper | equities, ETFs, indices, crypto, copper OHLCV | breaks silently |
| FRED | API key | ~120 req/min | rates, macro, credit, conditions, commodity spots | pulled via keyless fredgraph.csv (full history) |
| Alpha Vantage | API key | 25 req/day, 5/min | FX (EUR/USD demo) | one call, cached |
| EIA v2 | API key | ~no hard cap | petroleum (WTI demo) | cross-checks FRED WTI |
| US Treasury fiscaldata | none | open | reporting exchange rates | keyless |
The Alpha Vantage 25-requests-per-day ceiling is the decisive design constraint: a single edit-render loop would exhaust it. That is why this project uses a cached architecture — a one-time pull writes the data to disk, and the book reads from disk forever after. As a bonus, the rendered book needs no API keys at all, so it builds on any machine from the committed snapshot.
Key handling. Keys live in a gitignored .env (template: .env.example); yfinance and Treasury need none. FRED is pulled via its keyless fredgraph.csv endpoint — it returns full history and sidesteps the per-key rate-limit/windowing that truncated some series, so the FRED API key is reserved for ALFRED point-in-time (vintage) data in Phase 3 (see §9).
3.3 3. Collect initial data
3.3.1 3.1 The instrument universe
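The PoC universe maps each transmission channel and indicator role to concrete, free-to-obtain instruments. It is defined once in scripts/poc_universe.py, which records each instrument’s symbol, name, channel, source, asset type, native frequency, and whether an API key is needed.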
3.3.2 3.2 How each source is pulled
The real fetchers live in scripts/poc_fetch.py; the orchestrator is scripts/poc_pull.py. The canonical call per source (shown, not executed at render time):
```python
import io

import pandas as pd
import requests
import yfinance as yf

# yfinance: split/dividend-adjusted OHLCV, no key (`end` is exclusive)
spy = yf.download("SPY", start="2000-01-01", end="2026-06-02", auto_adjust=True)

# FRED: keyless public CSV endpoint (full history; no pandas-datareader, which breaks on pandas 3.x)
csv = requests.get("https://fred.stlouisfed.org/graph/fredgraph.csv",
                   params={"id": "BAA10Y", "cosd": "2000-01-01", "coed": "2026-06-02"},
                   timeout=30).text
baa10y = pd.read_csv(io.StringIO(csv), na_values=["."])  # FRED writes missing values as "."

# Alpha Vantage: FX (key required; <=25 calls/day, 12 s spacing)
av = requests.get("https://www.alphavantage.co/query",
                  params={"function": "FX_DAILY", "from_symbol": "EUR", "to_symbol": "USD",
                          "outputsize": "full", "apikey": "<ALPHAVANTAGE_API_KEY>"},
                  timeout=30).json()

# EIA v2: petroleum series (key required)
eia = requests.get("https://api.eia.gov/v2/seriesid/PET.RWTC.D",
                   params={"api_key": "<EIA_API_KEY>"}, timeout=30).json()

# US Treasury fiscaldata: reporting rates of exchange (no key)
tr = requests.get("https://api.fiscaldata.treasury.gov/services/api/fiscal_service"
                  "/v1/accounting/od/rates_of_exchange",
                  params={"filter": "country_currency_desc:in:(Euro Zone-Euro)"},
                  timeout=30).json()
```

3.3.3 3.3 Initial data collection report
What the snapshot pull actually retrieved (the executed manifest of cached files):
| | symbol | channel | source | frequency | start | end | n_rows |
|---|---|---|---|---|---|---|---|
| 0 | SPY | Broad-market baseline | yfinance | business-day | 2000-01-03 | 2026-06-01 | 6642 |
| 1 | XLE | Energy | yfinance | business-day | 2000-01-03 | 2026-06-01 | 6642 |
| 2 | ITA | Defense procurement | yfinance | business-day | 2006-05-05 | 2026-06-01 | 5049 |
| 3 | DBA | Agricultural commodities | yfinance | business-day | 2007-01-05 | 2026-06-01 | 4881 |
| 4 | BDRY | Shipping & insurance | yfinance | business-day | 2018-03-22 | 2026-06-01 | 2059 |
| 5 | GLD | Safe havens & FX | yfinance | business-day | 2004-11-18 | 2026-06-01 | 5416 |
| 6 | ^VIX | Safe havens & FX | yfinance | business-day | 2000-01-03 | 2026-06-01 | 6643 |
| 7 | UUP | Safe havens & FX | yfinance | business-day | 2007-03-01 | 2026-06-01 | 4844 |
| 8 | BTC-USD | Crypto (regime-dependent) | yfinance | daily | 2014-09-17 | 2026-06-01 | 4276 |
| 9 | ETH-USD | Crypto (regime-dependent) | yfinance | daily | 2017-11-09 | 2026-06-01 | 3127 |
| 10 | LMT | Defense procurement | yfinance | business-day | 2000-01-03 | 2026-06-01 | 6642 |
| 11 | DGS10 | Safe havens & FX | fred | business-day | 2000-01-03 | 2026-06-01 | 6891 |
| 12 | DGS2 | Safe havens & FX | fred | business-day | 2000-01-03 | 2026-06-01 | 6891 |
| 13 | T10Y3M | Macro context | fred | business-day | 2000-01-03 | 2026-06-02 | 6892 |
| 14 | DCOILWTICO | Energy | fred | business-day | 2000-01-04 | 2026-05-26 | 6886 |
| 15 | DHHNGSP | Energy | fred | business-day | 2000-01-04 | 2026-05-26 | 6886 |
| 16 | VIXCLS | Safe havens & FX | fred | business-day | 2000-01-03 | 2026-06-01 | 6891 |
| 17 | CPILFESL | Macro context | fred | monthly | 2000-01-01 | 2026-04-01 | 316 |
| 18 | UNRATE | Macro context | fred | monthly | 2000-01-01 | 2026-04-01 | 316 |
| 19 | WPU01210101 | Agricultural commodities | fred | monthly | 2000-01-01 | 2026-04-01 | 316 |
| 20 | BAMLH0A0HYM2 | Credit / risk regime | fred | business-day | 2023-06-05 | 2026-06-01 | 793 |
| 21 | BAMLC0A0CM | Credit / risk regime | fred | business-day | 2023-06-05 | 2026-06-01 | 793 |
| 22 | BAA10Y | Credit / risk regime | fred | business-day | 2000-01-03 | 2026-06-01 | 6891 |
| 23 | AAA10Y | Credit / risk regime | fred | business-day | 2000-01-03 | 2026-06-01 | 6891 |
| 24 | NFCI | Financial conditions | fred | weekly | 2000-01-07 | 2026-05-22 | 1377 |
| 25 | ANFCI | Financial conditions | fred | weekly | 2000-01-07 | 2026-05-22 | 1377 |
| 26 | T10YIE | Inflation expectations | fred | business-day | 2003-01-02 | 2026-06-02 | 6109 |
| 27 | DFII10 | Real rates | fred | business-day | 2003-01-02 | 2026-06-01 | 6108 |
| 28 | DFF | Monetary policy | fred | business-day | 2000-01-01 | 2026-06-01 | 9649 |
| 29 | M2SL | Liquidity | fred | monthly | 2000-01-01 | 2026-04-01 | 316 |
| 30 | INDPRO | Growth (coincident) | fred | monthly | 2000-01-01 | 2026-04-01 | 316 |
| 31 | SAHMREALTIME | Recession regime | fred | monthly | 2000-01-01 | 2026-04-01 | 316 |
| 32 | HG=F | Commodity / growth barometer | yfinance | business-day | 2000-08-30 | 2026-06-01 | 6466 |
| 33 | EUR/USD | Safe havens & FX | alphavantage | business-day | 2007-04-03 | 2026-06-02 | 5000 |
| 34 | PET.RWTC.D | Energy | eia | business-day | 2006-06-28 | 2026-05-26 | 5000 |
| 35 | EURO | Safe havens & FX | treasury | quarterly | 2001-03-31 | 2026-03-31 | 101 |
Pulled 36/36 instruments across all four free sources.
Issues encountered (honest log). Three findings worth recording, because they shaped the pipeline:
- The keyless FRED path initially failed (pandas-datareader is incompatible with pandas 3.x) and was replaced with FRED’s public fredgraph.csv endpoint.
- The IMF “Global price of Wheat” series (PWHEAMTUSD) was discontinued on FRED; we substituted the maintained PPI wheat series (WPU01210101).
- The ICE BofA credit-spread series (BAMLH0A0HYM2 HY OAS, BAMLC0A0CM IG OAS) return only ~mid-2023 onward from FRED, a licensing restriction on the free ICE data that is identical via the API and the CSV endpoint. Because a credit spread with no past recession in-sample is nearly useless (see §7), we added the free Moody’s BAA10Y/AAA10Y spreads, which span 1990→today (including 2008 and 2020). This fix was prompted directly by the indicator council.
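A further pattern in the manifest may deserve the same scrutiny: the Alpha Vantage EUR/USD pull and the EIA WTI pull both return exactly 5,000 rows, and both start years after their markets existed. A round row count like that suggests (but does not prove) a per-request response cap rather than true inception; EIA's API v2, for instance, is documented to return at most 5,000 rows per request. The sketch below is a hypothetical manifest check, not part of scripts/poc_pull.py; the cap set is an assumption, and it catches row caps only, not time-window caps like the ICE OAS licensing limit.

```python
import pandas as pd

SUSPECT_CAPS = {5000}  # assumed per-request row limits worth checking


def flag_suspect_histories(manifest: pd.DataFrame, requested_start: str,
                           caps: set[int] = SUSPECT_CAPS) -> pd.DataFrame:
    """Rows that start after the requested date AND hit a suspected row cap.

    A late start alone is not suspicious (BDRY simply launched in 2018);
    a late start combined with a round, capped row count is.
    """
    late = pd.to_datetime(manifest["start"]) > pd.Timestamp(requested_start)
    capped = manifest["n_rows"].isin(caps)
    return manifest.loc[late & capped, ["symbol", "source", "start", "n_rows"]]
```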
3.4 4. Describe data
| | filekey | name | channel | source | asset_type | frequency | start | end | n_rows | n_fields | tz_aware | pct_missing_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DBA | Invesco DB Agriculture Fund | Agricultural commodities | yfinance | equity-ETF | business-day | 2007-01-05 | 2026-06-01 | 4881 | 6 | False | 0.00 |
| 1 | WPU_WHEAT | PPI by Commodity: Farm Products: Wheat | Agricultural commodities | fred | macro-index | monthly | 2000-01-01 | 2026-04-01 | 316 | 1 | False | 0.00 |
| 2 | SPY | SPDR S&P 500 ETF Trust | Broad-market baseline | yfinance | equity-ETF | business-day | 2000-01-03 | 2026-06-01 | 6642 | 6 | False | 0.00 |
| 3 | COPPER | COMEX Copper Futures (Dr. Copper) | Commodity / growth barometer | yfinance | commodity | business-day | 2000-08-30 | 2026-06-01 | 6466 | 6 | False | 0.00 |
| 4 | AAA10Y | Moody's Aaa Corporate minus 10Y Treasury (cred... | Credit / risk regime | fred | credit-spread | business-day | 2000-01-03 | 2026-06-01 | 6891 | 1 | False | 4.22 |
| 5 | BAA10Y | Moody's Baa Corporate minus 10Y Treasury (cred... | Credit / risk regime | fred | credit-spread | business-day | 2000-01-03 | 2026-06-01 | 6891 | 1 | False | 4.22 |
| 6 | HY_OAS | ICE BofA US High Yield OAS | Credit / risk regime | fred | credit-spread | business-day | 2023-06-05 | 2026-06-01 | 793 | 1 | False | 1.01 |
| 7 | IG_OAS | ICE BofA US Corporate (IG) OAS | Credit / risk regime | fred | credit-spread | business-day | 2023-06-05 | 2026-06-01 | 793 | 1 | False | 1.13 |
| 8 | BTC-USD | Bitcoin (USD) | Crypto (regime-dependent) | yfinance | crypto | daily | 2014-09-17 | 2026-06-01 | 4276 | 6 | False | 0.00 |
| 9 | ETH-USD | Ethereum (USD) | Crypto (regime-dependent) | yfinance | crypto | daily | 2017-11-09 | 2026-06-01 | 3127 | 6 | False | 0.00 |
| 10 | ITA | iShares U.S. Aerospace & Defense ETF | Defense procurement | yfinance | equity-ETF | business-day | 2006-05-05 | 2026-06-01 | 5049 | 6 | False | 0.00 |
| 11 | LMT | Lockheed Martin Corp. | Defense procurement | yfinance | equity | business-day | 2000-01-03 | 2026-06-01 | 6642 | 6 | False | 0.00 |
| 12 | DCOILWTICO | WTI Crude Oil Spot Price | Energy | fred | commodity-price | business-day | 2000-01-04 | 2026-05-26 | 6886 | 1 | False | 3.89 |
| 13 | DHHNGSP | Henry Hub Natural Gas Spot Price | Energy | fred | commodity-price | business-day | 2000-01-04 | 2026-05-26 | 6886 | 1 | False | 3.75 |
| 14 | EIA_WTI | WTI Crude Oil Spot (EIA) | Energy | eia | commodity-price | business-day | 2006-06-28 | 2026-05-26 | 5000 | 1 | False | 0.00 |
| 15 | XLE | Energy Select Sector SPDR Fund | Energy | yfinance | equity-ETF | business-day | 2000-01-03 | 2026-06-01 | 6642 | 6 | False | 0.00 |
| 16 | ANFCI | Chicago Fed Adjusted NFCI | Financial conditions | fred | index | weekly | 2000-01-07 | 2026-05-22 | 1377 | 1 | False | 0.00 |
| 17 | NFCI | Chicago Fed National Financial Conditions Index | Financial conditions | fred | index | weekly | 2000-01-07 | 2026-05-22 | 1377 | 1 | False | 0.00 |
| 18 | INDPRO | Industrial Production Index | Growth (coincident) | fred | macro-index | monthly | 2000-01-01 | 2026-04-01 | 316 | 1 | False | 0.00 |
| 19 | T10YIE | 10-Year Breakeven Inflation Rate | Inflation expectations | fred | rate | business-day | 2003-01-02 | 2026-06-02 | 6109 | 1 | False | 4.11 |
| 20 | M2SL | M2 Money Supply (seasonally adjusted) | Liquidity | fred | macro-level | monthly | 2000-01-01 | 2026-04-01 | 316 | 1 | False | 0.00 |
| 21 | CPILFESL | Core CPI (All Urban, less food & energy) | Macro context | fred | macro-index | monthly | 2000-01-01 | 2026-04-01 | 316 | 1 | False | 0.32 |
| 22 | T10Y3M | 10Y minus 3M Treasury Spread | Macro context | fred | rate-spread | business-day | 2000-01-03 | 2026-06-02 | 6892 | 1 | False | 4.14 |
| 23 | UNRATE | Unemployment Rate | Macro context | fred | macro-rate | monthly | 2000-01-01 | 2026-04-01 | 316 | 1 | False | 0.32 |
| 24 | DFF | Effective Federal Funds Rate | Monetary policy | fred | rate | business-day | 2000-01-01 | 2026-06-01 | 9649 | 1 | False | 0.00 |
| 25 | DFII10 | 10-Year TIPS Real Yield | Real rates | fred | rate | business-day | 2003-01-02 | 2026-06-01 | 6108 | 1 | False | 4.11 |
| 26 | SAHM | Sahm Rule Recession Indicator (real-time) | Recession regime | fred | macro-rate | monthly | 2000-01-01 | 2026-04-01 | 316 | 1 | False | 0.32 |
| 27 | DGS10 | 10-Year Treasury Constant Maturity Rate | Safe havens & FX | fred | rate | business-day | 2000-01-03 | 2026-06-01 | 6891 | 1 | False | 4.14 |
| 28 | DGS2 | 2-Year Treasury Constant Maturity Rate | Safe havens & FX | fred | rate | business-day | 2000-01-03 | 2026-06-01 | 6891 | 1 | False | 4.14 |
| 29 | EURUSD_AV | EUR/USD daily (Alpha Vantage FX_DAILY) | Safe havens & FX | alphavantage | FX | business-day | 2007-04-03 | 2026-06-02 | 5000 | 5 | False | 0.00 |
| 30 | GLD | SPDR Gold Shares | Safe havens & FX | yfinance | equity-ETF | business-day | 2004-11-18 | 2026-06-01 | 5416 | 6 | False | 0.00 |
| 31 | TREAS_EUR | Treasury Reporting Rate of Exchange — Euro | Safe havens & FX | treasury | FX | quarterly | 2001-03-31 | 2026-03-31 | 101 | 1 | False | 0.00 |
| 32 | UUP | Invesco DB US Dollar Index Bullish Fund | Safe havens & FX | yfinance | equity-ETF | business-day | 2007-03-01 | 2026-06-01 | 4844 | 6 | False | 0.00 |
| 33 | VIX | CBOE Volatility Index | Safe havens & FX | yfinance | index | business-day | 2000-01-03 | 2026-06-01 | 6643 | 6 | False | 0.00 |
| 34 | VIXCLS | CBOE Volatility Index (FRED) | Safe havens & FX | fred | index | business-day | 2000-01-03 | 2026-06-01 | 6891 | 1 | False | 3.16 |
| 35 | BDRY | Breakwave Dry Bulk Shipping ETF | Shipping & insurance | yfinance | equity-ETF | business-day | 2018-03-22 | 2026-06-01 | 2059 | 6 | False | 0.00 |
Two structural facts dominate the description and drive Phase 3:
- Inception heterogeneity. Histories start at very different dates — SPY/LMT/^VIX and many FRED series reach back to 2000, but GLD begins 2004, ITA 2006, UUP/DBA 2007, BTC 2014, ETH 2017, and BDRY only 2018; the ICE OAS series only ~2023. Any cross-asset model must handle ragged start dates rather than assume a common window.
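- Frequency mix. Business-day equities/ETFs (~252 obs/yr), 7-day crypto, business-day FRED rates/spots (weekend gaps), weekly NFCI/ANFCI, and monthly macro (CPI, unemployment, M2, industrial production, Sahm, wheat PPI) coexist. DFF is a further exception: its 9,649 rows from 2000-01-01 to 2026-06-01 are one per calendar day, so it is published daily including weekends despite its business-day label. Naive joining would silently drop weekends or fabricate values; this is handled as a quality finding in §7 and an alignment task in §9.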
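The frequency-mix trap is easy to reproduce on toy data. In this sketch (synthetic values, not snapshot data) a business-day series and a 7-day series are joined three ways: an inner join silently drops the weekend crypto rows, an outer join keeps them but leaves the equity column empty on weekends, and a forward-fill then fabricates weekend equity "prices".

```python
import numpy as np
import pandas as pd

# Toy data (assumption): two weeks starting Monday 2024-01-01.
days = pd.date_range("2024-01-01", "2024-01-14", freq="D")  # 14 calendar days
bdays = pd.bdate_range("2024-01-01", "2024-01-14")          # 10 business days (no holiday calendar)
equity = pd.Series(np.arange(len(bdays), dtype=float), index=bdays, name="equity")
crypto = pd.Series(np.arange(len(days), dtype=float), index=days, name="crypto")

inner = pd.concat([equity, crypto], axis=1, join="inner")  # drops every weekend crypto row
outer = pd.concat([equity, crypto], axis=1, join="outer")  # keeps weekends; equity is NaN there
filled = outer.ffill()                                     # invents weekend equity values

print(len(inner), len(outer), int(outer["equity"].isna().sum()))  # 10 14 4
```

None of the three is wrong in itself; the point is that each choice changes what a weekend row means, so it has to be made deliberately per analysis.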
3.5 5. Indicator framework: mapping data to strategy
strategy.md’s organizing principle is that individual indicators are noisy and regime-dependent; their value comes from combining them (diffusion indices, z-scores, multi-signal confirmation). Its highest-conviction signals form a “Stage-1 regime dashboard”: yield curve, credit spreads, financial conditions, the Sahm rule, VIX, copper/gold, and the dollar. The expansion in this revision was chosen to assemble that dashboard from free data.
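The combination idea can be made concrete with a z-score average and a diffusion count. The sketch below is one simple reading, not the catalog's own method: the 252-observation trailing window, the 1.0 threshold, and the sign orientation (for example +1 for BAA10Y, where wider means more risk-off, and −1 for T10Y3M, where a lower slope means more risk-off) are all assumptions. Trailing windows use only data up to each date, so the composite carries no look-ahead.

```python
import numpy as np
import pandas as pd


def rolling_z(s: pd.Series, window: int = 252, min_periods: int = 126) -> pd.Series:
    """Trailing z-score: each value is compared only with data up to that date."""
    mean = s.rolling(window, min_periods=min_periods).mean()
    std = s.rolling(window, min_periods=min_periods).std()
    return (s - mean) / std.replace(0.0, np.nan)


def risk_off_composite(signals: pd.DataFrame, sign: dict[str, int],
                       threshold: float = 1.0, **zkw) -> pd.DataFrame:
    """Orient every signal so that positive means 'more risk-off', then combine.

    composite: mean of the available oriented z-scores
    diffusion: share of available signals whose oriented z exceeds `threshold`
    """
    z = pd.DataFrame({c: sign[c] * rolling_z(signals[c], **zkw) for c in signals})
    n = z.notna().sum(axis=1)
    return pd.DataFrame({
        "composite": z.mean(axis=1),
        "diffusion": (z > threshold).sum(axis=1) / n.replace(0, np.nan),
        "n_signals": n,
    })
```

The diffusion column is the "multi-signal confirmation" read: a single extreme z moves the composite but lifts diffusion only by 1/n.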
The mapping of our snapshot onto the catalog’s taxonomy:
| strategy.md category | Type | Our instruments | Signal use | Reliability caveat |
|---|---|---|---|---|
| Yield curve | Market-based lead | DGS10, DGS2, T10Y3M | recession lead; slope and disinversion | gave a false signal 2022–24 |
| Credit spreads | Market-based lead | BAA10Y, AAA10Y (1990+); HY_OAS, IG_OAS (2023+) | risk-off early warning | ICE OAS history licensing-capped → lean on Moody’s |
| Financial conditions | Market-based | NFCI, ANFCI | tightening leads slowdowns | ANFCI strips the cycle |
| Volatility | Market-based | VIX, VIXCLS | risk-off regime; vol-scaling | coincident; low VIX = complacency |
| Inflation exp. / real rates | Market-based | T10YIE, DFII10 | reflation-vs-stagflation quadrant; gold driver | from 2003 only |
| Policy / liquidity | Monetary | DFF, M2SL | risk anchor; liquidity | money→inflation link loose |
| Growth | Coincident | INDPRO, SPY | business-cycle state | n/a |
| Recession onset | Regime | SAHMREALTIME (+UNRATE) | onset trigger (≥0.50) | coincident-early; labor-supply distortion |
| Inflation / labor | Lagging | CPILFESL, UNRATE | not forward signals | catalog’s named trap |
| Commodities / Dr. Copper | Commodity | COPPER, DCOILWTICO, DHHNGSP, wheat | growth barometer; copper/gold | spot vs ETF proxy |
| FX / dollar | Currency | UUP, EURUSD_AV, TREAS_EUR | dollar smile; risk-off | proxies |
| Equity channels | n/a | XLE, ITA, LMT, DBA, BDRY | transmission-channel proxies | proxies, not the underlying |
| Crypto | n/a | BTC-USD, ETH-USD | regime-dependent | on-chain MVRV not free (gap) |
| Factor premia | Equity factor | none buildable | value/mom/quality | needs single-name cross-section (gap) |
| LEI / ISM-PMI | Composite lead | none | n/a | licensed, not free |
The dashboard is now (mostly) assembled from free data: curve ✓, credit ✓ (BAA10Y), financial conditions ✓ (NFCI/ANFCI), Sahm ✓, VIX ✓, copper/gold ✓, dollar ✓. Only the licensed composites (LEI, ISM) and survivorship-free single-name fundamentals remain out of reach.
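The Sahm trigger in the table (≥0.50) is a simple rule on the unemployment rate, so it can be recomputed from UNRATE as a cross-check. The sketch below assumes the common definition (three-month average minus its low over the previous 12 months) and reads "previous" as excluding the current month; because it runs on revised UNRATE it will not exactly match the real-time SAHMREALTIME series already in the snapshot.

```python
import pandas as pd


def sahm_gap(unrate: pd.Series) -> pd.Series:
    """Sahm-rule gap from a monthly unemployment-rate series (percent).

    gap_t = MA3_t - min(MA3_{t-12}, ..., MA3_{t-1}); the rule fires when gap_t >= 0.50.
    """
    ma3 = unrate.rolling(3).mean()
    prior_low = ma3.shift(1).rolling(12).min()  # low of the 12 months before t (assumption)
    return ma3 - prior_low
```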
3.5.1 5.1 Credit spread — the recession-tested risk gauge
```python
import matplotlib.pyplot as plt

# `panel`: the cached snapshot as a wide DataFrame, one column per filekey (loaded in the chapter setup)
fig, ax = plt.subplots(figsize=(10, 4.2))
baa = panel["BAA10Y"].dropna()
ax.plot(baa.index, baa, color="tab:purple", linewidth=1.0, label="Baa − 10Y (BAA10Y)")
ax.axhline(baa.median(), color="grey", ls="--", lw=0.8, label="median")
ax.set_ylabel("Credit spread (pp)")
ax.set_title("Moody's Baa credit spread — the recession-spanning risk-regime signal")
ax.legend(loc="upper right", fontsize=9)
plt.tight_layout()
plt.show()
```
3.5.2 5.2 Yield curve — slope and inversions
```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 4.2))
curve = panel["T10Y3M"].dropna()
ax.plot(curve.index, curve, color="tab:blue", linewidth=0.9)
ax.axhline(0, color="black", lw=0.8)
# Shade every date on which the 10y-3m slope is negative (inverted curve)
ax.fill_between(curve.index, curve, 0, where=(curve < 0), color="tab:red", alpha=0.4,
                label="inverted (recession lead / false-signal risk)")
ax.set_ylabel("10y − 3m spread (pp)")
ax.set_title("Yield-curve slope")
ax.legend(loc="lower right", fontsize=9)
plt.tight_layout()
plt.show()
```
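The shaded regions can also be listed as discrete inversion spells (start, end, length, deepest point), which makes episodes such as the 2022–24 false signal easy to tabulate. A minimal sketch; the helper name and the `min_obs` filter are illustrative assumptions, and lengths are counted in observations, not calendar days.

```python
import pandas as pd


def inversion_spells(slope: pd.Series, min_obs: int = 1) -> pd.DataFrame:
    """Contiguous runs where the curve slope is below zero."""
    s = slope.dropna()
    inverted = s < 0
    # A new run id starts at every flip between inverted and not inverted.
    run_id = inverted.ne(inverted.shift(fill_value=False)).cumsum()
    rows = []
    for _, g in s[inverted].groupby(run_id[inverted]):
        rows.append({"start": g.index[0], "end": g.index[-1],
                     "n_obs": len(g), "trough": g.min()})
    spells = pd.DataFrame(rows, columns=["start", "end", "n_obs", "trough"])
    return spells[spells["n_obs"] >= min_obs].reset_index(drop=True)
```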
3.5.3 5.3 Copper/gold ratio vs. the 10-year yield
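```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(10, 4.2))
# Proxy ratio: COMEX copper futures ($/lb) over the GLD ETF share price, not spot gold
cg = (panel["COPPER"] / panel["GLD"]).dropna()
ax1.plot(cg.index, cg, color="tab:orange", linewidth=1.0, label="copper/gold (proxy)")
ax1.set_ylabel("copper / gold (ratio, proxy)", color="tab:orange")
ax2 = ax1.twinx()
dgs10 = panel["DGS10"].dropna()
ax2.plot(dgs10.index, dgs10, color="tab:blue", linewidth=0.9, label="10y yield (DGS10)")
ax2.set_ylabel("10y yield (%)", color="tab:blue")
# Merge handles from both axes so the labels set above actually appear
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend(h1 + h2, l1 + l2, loc="upper left", fontsize=9)
ax1.set_title("Copper/gold vs. the 10-year yield")
plt.tight_layout()
plt.show()
```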
3.6 6. Explore data
3.6.1 6.1 Summary statistics
| | count | mean | std | min | 25% | 50% | 75% | max | pct_missing |
|---|---|---|---|---|---|---|---|---|---|
| SPY | 6642.0 | 203.70 | 163.48 | 49.81 | 84.97 | 124.00 | 267.67 | 758.54 | 31.17 |
| XLE | 6642.0 | 21.45 | 11.06 | 5.18 | 13.82 | 21.31 | 25.63 | 62.56 | 31.17 |
| ITA | 5049.0 | 70.80 | 49.94 | 11.72 | 26.48 | 55.78 | 100.89 | 250.42 | 47.68 |
| DBA | 4881.0 | 20.70 | 4.68 | 11.61 | 17.24 | 20.89 | 23.97 | 36.25 | 49.42 |
| BDRY | 2059.0 | 13.34 | 7.24 | 3.91 | 7.78 | 10.72 | 18.61 | 41.51 | 78.66 |
| GLD | 5416.0 | 141.97 | 72.28 | 41.26 | 107.26 | 125.54 | 167.38 | 495.90 | 43.88 |
| UUP | 4844.0 | 21.48 | 2.77 | 17.48 | 19.25 | 20.97 | 22.62 | 28.86 | 49.80 |
| BTC-USD | 4276.0 | 28603.50 | 32439.66 | 178.10 | 3455.37 | 11533.04 | 46366.20 | 124752.53 | 55.69 |
| ETH-USD | 3127.0 | 1716.06 | 1274.73 | 84.31 | 380.30 | 1732.25 | 2662.89 | 4831.35 | 67.60 |
| LMT | 6642.0 | 167.56 | 158.55 | 8.55 | 42.14 | 66.01 | 298.00 | 672.30 | 31.17 |
| COPPER | 6466.0 | 2.89 | 1.23 | 0.60 | 2.15 | 3.06 | 3.72 | 6.64 | 32.99 |
| VIX | 6643.0 | 19.84 | 8.32 | 9.14 | 14.03 | 17.82 | 23.21 | 82.69 | 31.16 |
3.6.2 6.2 Normalized price history by channel
Each tradable proxy is indexed to 100 at its first available observation (log scale, so different inception dates and magnitudes are comparable):
```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5.5))
for fk in ["SPY", "XLE", "ITA", "GLD", "BTC-USD"]:
    s = panel[fk].dropna()
    if len(s):
        # Rebase to 100 at the series' own first observation
        ax.plot(s.index, 100 * s / s.iloc[0], label=fk, linewidth=1.3)
ax.set_yscale("log")
ax.set_ylabel("Indexed to 100 at inception (log)")
ax.set_title("Transmission-channel proxies, normalized")
ax.legend(loc="upper left", ncol=3, fontsize=9)
plt.tight_layout()
plt.show()
```
3.6.3 6.3 Rolling volatility (risk regimes)
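```python
import matplotlib.pyplot as plt
import numpy as np

# Annualize on each asset's own calendar: ~252 trading days for SPY/GLD, 365 for 7-day crypto
periods_per_year = {"SPY": 252, "GLD": 252, "BTC-USD": 365}

fig, ax = plt.subplots(figsize=(10, 4.5))
for c, ppy in periods_per_year.items():
    px = panel[c].dropna()                     # native calendar, so weekend NaNs never enter a window
    r = np.log(px).diff().dropna()             # daily log returns
    vol = r.rolling(30).std() * np.sqrt(ppy)   # 30 observations, not 30 calendar days
    ax.plot(vol.index, vol, label=c, linewidth=1.1)
ax.set_ylabel("Annualized volatility")
ax.set_title("30-observation rolling volatility")
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()
```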
3.6.4 6.4 Cross-asset return correlation
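```python
import matplotlib.pyplot as plt
import seaborn as sns

# `q` (project helper module) and TRADABLE (list of tradable filekeys) come from the chapter setup, not shown
cm = q.correlation_matrix(panel, TRADABLE)
fig, ax = plt.subplots(figsize=(9, 7.5))
sns.heatmap(cm, annot=True, fmt=".2f", cmap="vlag", center=0,
            square=True, cbar_kws={"shrink": 0.8}, ax=ax)
ax.set_title("Daily-return correlation")
plt.tight_layout()
plt.show()
```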
3.6.5 6.5 First hypotheses (to be tested, not findings)
These are explicitly labelled hypotheses — exploratory patterns to be validated with rigor in later phases, never treated as confirmed edges:
- H1 (channel structure). Energy equity (XLE) co-moves with crude; defense (ITA/LMT) and gold (GLD) show partly diversifying profiles vs. the broad market (SPY).
- H2 (crypto regime-switching). BTC’s correlation with SPY is not constant — likely rising in risk-on periods and breaking around stress, consistent with the Phase-1 “regime-dependent” claim.
- H3 (risk-off signature). VIX spikes, credit-spread (BAA10Y) widening, and curve dynamics should cluster around equity drawdowns — motivating a combined regime read rather than any single signal.
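H2 needs a time-varying correlation rather than the single full-sample number in the heatmap. The sketch below is one way to set that up for a later test, not a finding: it computes a rolling correlation of daily log returns only on dates both assets trade, so BTC's Monday return spans Friday to Monday, just like SPY's. The 90/60-observation window and minimum are assumptions.

```python
import numpy as np
import pandas as pd


def rolling_corr_common_days(a: pd.Series, b: pd.Series, window: int = 90,
                             min_periods: int = 60) -> pd.Series:
    """Rolling correlation of daily log returns on the dates both series have prices."""
    prices = pd.concat([a, b], axis=1, join="inner").dropna()
    rets = np.log(prices).diff().dropna()
    return rets.iloc[:, 0].rolling(window, min_periods=min_periods).corr(rets.iloc[:, 1])
```

Applied as `rolling_corr_common_days(panel["BTC-USD"], panel["SPY"])`, a regime-switching story would show up as extended stretches at clearly different levels rather than noise around one value.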
3.7 7. Verify data quality
3.7.1 7.1 Quality scorecard
| | filekey | channel | freq | n_rows | missing | pct_missing | duplicate_idx | ordered | stale(>=5) | max_stale_run | positivity | lookahead_release_lag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DBA | Agricultural commodities | business-day | 4881 | pass | 0.00 | pass | pass | pass | 2 | pass | pass |
| 1 | WPU_WHEAT | Agricultural commodities | monthly | 316 | pass | 0.00 | pass | pass | pass | 1 | n/a | review |
| 2 | SPY | Broad-market baseline | business-day | 6642 | pass | 0.00 | pass | pass | pass | 1 | pass | pass |
| 3 | COPPER | Commodity / growth barometer | business-day | 6466 | pass | 0.00 | pass | pass | pass | 2 | n/a | pass |
| 4 | AAA10Y | Credit / risk regime | business-day | 6891 | warn | 4.22 | pass | pass | warn | 7 | n/a | pass |
| 5 | BAA10Y | Credit / risk regime | business-day | 6891 | warn | 4.22 | pass | pass | warn | 6 | n/a | pass |
| 6 | HY_OAS | Credit / risk regime | business-day | 793 | warn | 1.01 | pass | pass | pass | 2 | n/a | pass |
| 7 | IG_OAS | Credit / risk regime | business-day | 793 | warn | 1.13 | pass | pass | warn | 10 | n/a | pass |
| 8 | BTC-USD | Crypto (regime-dependent) | daily | 4276 | pass | 0.00 | pass | pass | pass | 1 | pass | pass |
| 9 | ETH-USD | Crypto (regime-dependent) | daily | 3127 | pass | 0.00 | pass | pass | pass | 0 | pass | pass |
| 10 | ITA | Defense procurement | business-day | 5049 | pass | 0.00 | pass | pass | pass | 2 | pass | pass |
| 11 | LMT | Defense procurement | business-day | 6642 | pass | 0.00 | pass | pass | pass | 1 | pass | pass |
| 12 | DCOILWTICO | Energy | business-day | 6886 | warn | 3.89 | pass | pass | pass | 2 | fail | pass |
| 13 | DHHNGSP | Energy | business-day | 6886 | warn | 3.75 | pass | pass | warn | 10 | pass | pass |
| 14 | EIA_WTI | Energy | business-day | 5000 | pass | 0.00 | pass | pass | pass | 1 | fail | pass |
| 15 | XLE | Energy | business-day | 6642 | pass | 0.00 | pass | pass | pass | 2 | pass | pass |
| 16 | ANFCI | Financial conditions | weekly | 1377 | pass | 0.00 | pass | pass | pass | 2 | n/a | pass |
| 17 | NFCI | Financial conditions | weekly | 1377 | pass | 0.00 | pass | pass | pass | 3 | n/a | pass |
| 18 | INDPRO | Growth (coincident) | monthly | 316 | pass | 0.00 | pass | pass | pass | 0 | n/a | review |
| 19 | T10YIE | Inflation expectations | business-day | 6109 | warn | 4.11 | pass | pass | warn | 5 | n/a | pass |
| 20 | M2SL | Liquidity | monthly | 316 | pass | 0.00 | pass | pass | pass | 0 | n/a | review |
| 21 | CPILFESL | Macro context | monthly | 316 | warn | 0.32 | pass | pass | pass | 2 | n/a | review |
| 22 | T10Y3M | Macro context | business-day | 6892 | warn | 4.14 | pass | pass | pass | 3 | n/a | pass |
| 23 | UNRATE | Macro context | monthly | 316 | warn | 0.32 | pass | pass | pass | 4 | n/a | review |
| 24 | DFF | Monetary policy | business-day | 9649 | pass | 0.00 | pass | pass | warn | 419 | n/a | pass |
| 25 | DFII10 | Real rates | business-day | 6108 | warn | 4.11 | pass | pass | pass | 3 | n/a | pass |
| 26 | SAHM | Recession regime | monthly | 316 | warn | 0.32 | pass | pass | pass | 3 | n/a | review |
| 27 | DGS10 | Safe havens & FX | business-day | 6891 | warn | 4.14 | pass | pass | pass | 4 | n/a | pass |
| 28 | DGS2 | Safe havens & FX | business-day | 6891 | warn | 4.14 | pass | pass | warn | 7 | n/a | pass |
| 29 | EURUSD_AV | Safe havens & FX | business-day | 5000 | pass | 0.00 | pass | pass | pass | 2 | pass | pass |
| 30 | GLD | Safe havens & FX | business-day | 5416 | pass | 0.00 | pass | pass | pass | 1 | pass | pass |
| 31 | TREAS_EUR | Safe havens & FX | quarterly | 101 | pass | 0.00 | pass | pass | pass | 1 | pass | pass |
| 32 | UUP | Safe havens & FX | business-day | 4844 | pass | 0.00 | pass | pass | pass | 3 | pass | pass |
| 33 | VIX | Safe havens & FX | business-day | 6643 | pass | 0.00 | pass | pass | pass | 2 | n/a | pass |
| 34 | VIXCLS | Safe havens & FX | business-day | 6891 | warn | 3.16 | pass | pass | pass | 2 | n/a | pass |
| 35 | BDRY | Shipping & insurance | business-day | 2059 | pass | 0.00 | pass | pass | warn | 7 | pass | pass |
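Most scorecard columns are short pandas checks. The sketch below approximates several of them; it is not the project's implementation. In particular, "max_stale_run" is assumed to count consecutive observations that repeat the previous value (so a never-repeating series scores 0), and the warn level of 5 is inferred from the table, where T10YIE at 5 warns and UNRATE at 4 passes.

```python
import pandas as pd


def max_stale_run(s: pd.Series) -> int:
    """Longest run of observations that repeat the previous value (NaNs ignored)."""
    x = s.dropna()
    same = x.eq(x.shift())              # True where the value equals the previous one
    run_id = (~same).cumsum()           # new id whenever the value changes
    runs = same.groupby(run_id).sum()   # repeats inside each run of equal values
    return int(runs.max()) if len(runs) else 0


def basic_checks(s: pd.Series, stale_limit: int = 5, must_be_positive: bool = False) -> dict:
    stale = max_stale_run(s)
    out = {
        "pct_missing": round(100 * s.isna().mean(), 2),
        "duplicate_idx": "fail" if s.index.duplicated().any() else "pass",
        "ordered": "pass" if s.index.is_monotonic_increasing else "fail",
        "max_stale_run": stale,
        "stale(>=5)": "warn" if stale >= stale_limit else "pass",
    }
    if must_be_positive:  # only meaningful for prices, e.g. WTI fails on its April 2020 print
        out["positivity"] = "pass" if (s.dropna() > 0).all() else "fail"
    return out
```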
3.7.2 7.2 Cross-source consistency
Where two sources measure the same concept, they should agree — a direct quality probe:
| | series_a | series_b | overlap_rows | level_corr |
|---|---|---|---|---|
| 0 | VIX | VIXCLS | 6643 | 1.0 |
| 1 | DCOILWTICO | EIA_WTI | 4988 | 1.0 |
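^VIX (yfinance) vs VIXCLS (FRED) and DCOILWTICO (FRED WTI) vs the EIA WTI series give two clean cross-source checks across three providers. They are not fully independent measurements: FRED republishes CBOE’s VIX close and EIA’s WTI spot price, so a correlation of 1.0 confirms that the pipelines agree rather than that the underlying prints are correct.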
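A level correlation of 1.0 over thousands of rows can still hide a handful of mismatched days. A minimal sketch of a stricter comparison, assuming the two feeds should match to within a small tolerance (the `tol` default is an assumption):

```python
import pandas as pd


def cross_source_check(a: pd.Series, b: pd.Series, tol: float = 1e-6) -> dict:
    """Compare two feeds of the same concept on their overlapping, non-missing dates."""
    both = pd.concat([a, b], axis=1, join="inner").dropna()
    diff = (both.iloc[:, 0] - both.iloc[:, 1]).abs()
    return {
        "overlap_rows": len(both),
        "level_corr": round(both.iloc[:, 0].corr(both.iloc[:, 1]), 4),
        "max_abs_diff": float(diff.max()),
        "n_diff_gt_tol": int((diff > tol).sum()),  # days where the feeds disagree
    }
```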
3.7.3 7.3 Material quality findings
Minimum WTI value in snapshot: -36.98 on 2020-04-20
ICE HY OAS free history: 785 rows from 2023-06-05 (licensing-capped)
- Negative oil price (real, not an error). WTI’s minimum is the negative print of April 2020 — a genuine market event. It breaks log-return math and any “prices are positive” assumption.
- Credit-spread history is licensing-capped. The ICE BofA OAS series that everyone reaches for first exist only from ~2023 on the free tier, so no past recession falls in-sample. The council flagged this as the debate’s blind spot. The fix was the free Moody’s `BAA10Y` (1990+). A signal you can’t observe across a recession can’t be calibrated on data you own.
- Calendar misalignment. Crypto trades 7 days/week; equities ~5; NFCI is weekly; macro is monthly. Per-series statistics use each series’ native frequency. Any aligned view must not be read as if all series share a calendar.
- Look-ahead / release lag. Monthly macro (CPI, unemployment, M2, industrial production, Sahm) is stamped by reference period, known only on the later release date. Phase 3 must lag these by their publication delay — the exact trap Chapter 1 warned about.
- Survivorship & venue caveats. Free equity sources omit delisted names (ETFs only partly mitigate); yfinance crypto is an aggregate, not a single venue. Acceptable for this PoC; flagged before any Phase-4 backtest.
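The negative-price finding can be made concrete. In the sketch below, only -36.98 comes from the snapshot, and the neighbouring prices are made up. The code masks log returns wherever either price in the pair is non-positive, and it keeps plain level differences, which stay defined. Falling back to differences is one reasonable choice and is not necessarily what Phase 3 will adopt. `safe_returns` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def safe_returns(price: pd.Series) -> pd.DataFrame:
    """Log returns where mathematically defined, plus always-defined level differences."""
    prev = price.shift(1)
    ratio = price / prev
    # log(p_t / p_{t-1}) needs both prices > 0; set other rows to NaN instead of -inf or a warning
    log_ret = np.log(ratio.where((price > 0) & (prev > 0)))
    return pd.DataFrame({
        "price": price,
        "log_ret": log_ret,
        # First differences remain meaningful for zero or negative prices
        "diff": price.diff(),
    })

# Toy WTI-like path: 2020-04-17, 04-20 (the negative print), 04-21, 04-22
wti = pd.Series([18.0, -36.98, 10.0, 12.0], index=pd.bdate_range("2020-04-17", periods=4))
out = safe_returns(wti)
```

Both the day of the negative print and the day after lose their log return. Any model that silently drops those NaNs would skip exactly the event worth studying.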
3.8 8. Gaps vs. the strategy & geopolitical thesis
The free sources cannot supply several things strategy.md and the Chapter-1 thesis ultimately want. Named honestly so they are not silently assumed:
- Licensed composites — Conference Board LEI, ISM/PMI — not free; their market-based components (curve, credit, S&P) are in the snapshot directly, a partial substitute.
- Long ICE BofA credit history: licensing-capped to ~2023; Moody’s `BAA10Y` substitutes.
- Cross-sectional equity factors (value/momentum/quality): need a survivorship-bias-free single-name universe (CRSP-style); unbuildable on a dozen ETFs + two single names.
- Quantified geopolitical event data (GDELT/ACLED/ICEWS) — deferred to a later phase, consistent with baseline-first; proxied for now by VIX, credit, and the conflict-sensitive sleeves.
- Crypto on-chain (MVRV/SOPR) and real freight indices (Baltic Dry) — paid; proxied by price/volume and the BDRY ETF.
3.9 9. Implications for Data Preparation (Phase 3 hand-off)
Phase 3 must enforce the discipline strategy.md spells out, or any “edge” we find will be an artifact:
- Point-in-time / vintage data. Source macro features from ALFRED (not revised FRED) and lag every release to its actual publication date — this is the real reason to hold the FRED API key.
- Stationary, comparable features. Transform levels to YoY/MoM changes, rolling z-scores, percentile ranks, and diffusion indices; spreads/ratios (curve slope, copper/gold, BAA−AAA); standardized surprise (actual − consensus) where a consensus feed exists.
- Master calendar & alignment. Business-day master index; resample crypto to it (keeping a native 7-day view); lag-aware fills for weekly/monthly macro — never fill across the release frontier.
- Anti-overfitting discipline. With only ~3–4 genuine regime episodes in 25 years, prefer a small, fixed, equal-weighted regime score over fitted weights; validate with purged/embargoed walk-forward CV, deflated Sharpe, and elevated t-stat hurdles; benchmark against dumb buy-and-hold SPY and 60/40.
- Council’s steer. Use the combined regime read as a gross-exposure / volatility-scaling dial, not a buy/sell trigger. Route the conflict-sensitive sleeve by the reflation-vs-stagflation quadrant (`T10YIE` vs `DFII10`). Defer factors and crypto on-chain until their data is funded. See the Indicator Council Deliberation. → Data Preparation.
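The point-in-time and master-calendar rules above can be sketched in pandas. This is a minimal illustration and not the Phase-3 code. It assumes a single fixed publication delay per series, and the 45-day value and toy CPI numbers are hypothetical. Real release dates vary and should come from ALFRED vintages or a release calendar. All helper names are made up.

```python
import pandas as pd

def to_point_in_time(monthly: pd.Series, release_lag_days: int) -> pd.Series:
    """Re-stamp a reference-period series at an assumed release date."""
    # Hypothetical fixed delay; real release dates vary and belong in ALFRED vintages
    out = monthly.copy()
    out.index = out.index + pd.Timedelta(days=release_lag_days)
    return out

def align_to_master(series: pd.Series, master: pd.DatetimeIndex) -> pd.Series:
    """Place a series on the business-day master calendar using forward fills only."""
    # Union first so a release landing on a weekend is not dropped, then carry it
    # forward; there is no backward fill, so no value appears before its release date
    merged = series.reindex(series.index.union(master)).sort_index().ffill()
    return merged.reindex(master)

def rolling_z(x: pd.Series, window: int) -> pd.Series:
    """Trailing z-score that uses only data up to and including each date."""
    roll = x.rolling(window, min_periods=window)
    return (x - roll.mean()) / roll.std()

# Toy monthly series stamped by reference period (values are illustrative)
cpi = pd.Series([100.0, 101.0, 102.0],
                index=pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]))
master = pd.bdate_range("2024-02-01", "2024-04-30")
pit = align_to_master(to_point_in_time(cpi, release_lag_days=45), master)
```

With a 45-day lag, the February value becomes known on Sunday 2024-03-17. The union step carries it to Monday 2024-03-18 rather than losing it, and until then the January value stays in force. The z-score uses a trailing window for the same reason. A centered window would leak future data into each date.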
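For the purged walk-forward CV named under anti-overfitting discipline, a minimal index-splitting sketch follows. It rests on several assumptions. Rows are equally spaced, each row's label looks `horizon` rows ahead, and training only ever precedes testing. In that forward-only setting, the embargo's usual job (dropping training rows just after a test block) never arises. Here `embargo` is therefore modeled as an extra safety gap on top of the purge. The function name and parameters are illustrative.

```python
import numpy as np

def purged_walk_forward(n: int, n_splits: int, horizon: int, embargo: int = 0):
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward CV.

    Rows before the first test block are train-only. Training row i is assumed to
    carry a label built from rows i+1..i+horizon, so any row whose label would reach
    into the test block is purged; `embargo` widens that gap further.
    """
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = k * fold
        test_end = n if k == n_splits else (k + 1) * fold
        # Keep only rows with i + horizon < test_start, minus the extra buffer
        train_end = max(0, test_start - horizon - embargo)
        yield np.arange(train_end), np.arange(test_start, test_end)

splits = list(purged_walk_forward(n=100, n_splits=4, horizon=5, embargo=2))
```

Each fold trains on an expanding history and tests on the next block, so a score is never fitted on data from its own future. Deflated Sharpe and t-stat hurdles would then be applied to the per-fold results.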
3.10 10. Reproducibility
Snapshot vintage : 2026-06-02
Instruments : 36
Python : 3.12.13
pandas : 3.0.3
numpy : 2.4.6
yfinance : 1.4.1
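Before comparing any numbers against this chapter, a fresh environment can be checked against the versions recorded above. The sketch below uses the standard-library `importlib.metadata`. The helper names are hypothetical, and exact string equality is stricter than most projects need.

```python
import sys
from importlib import metadata

# Versions recorded in the snapshot header above
RECORDED = {"python": "3.12.13", "pandas": "3.0.3", "numpy": "2.4.6", "yfinance": "1.4.1"}

def installed_versions(packages):
    """Return {name: version or None} for the running interpreter and given packages."""
    found = {"python": ".".join(str(p) for p in sys.version_info[:3])}
    for pkg in packages:
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = None  # not installed in this environment
    return found

def version_drift(recorded, found):
    """List (name, recorded, found) for every entry that differs."""
    return [(k, v, found.get(k)) for k, v in recorded.items() if found.get(k) != v]

for name, want, have in version_drift(RECORDED, installed_versions(["pandas", "numpy", "yfinance"])):
    print(f"{name}: snapshot built with {want}, running {have}")
```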
To reproduce or refresh the snapshot:
- Create the environment: `python -m venv .venv`, activate it, then `pip install -r requirements.txt`.
- Register the kernel: `python -m ipykernel install --user --name portfoliolens` (matches `jupyter: portfoliolens` in `_quarto.yml`).
- Copy `.env.example` to `.env` and add free keys (optional; yfinance/FRED/Treasury run without them).
- Pull: `python scripts/poc_pull.py` → writes `data/raw/` and the committed `data/snapshot/`.
- Render: with the venv active, run `quarto render`. It reads the snapshot only (no network, no keys).
The small data/snapshot/ (parquet + manifest.csv) is committed so a fresh clone renders this chapter without any pull; bulk data/raw/ is gitignored.
Phase-2 deliverables — complete: initial data collection report (§3), data description report (§4), an indicator-framework mapping to strategy.md (§5), data exploration report (§6), and data quality report (§7) — over 36 instruments of real, free-sourced data spanning 2000–2026, with the expert debate in the Indicator Council appendix.