2 Business Understanding
CRISP-DM Phase 1. Understand the project objectives and requirements from a business perspective, then convert this knowledge into a data-mining problem definition and a preliminary plan. See The CRISP-DM Process for the methodology overview.
This chapter documents the business logic of PortfolioLens. Because the “business” here is a personal investment portfolio, the “business logic” is personal investment logic — the explicit, written reasoning that governs how capital is allocated, what edge we believe we have, what we will not do, and how we will know whether any of it is actually working.
The reasoning in this chapter was stress-tested by a deliberate, adversarial debate among six domain perspectives (quantitative finance, political science, international-relations theory, international-business macroeconomics, market-efficiency skepticism, and risk management). That deliberation is preserved in full in the Advisory Council Deliberation appendix; this chapter is its distilled, decision-ready output.
2.1 Context & Background
The problem. Discretionary, narrative-driven personal investing is hard to evaluate and easy to fool yourself with. PortfolioLens reframes personal investing as an empirical, hypothesis-driven research project: identify patterns in objective data that precede market moves, translate them into rules, and validate those rules honestly before risking capital.
The thesis. Major geopolitical events — interstate conflict, invasions, sanctions, escalation — produce economic consequences that propagate through identifiable channels (energy, commodities, currencies, defense procurement, shipping, capital flows). If those consequences can be anticipated even slightly better than consensus, they create risk-adjusted opportunity. The chain we are betting on:
Geopolitical shock (the event) → economic transmission channel (energy/FX/commodities/defense) → asset re-pricing (specific tickers) → opportunity (entry/exit)
A crucial refinement, drawn directly from the council: we do not trade the headline. The market reacts to the revision of expectations (the surprise), and the durable, retail-tradable signal lives in the transmission mechanism and its second-order effects. These can play out over weeks and months. The signal does not live in the reaction within the first seconds, which latency-competitive institutions own.
Scope. Personal portfolio, personal use, personal capital. The models built here are decision-support tools for one investor, not a product, fund, or service for others.
Investor profile. (Recommended defaults below — confirm and adjust to your situation.)
| Attribute | Working assumption (confirm) |
|---|---|
| Account type | Personal taxable + any tax-advantaged accounts |
| Capital base | Risk capital the investor can afford to lose without lifestyle impact |
| Execution | Manual or semi-automated; not latency-competitive with institutions |
| Skill/tooling | Python-centric data science; this Quarto book as the research log |
| Time available | Part-time research; strategy must survive infrequent attention |
2.2 Business Objectives
In plain language, the personal-investing goals — in priority order:
- Don’t go broke. Preserve capital; avoid ruin and catastrophic drawdowns. This outranks return maximization.
- Beat a passive benchmark on a risk-adjusted basis. Earn more per unit of risk than a simple buy-and-hold of the same asset mix would.
- Capture geopolitical opportunity if it is real. Determine whether geopolitical signals add genuine, repeatable value over a conventional model — and exploit them only to the extent the evidence supports.
- Build durable, reusable infrastructure. A documented, repeatable research and decision process that improves over time, rather than a one-off lucky call.
2.3 Business Success Criteria
Success is defined primarily in risk-adjusted terms, not raw return. Targets below are recommended starting points to be confirmed by the investor.
| Criterion | Target (confirm) | Why |
|---|---|---|
| Primary: Sharpe ratio | Out-of-sample (OOS) Sharpe ≥ 1.0, and meaningfully above the benchmark’s | Rewards return per unit of risk; resists “lucky high-volatility” outcomes |
| Max drawdown ceiling | Peak-to-trough ≤ 20–25% | Survivability and behavioral tolerance; a strategy you can actually hold |
| Benchmark to beat | Risk-adjusted return of a passive blend (e.g., a stock index + a fixed BTC/ETH sleeve matching the strategy’s average exposure) | Honest comparison: are we adding value over doing nothing clever? |
| Geopolitical lift | Geopolitical features improve OOS Sharpe by a statistically and economically meaningful margin over the baseline | The project’s central, falsifiable question |
| Capital-preservation rule | No single thesis risks more than a pre-set fraction of capital | Prevents one confident, wrong bet from undoing a year of gains |
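The Sharpe and drawdown criteria are mechanical enough to check in code. Below is a minimal sketch in plain Python (standard library only). It rests on several assumptions: simple daily returns, a zero risk-free rate by default, 252 periods per year, and a benchmark blend rebalanced to fixed weights every period. The helper names are illustrative and are not existing PortfolioLens code. The 1.0 Sharpe and 25% drawdown defaults mirror the targets above and are placeholders until the investor confirms them. "Meaningfully above the benchmark" is reduced here to a `min_margin` parameter.

```python
import math
import statistics

PERIODS_PER_YEAR = 252  # assumption: daily equity-style returns; crypto trades ~365 days a year


def annualized_sharpe(returns, risk_free=0.0, periods=PERIODS_PER_YEAR):
    """Annualized Sharpe ratio of per-period simple returns (risk_free is per period)."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return float("nan")  # undefined for a constant series
    return statistics.mean(excess) / sd * math.sqrt(periods)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def blended_benchmark(index_returns, crypto_returns, crypto_weight):
    """Hypothetical passive blend, rebalanced every period to a fixed crypto weight."""
    return [(1.0 - crypto_weight) * i + crypto_weight * c
            for i, c in zip(index_returns, crypto_returns)]


def check_success(strategy, benchmark, min_sharpe=1.0, min_margin=0.0, max_dd=0.25):
    """Compare an out-of-sample return series against the Sharpe/drawdown targets."""
    s = annualized_sharpe(strategy)
    b = annualized_sharpe(benchmark)
    dd = max_drawdown(strategy)
    return {
        "sharpe": s,
        "benchmark_sharpe": b,
        "max_drawdown": dd,
        # NaN Sharpe values compare False, so an undefined ratio never passes.
        "passes": s >= min_sharpe and s - b > min_margin and dd <= max_dd,
    }
```

In use, `strategy` would be the walk-forward, net-of-cost return series produced in Evaluation. `crypto_weight` would be the strategy's average crypto exposure, so the benchmark carries a comparable crypto sleeve.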
A “no” is a valid success. If rigorous testing shows geopolitical signals don’t add durable value, discovering that — before betting real money — is a successful Phase-5 outcome, not a failure.
2.4 The Investment Thesis / Business Logic
This is the heart of the chapter: why we believe an edge could exist and where to look. Conflict does not move “the market” uniformly — it moves specific instruments through specific channels. The business logic is to map channels to tradable assets, then test whether anticipating the channel beats consensus.
| Transmission channel | What moves | Candidate instruments | Direction logic |
|---|---|---|---|
| Energy | Oil & gas supply/route risk | Energy producers, oil/gas futures & ETFs | Supply threat → price up; producers benefit |
| Agricultural commodities | Grain/fertilizer exporters disrupted | Wheat/corn, ag & fertilizer equities | Export disruption → price up |
| Defense procurement | Sustained re-armament spending | Defense primes & ETFs | Conflict → multi-year capex re-rating |
| Safe havens & FX | Flight to safety | USD, gold, Treasuries | Risk-off → safe assets bid |
| Shipping & insurance | Chokepoint/route risk (Hormuz, Suez, Black Sea, Taiwan Strait) | Tanker/shipping, freight rates | Route risk → rates & insurance up |
| Crypto (regime-dependent) | Flips by regime | BTC, ETH | Risk-on “tech beta” or capital-flight/sanctions hedge — context decides |
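One way to keep the channel map honest is to encode it as data rather than prose, so that later chapters consume the same mapping the thesis states. The sketch below is illustrative and makes several assumptions:

- It uses the instrument categories from the table, not real tickers.
- It encodes the "direction logic" column as a sign under escalation.
- It treats crypto's sign as unknown until a regime label is supplied.

The regime labels `risk_on_beta` and `capital_flight` are hypothetical names for the two crypto channels listed in the table.

```python
from dataclasses import dataclass

# Hypothetical crypto regime labels -> expected sign on escalation.
CRYPTO_REGIME_SIGN = {"risk_on_beta": -1, "capital_flight": +1}


@dataclass(frozen=True)
class Channel:
    name: str
    instruments: tuple  # instrument categories from the table, not tickers
    sign: int           # +1 expected up on escalation; 0 means "regime decides"


CHANNELS = (
    Channel("energy", ("energy producers", "oil/gas futures & ETFs"), +1),
    Channel("agriculture", ("wheat/corn", "ag & fertilizer equities"), +1),
    Channel("defense", ("defense primes & ETFs",), +1),
    Channel("safe_havens_fx", ("USD", "gold", "Treasuries"), +1),
    Channel("shipping", ("tanker/shipping", "freight rates"), +1),
    Channel("crypto", ("BTC", "ETH"), 0),
)


def escalation_view(affected, crypto_regime=None, channels=CHANNELS):
    """Map each instrument category on an affected channel to its expected sign.

    Regime-dependent channels (sign 0) are skipped unless a regime label is given,
    so the view never silently assumes a direction for crypto.
    """
    view = {}
    for ch in channels:
        if ch.name not in affected:
            continue
        sign = ch.sign
        if sign == 0:
            if crypto_regime is None:
                continue
            sign = CRYPTO_REGIME_SIGN[crypto_regime]
        for inst in ch.instruments:
            view[inst] = sign
    return view
```

Skipping crypto when no regime is supplied is deliberate. It forces the regime call to be made explicitly rather than defaulting to one of the two opposite readings.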
Three principles the council insisted on:
- Trade the surprise, not the event. The tradable moment is usually the escalation phase — when probabilities shift but consensus hasn’t repriced — or the second-order consequence, not the moment of invasion.
- Trade the mechanism, not the narrative. “There’s a war” is not a trade. “Black Sea grain exports are blocked, so wheat and fertilizer names re-rate” is a trade.
- Respect regime-dependence. The same event can hit crypto through opposite channels depending on context. A static, one-size model will be wrong; regime awareness is part of the design.
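"Trade the surprise" needs an operational definition before it can be tested, and the chapter does not fix one. The sketch below proxies the surprise with a rolling z-score: how far today's reading of an escalation index deviates from its own trailing window, computed strictly from earlier observations. The index itself (for example a daily GDELT- or ACLED-derived escalation count) is assumed to be prepared elsewhere. A genuine consensus measure would be a better baseline where one exists. The window length of 30 is an arbitrary assumption.

```python
import statistics


def surprise_zscores(series, window=30):
    """Z-score of each observation against the trailing window that precedes it.

    Only series[t - window : t] is used for observation t, so no future data leaks
    into the feature. Returns None where there is too little history or zero spread.
    """
    out = []
    for t, x in enumerate(series):
        past = series[max(0, t - window):t]
        if len(past) < 2:
            out.append(None)
            continue
        sd = statistics.stdev(past)
        out.append(None if sd == 0 else (x - statistics.mean(past)) / sd)
    return out
```

Because each value depends only on earlier observations, changing a later reading never alters an earlier z-score. That property is what keeps the feature usable in a leakage-free backtest.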
2.5 Strategic Approach — Baseline-First
Per the confirmed project decision, the geopolitical edge is treated as a hypothesis tested against a baseline, not an assumption:
- Stage 1 — Baseline. Build a conventional, multi-horizon model on the conflict-sensitive sectors + major crypto universe using price/volume, fundamentals, and standard macro features. Establish its honest out-of-sample, cost-aware performance. This is the bar to beat.
- Stage 2 — Geopolitical layer. Add quantified geopolitical/event features (e.g., GDELT / ACLED / ICEWS escalation signals) and measure the marginal lift over the baseline. Keep the features only if they add value that survives validation.
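Stage 2's "marginal lift" can be made concrete as the difference in out-of-sample Sharpe between the augmented and baseline strategies, measured on the same dates. The sketch below pairs that difference with a circular block bootstrap, which preserves some short-range autocorrelation in daily returns. It reports the share of resamples in which the lift disappears. Several inputs are assumptions: the block length, the resample count, and the 252-period annualization. This is a rough robustness check, not a formal p-value. Because many feature variants are likely to be tried, it should be read alongside a multiple-testing correction.

```python
import math
import random


def sharpe(returns, periods=252):
    """Annualized Sharpe of per-period excess returns (0.0 if the series has no spread)."""
    n = len(returns)
    mu = math.fsum(returns) / n
    var = math.fsum((r - mu) ** 2 for r in returns) / (n - 1)
    return mu / math.sqrt(var) * math.sqrt(periods) if var > 0 else 0.0


def sharpe_lift_bootstrap(baseline, augmented, n_boot=1000, block=20, seed=0):
    """Observed Sharpe lift (augmented - baseline) and the share of resamples with lift <= 0.

    Both series must cover the same dates; each resample draws the same circular
    blocks of dates for both, so the comparison stays paired.
    """
    if len(baseline) != len(augmented):
        raise ValueError("series must be aligned on the same dates")
    n = len(baseline)
    rng = random.Random(seed)
    observed = sharpe(augmented) - sharpe(baseline)
    non_positive = 0
    for _ in range(n_boot):
        idx = []
        while len(idx) < n:
            start = rng.randrange(n)
            idx.extend((start + k) % n for k in range(block))  # wrap around the end
        idx = idx[:n]
        lift = sharpe([augmented[i] for i in idx]) - sharpe([baseline[i] for i in idx])
        if lift <= 0:
            non_positive += 1
    return observed, non_positive / n_boot
```

A small share suggests the lift survives resampling. A large share suggests the geopolitical layer may be noise, which, as noted above, is itself a valid outcome.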
Horizon. A multi-horizon blend is targeted: short event-driven reactions (days–weeks), medium-term swing positioning on transmission channels (weeks–months), and long-term strategic tilts (months–years). The medium horizon is expected to be the most realistic source of retail edge; short-horizon claims face the highest skepticism and friction.
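Each horizon needs its own prediction target. The sketch below computes simple N-day forward returns from daily closing prices. The chapter gives only ranges, so the horizon lengths of 5, 60, and 250 trading days for the short, medium, and long legs are placeholders. The targets here are raw returns rather than risk-adjusted ones; scaling them by trailing volatility would be one way to make them risk-adjusted. The last N observations get no label because their future is not yet observed. Those rows should be dropped rather than filled.

```python
# Placeholder horizon lengths in trading days; the text only specifies ranges.
HORIZONS = {"short": 5, "medium": 60, "long": 250}


def forward_returns(prices, horizon):
    """Simple return from t to t + horizon; None where t + horizon is beyond the data."""
    n = len(prices)
    return [prices[t + horizon] / prices[t] - 1.0 if t + horizon < n else None
            for t in range(n)]


def multi_horizon_targets(prices, horizons=HORIZONS):
    """One forward-return target series per named horizon."""
    return {name: forward_returns(prices, h) for name, h in horizons.items()}
```

Because each label looks `horizon` days ahead, training rows near a test boundary overlap the test period. This overlap is what the purge step in walk-forward validation removes.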
2.6 Situation Assessment
Resources
- Market data (see Financial Data Sources): yfinance/Tiingo for prototyping, SEC EDGAR for fundamentals, CCXT/CoinGecko for crypto, FRED for macro; point-in-time sources (e.g., Norgate) noted for rigorous backtests.
- Geopolitical data: GDELT, ACLED, ICEWS (event/escalation streams); Correlates of War, Polity for structural context.
- Tooling: Python data-science stack; this Quarto book as the documented research log; Git for version control.
- Human: one part-time investor/researcher.
Assumptions
- Geopolitical consequences propagate through identifiable, persistent channels.
- A patient, medium-term horizon is reachable for a retail investor; the sub-second game is not.
- Quality, point-in-time data is obtainable at acceptable cost.
Constraints
- No latency/co-location edge; no institutional information access.
- Limited research time; the strategy must tolerate infrequent attention.
- Real frictions: spreads, slippage, and (notably) short-term capital-gains tax.
Preliminary cost/benefit
- Costs: data subscriptions, research time, and the very real risk of losses if the edge is illusory or poorly executed.
- Benefits: improved risk-adjusted returns if an edge exists; and regardless, a disciplined, documented process that replaces gut-feel investing with evidence.
2.7 Risks & Mitigations
Distilled from the council deliberation:
| Risk | Description | Mitigation |
|---|---|---|
| Already priced in (EMH) | Macro geopolitics is the most-watched arena on earth; the move may be gone before a retail click | Target the slow transmission channel and escalation phase, not the headline; lean on the patience/time-horizon edge |
| Non-stationarity / regime change | The relationship found may be era-specific and simply stop working | Regime-aware design; rolling re-validation; theory-guided (not purely mined) features |
| Overfitting / data-snooping | With enough hypotheses, a fake edge always appears | Walk-forward, purged/embargoed, leakage-free validation; baseline-first; out-of-sample discipline |
| Fat-tail / gap risk | A predicted conflict de-escalates overnight; position gaps against you | Position sizing first; defined-risk structures; no oversized “obvious” bets |
| Execution friction | Costs, slippage, and short-term taxes can erase a paper edge | Transaction-cost-aware backtests; prefer longer holds; model taxes explicitly |
| Behavioral discipline | The operator overrides the model at the worst moment | Pre-committed rules; the model’s job is to constrain the human, documented here |
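The "transaction-cost-aware backtests" mitigation amounts to charging every change in position. The sketch below makes these assumptions:

- a single asset;
- a flat cost in basis points on turnover, with spread and slippage rolled together;
- weights decided before each period begins.

It deliberately leaves out taxes. Taxes depend on holding period and jurisdiction and need their own lot-level model. The 10 bps default is an arbitrary placeholder.

```python
def net_returns(weights, asset_returns, cost_bps=10.0):
    """Per-period strategy returns after a linear cost charged on turnover.

    weights[t] is the position held over period t (decided before it starts);
    asset_returns[t] is the asset's simple return over period t. Weight drift
    between rebalances is ignored for simplicity.
    """
    cost_rate = cost_bps / 10_000
    prev_w, out = 0.0, []
    for w, r in zip(weights, asset_returns):
        turnover = abs(w - prev_w)
        out.append(w * r - turnover * cost_rate)
        prev_w = w
    return out


# Enter, hold, exit: the cost is paid on entry and on exit, not while holding.
print(net_returns([1.0, 1.0, 0.0], [0.01, 0.02, 0.03]))
```

Because the cost scales with turnover, longer holds pay it less often. That is the mechanical reason behind the "prefer longer holds" mitigation.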
2.8 Ethical & Personal Investment Policy
This project explicitly seeks to profit from events that include war and human suffering. That deserves a conscious, written stance rather than silent avoidance (the council flagged it as the easiest thing to ignore).
Recommended default policy (adjust to your values):
- Trade broad economic consequences (energy, commodities, FX, indices, crypto) and conventional, widely-held defense exposure.
- Treat any personal exclusions (specific sectors or companies the investor is unwilling to hold) as a hard constraint in the investable universe — documented here, enforced in code during Data Preparation.
- Profiting from anticipated macro consequences is distinct from causing or influencing events; this strategy neither causes nor influences them.
Action: confirm your exclusion list (if any). It becomes a filter on the asset universe in Data Preparation.
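Treating exclusions as a hard constraint is easiest to enforce as a filter applied before any model sees the universe. The sketch below assumes each universe entry is a dict with `ticker` and `sector` keys. The tickers and sector names are hypothetical placeholders, not a suggested exclusion list. The function returns both the kept and the dropped entries so that each exclusion can be logged and audited.

```python
def apply_exclusions(universe, excluded_tickers=(), excluded_sectors=()):
    """Split the universe into (kept, dropped) using case-insensitive exclusion lists."""
    tickers = {t.upper() for t in excluded_tickers}
    sectors = {s.lower() for s in excluded_sectors}
    kept, dropped = [], []
    for asset in universe:
        blocked = (asset["ticker"].upper() in tickers
                   or asset.get("sector", "").lower() in sectors)
        (dropped if blocked else kept).append(asset)
    return kept, dropped


# Hypothetical universe and exclusion list, for illustration only.
universe = [
    {"ticker": "AAA", "sector": "Energy"},
    {"ticker": "BBB", "sector": "Example-Excluded"},
    {"ticker": "CCC", "sector": "Defense"},
]
kept, dropped = apply_exclusions(
    universe, excluded_tickers=["ccc"], excluded_sectors=["example-excluded"]
)
print([a["ticker"] for a in kept])     # ['AAA']
print([a["ticker"] for a in dropped])  # ['BBB', 'CCC']
```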
2.9 Data-Mining Goals
Translating the business goals into technical targets the later chapters will pursue:
| Business goal | Data-mining goal | Success metric |
|---|---|---|
| Beat benchmark on risk-adjusted basis | Forecast N-day-forward risk-adjusted return (or regime) per instrument across horizons | Out-of-sample Sharpe vs. benchmark |
| Determine if geopolitics adds value | Measure marginal lift of geopolitical features over the baseline model | OOS Sharpe lift; feature-importance robustness; statistical significance |
| Time entries/exits sensibly | Classify regime / direction with calibrated probabilities | AUC / hit rate; calibration; cost-aware backtest P&L |
| Don’t go broke | Risk model: drawdown, volatility, and position-sizing outputs | Max drawdown within ceiling; Kelly-fraction-bounded sizing |
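"Kelly-fraction-bounded sizing" can be read as three steps: compute the growth-optimal Kelly fraction, scale it down, and never exceed the per-thesis cap set by the capital-preservation rule. The sketch below uses the continuous-return approximation f* = μ/σ². Here μ is the expected excess return and σ its volatility, both over the same period. The quarter-Kelly multiplier and the 10% cap are placeholder assumptions, and the cap limits exposure as a simple stand-in for the "risk" limit. A fractional multiplier is used because μ is estimated with large error. Full Kelly on an overestimated edge is a direct route to the ruin the first objective rules out.

```python
def kelly_fraction(mu, sigma):
    """Full Kelly leverage under the continuous-return approximation mu / sigma**2."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return mu / sigma ** 2


def position_size(mu, sigma, kelly_multiplier=0.25, max_fraction=0.10):
    """Fractional Kelly, clipped to +/- max_fraction of capital per thesis."""
    raw = kelly_multiplier * kelly_fraction(mu, sigma)
    return max(-max_fraction, min(max_fraction, raw))
```

With μ = 5% and σ = 20%, full Kelly is 1.25× capital and quarter-Kelly is about 0.31×. The cap then binds at 0.10, which shows how the per-thesis limit, rather than the model's confidence, usually sets the size.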
Validation discipline (non-negotiable): strict walk-forward with purge/embargo, no look-ahead (event features timestamped by availability, fundamentals by filing date), and all backtests net of realistic transaction costs.
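The purge/embargo requirement can be sketched as a split generator over integer row positions, assuming one row per trading day and labels built `horizon` rows ahead. A training row whose label reaches into the test window is purged. The original purge/embargo formulation applies the embargo after a test window, for folds that train on later data. A forward-only walk-forward never does that, so here the embargo is applied as an extra safety gap before each test window. That choice is a simplifying assumption. The function name and the expanding training window are also assumptions. The availability and filing-date timestamping of features is a separate step not shown here.

```python
def purged_walk_forward(n_obs, min_train, test_size, horizon, embargo=0):
    """Expanding-window walk-forward splits with a purge gap and an embargo gap.

    A training row t carries a label built from data up to t + horizon, so rows
    with t + horizon at or after the test start are purged. The embargo adds
    `embargo` further rows of separation as an extra margin in this forward-only setup.
    Returns a list of (train_range, test_range) pairs of integer positions.
    """
    splits = []
    test_start = min_train + horizon + embargo
    while test_start < n_obs:
        test_end = min(test_start + test_size, n_obs)
        train_end = test_start - horizon - embargo  # last usable train row is train_end - 1
        splits.append((range(0, train_end), range(test_start, test_end)))
        test_start = test_end
    return splits
```

For 100 rows, a minimum of 50 training rows, 20-row test windows, a 5-day horizon, and a 2-row embargo, this yields three folds. The first trains on rows 0 to 49 and tests on rows 57 to 76.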
2.10 Project Plan
Mapping the work onto the remaining CRISP-DM chapters:
| Phase | Chapter | Phase-1 hand-off |
|---|---|---|
| 2. Data Understanding | 02_data_understanding | Acquire & profile market + geopolitical data; verify quality; first hypotheses |
| 3. Data Preparation | 03_data_preparation | Build point-in-time dataset; engineer baseline + geopolitical features; apply ethical universe filter |
| 4. Modeling | 04_modeling | Baseline model → add geopolitical layer; multi-horizon; walk-forward test design |
| 5. Evaluation | 05_evaluation | Judge against the Sharpe/drawdown success criteria; decide if geopolitical lift is real |
| 6. Deployment | 06_deployment | Repeatable signal/decision process; monitoring for regime change |
Tooling & cadence: Python research notebooks rendered into this Quarto book; Git-tracked; iterative loops back to earlier phases as understanding deepens (CRISP-DM is not linear).